system

A data processing system addresses labor shortages and service quality variations by using a server to build and update customer response models, ensuring consistent and efficient customer interactions.

JP2026105524APending Publication Date: 2026-06-26SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-12-16
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

The hospitality industry faces challenges with labor shortages and variations in service quality, making it difficult to provide consistent high-quality customer service that does not depend on staff skills.

Method used

A system comprising a server for collecting data, constructing a customer response model, and updating it based on feedback, along with a terminal for interacting with customers using this model, enabling efficient and consistent customer service operations.

Benefits of technology

The system improves customer service quality and efficiency by leveraging machine learning to provide consistent responses and adapt to customer feedback, reducing labor costs and enhancing satisfaction.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure 2026105524000001_ABST
    Figure 2026105524000001_ABST
Patent Text Reader

Abstract

We provide the system. [Solution] An information processing device that collects training data and constructs a customer response model based on said training data, A speech recognition device that interacts with customers using a customer interaction model provided by the information processing device, A data analysis device that collects customer feedback and updates the customer response model based on that feedback, A conversion device that uses speech recognition means to convert voice input from customers into text data, A response generation device that generates a response based on text data obtained by a speech recognition device using a generative AI model, A communication system that includes this.
Need to check novelty before this filing date? Find Prior Art

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] In the hospitality industry, labor shortages and variations in service quality are major issues. In particular, it is required to provide consistent high-quality service to customers who visit the store, and it is difficult to achieve customer service that does not depend on the skills of staff. Therefore, means for efficient utilization of human resources in customer service operations and for stabilizing quality are needed.

Means for Solving the Problems

[0005] To solve the above problems, the present invention provides a system that includes a server means for collecting training data and constructing a customer response model based on the training data, a terminal means for interacting with customers using the customer response model provided by the server means, and an analysis means for aggregating customer feedback and updating the customer response model based on the feedback. This enables consistent quality and efficiency improvements in customer service operations without relying on personnel.

[0006] "Training data" refers to information about past customer service history and interactions with customers, which is necessary to build a customer service model.

[0007] "Server means" refers to a computing device that builds a customer response model based on training data and provides that model to terminal means.

[0008] A "customer response model" refers to an algorithm or dataset built to appropriately respond to customer inquiries.

[0009] "Terminal means" refers to a device that interacts with customers using a customer interaction model provided by the server means.

[0010] "Feedback" refers to customer evaluations and opinions regarding the customer service provided.

[0011] "Analysis means" refers to system components that collect customer feedback and use it to update customer response models and adjust parameters. [Brief explanation of the drawing]

[0012] [Figure 1] This is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment. [Figure 2] This is a conceptual diagram showing an example of the essential functions of a data processing device and a smart device according to the first embodiment. [Figure 3]It is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment. [Figure 4] It is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment. [Figure 5] It is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment. [Figure 6] It is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment. [Figure 7] It is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment. [Figure 8] It is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment. [Figure 9] It shows an emotion map to which multiple emotions are mapped. [Figure 10] It shows an emotion map to which multiple emotions are mapped. [Figure 11] It is a sequence diagram showing the processing flow of the data processing system in Example 1. [Figure 12] It is a sequence diagram showing the processing flow of the data processing system in Application Example 1. [Figure 13] It is a sequence diagram showing the processing flow of the data processing system in Example 2 when an emotion engine is combined. [Figure 14] It is a sequence diagram showing the processing flow of the data processing system in Application Example 2 when an emotion engine is combined.

Embodiments for Carrying Out the Invention

[0013] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0014] First, the terms used in the following description will be explained.

[0015] In the following embodiments, the numbered processor (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.

[0016] In the following embodiments, the numbered RAM (Random Access Memory) is a memory in which information is temporarily stored and is used as a work memory by the processor.

[0017] In the following embodiments, the numbered storage is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), or magnetic tapes, and the like.

[0018] In the following embodiments, the numbered communication I / F (Interface) is an interface including a communication processor, an antenna, and the like. The communication I / F controls communication between multiple computers. Examples of communication standards applied to the communication I / F include wireless communication standards including 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), or Bluetooth (registered trademark), and the like.

[0019] In the following embodiments, "A and / or B" is synonymous with "at least one of A and B." That is, "A and / or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and / or B" applies when expressing three or more things linked by "and / or."

[0020] [First Embodiment]

[0021] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0022] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0023] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0024] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0025] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0026] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0027] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0028] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0029] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0030] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the identification processing unit 290.

[0031] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0032] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0033] The customer service humanoid system in this invention is based on a cooperative operation between a server and a terminal. The server first collects customer service data obtained from the store and uses machine learning to construct a customer response model from this data. This model analyzes customer behavior and generates appropriate responses.

[0034] The server sends this customer interaction model to a humanoid terminal, which then uses the model to interact with the customer. The terminal has speech recognition capabilities, which capture customer inquiries as voice data, convert it into text data through a subsequent natural language processing step, and generate a response through the customer interaction model. For example, if a customer asks about the availability of a product, the terminal will respond with something like, "The product in question is currently out of stock."

[0035] Feedback from customers is sent to the server via their devices. This feedback is analyzed on the server and used for subsequent model updates. Because the feedback contributes to improving customer service quality, it leads to improved overall system performance.

[0036] Thus, the present invention supports customer service operations through a series of cycles: model construction by the server, dialogue execution by the terminal, and analysis of user feedback, aiming to improve efficiency in store operations, reduce labor costs, and enhance customer satisfaction.

[0037] The following describes the processing flow.

[0038] Step 1:

[0039] The server collects customer service data from the store. This data includes voice conversation logs, chat history, customer questions, and staff responses.

[0040] Step 2:

[0041] The server preprocesses the collected data. Specifically, it removes noise from the raw data and converts audio data into text data. Through this process, it creates a well-structured dataset necessary for training machine learning models.

[0042] Step 3:

[0043] The server uses pre-processed data to build a customer service model through machine learning algorithms. This model learns to respond appropriately to a variety of customer question patterns.

[0044] Step 4:

[0045] The server sends the customer interaction model it has built to the terminal. This model is a crucial element that supports real-time interaction with customers on the terminal.

[0046] Step 5:

[0047] When the terminal detects a customer's approach through the humanoid's sensors, it enters conversation mode. The speech recognition system transcribes the customer's speech into text and generates the optimal response using a model provided by the server.

[0048] Step 6:

[0049] After receiving customer service, users provide feedback via their device. This feedback can cover a wide range of topics, including the quality of service, areas for improvement, and specific requests.

[0050] Step 7:

[0051] The server collects and analyzes user feedback. This feedback is used, as needed, to retrain the customer service model, improving its accuracy. This ensures continuous improvement of the entire system.

[0052] (Example 1)

[0053] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0054] Traditional customer service systems have resulted in inconsistent and inefficient service due to the reliance on individual staff members for quality. Furthermore, the inability to effectively utilize user feedback has made it difficult to improve service quality. There is a need to address these challenges to reduce labor costs and improve customer satisfaction in store operations.

[0055] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0056] In this invention, the server includes a processing device means for collecting data and constructing a generative AI model using machine learning techniques; a communication device means for converting speech data into text data using the generative AI model provided by the processing device means, performing natural language processing, and conducting dialogue with customers; and an analysis device means for accumulating user feedback, analyzing that information, and updating the generative AI model. This improves the quality of customer service, enables consistent service delivery, and allows for model improvements that reflect user feedback.

[0057] "Data" refers to a collection of information, specifically elements gathered for the purpose of improving system functionality and analysis.

[0058] "Machine learning techniques" are technologies that learn patterns from data and enable automatic decision-making and prediction.

[0059] A "generative AI model" is an algorithmic structure that is automatically generated based on training data and has the ability to perform tasks according to a specific purpose.

[0060] A "processing device" refers to a device that performs a series of functions for collecting, analyzing, and executing a specific task.

[0061] "Communication device means" refers to a device that transmits and receives data and exchanges information with other devices or systems.

[0062] "Audio data" refers to digital audio information in a format that can be analyzed and converted by a computer.

[0063] "Natural language processing" refers to the technology that enables computers to understand, analyze, and process human language.

[0064] A "user" is an individual or organization that uses a system or service.

[0065] "Feedback" refers to opinions or information provided to improve or modify actions or processes.

[0066] An "analysis device" refers to a device that has the function of analyzing data and outputting the results as information.

[0067] The system in this invention employs a configuration in which multiple devices operate in cooperation, and mainly consists of a server, a terminal, and a user.

[0068] server

[0069] The server collects data from within the store and from other sources. This data includes user behavior logs and conversation records. The server leverages this data and uses hardware and software to build generative AI models. Specifically, it uses Python programs and machine learning libraries. Once the model is built, it is transferred from the server to the terminal and used for future interactions.

[0070] terminal

[0071] The device accepts a generative AI model provided by the server. The device is equipped with speech recognition capabilities and captures the user's spoken audio data. This includes, for example, digital signal processing techniques to reduce noise from the audio. The audio is converted into text data via a speech recognition engine, followed by natural language processing. During this process, the device interprets the customer's questions and generates appropriate responses according to the generative AI model. The text data is then converted back into speech by a speech synthesis engine.

[0072] User

[0073] Users utilize services provided through a terminal. A concrete example is the ordering process at a cafe. When a user says, "I'd like a latte," the terminal interprets the voice and responds, "One latte, understood." Users can provide feedback on the service, and the terminal sends this feedback to a server.

[0074] This feedback is analyzed on the server and used to improve the model.

[0075] As an example of a prompt, it could be input into the AI ​​model in the form of, "Generate a response based on a customer's order. For example, how would you respond if the customer said, 'I'd like one latte, please?'"

[0076] With the above configuration, the present invention achieves efficient, consistent, and high-quality customer service, contributing to improved overall system performance.

[0077] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0078] Step 1:

[0079] The server collects data from inside and outside the store using various sensors and networks. This data, including conversation logs and customer behavior patterns, is stored in the server's database. Audio and text data flow into the server as input, and a clean dataset necessary for analysis and learning is constructed as output. Specifically, preprocessing such as noise reduction and data cleaning is performed.

[0080] Step 2:

[0081] The server applies machine learning algorithms to the collected data to build a generative AI model. The clean dataset obtained in the previous step is used as input, and the output is a generative AI model for customer interaction. This model is trained using Python programs and libraries and implemented using frameworks such as Tensorflow®.

[0082] Step 3:

[0083] The server sends the completed generative AI model to the terminal. The model is packaged in JSON or other data formats and transferred over the network. The input is a pre-built customer interaction model, and the output is the terminal receiving that model and importing it into its own system.

[0084] Step 4:

[0085] The device utilizes speech recognition technology to receive voice input from the user. The customer's voice is input to the device, and the output is the conversion of the voice into text. Specifically, a speech recognition engine is used, utilizing services such as Google® Speech-to-Text API.

[0086] Step 5:

[0087] The device analyzes the converted text data using natural language processing and generates an appropriate response using a generative AI model. The input is the converted text data. The output is the optimal response to be returned to the user. For example, if the user asks, "Do you have product A?", the device checks the inventory status and provides a response such as, "Product A is currently out of stock."

[0088] Step 6:

[0089] The device uses speech synthesis technology to convert text responses into speech and respond to the user. The generated text response is provided as input, and the response to the user is provided in speech format as output. Specifically, the speech synthesis engine operates, producing natural-sounding speech output.

[0090] Step 7:

[0091] Users input feedback on the provided service into the terminal. The input consists of the user's rating and comments, and the output is the recording of that feedback as digital data. Feedback content is entered through user interaction, using touch panels or voice input.

[0092] Step 8:

[0093] The terminal transfers user feedback data to the server. The input is feedback data recorded on the terminal, and the output is data sent to the server for analysis. Data communication takes place via a network interface.

[0094] Step 9:

[0095] The server analyzes the feedback it receives and uses it to improve and update the generated AI model. The server processes the feedback as input, and the output includes adjustments to the model's parameters or the addition of new behaviors. This results in higher quality customer service in the future.

[0096] (Application Example 1)

[0097] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0098] Automating customer service and responding quickly and accurately to customer inquiries within stores is necessary to improve customer satisfaction. Furthermore, using speech recognition and natural language processing is required to provide efficient customer service without increasing human resources.

[0099] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0100] In this invention, the server includes an information processing device that collects training data and constructs a customer response model based on the training data; a speech recognition device that interacts with customers using the customer response model provided by the information processing device; a data analysis device that aggregates customer feedback and updates the customer response model based on the feedback; a conversion device that converts voice input from customers into text data using speech recognition means; and a response generation device that generates a response based on the text data obtained by the speech recognition device using a generation AI model. This enables the provision of appropriate responses in real time to a variety of customer questions, improving the efficiency of customer service operations in stores and enhancing customer satisfaction.

[0101] "Training data" refers to data collected about customer behavior and responses, which is used to build customer service models.

[0102] A "customer response model" is a set of algorithms and rules that a server constructs to generate appropriate responses to customer questions and requests.

[0103] An "information processing device" is a computer device used to build customer response models using collected data.

[0104] A "voice recognition device" is a device that analyzes a customer's voice input and converts it into text data.

[0105] A "data analysis device" is a device that collects customer feedback and updates customer service models to improve customer satisfaction.

[0106] A "conversion device" is a device used to convert speech data obtained through speech recognition into text data.

[0107] A "generative AI model" is an artificial intelligence model designed to automatically generate appropriate responses based on input data.

[0108] A "response generation device" is a device that uses a generative AI model to create responses to customer questions.

[0109] A "communication system" is a system that combines these devices and functions to efficiently exchange information.

[0110] The system for realizing this invention includes an information processing device, a speech recognition device, a data analysis device, a conversion device, and a response generation device using a generative AI model. These devices work together to provide real-time customer service.

[0111] First, the server collects customer behavior data using various sensors and data collection mechanisms, and stores it as training data in an information processing device. Based on this data, a customer response model is built. The server then transmits the customer response model to a speech recognition device. This device receives voice input from the customer and sends it to a conversion device that converts it into text data using speech recognition means.

[0112] The character data generated by the conversion device is passed to a response generation device equipped with a generative AI model. This response generation device uses the generative AI model to generate an appropriate response based on the given data. This process utilizes Google Cloud Speech-to-Text and GPT-4®, achieving advanced natural language processing.

[0113] The user receives the response and provides feedback as needed. The data analysis device aggregates this feedback and sends it back to the server. The server analyzes the feedback data for future model updates and adjusts the customer response model.

[0114] As a concrete example, in a real store, if a user asks a question by voice, such as "Where are the new products?", a voice recognition device converts the voice into text, and a response generation device generates a response such as "The new products are on the left side of the store." This allows the user to find the products smoothly.

[0115] An example of a prompt for a generative AI model is: "The user is asking about the current location of an item. The question is 'Where is the new product?' Please generate an answer considering the store map."

[0116] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0117] Step 1:

[0118] The server uses an information processing device to collect customer behavior data from various sensors and data collection mechanisms. This data is stored as training data, and a customer response model is built. The input is customer behavior data, and the output is the customer response model. This model analyzes the characteristics of the collected data and learns patterns to generate appropriate responses.

[0119] Step 2:

[0120] The server provides the constructed customer interaction model to the speech recognition device. The speech recognition device receives it and prepares itself. The input is the customer interaction model, and the output is the ready-to-use speech recognition device. The speech recognition device has already learned how to accurately process speech data using the received model.

[0121] Step 3:

[0122] Users ask questions in the store using voice. The speech recognition system quickly captures the voice input and converts it into text data using speech recognition technology. The input is voice data, and the output is text data. Google Cloud Speech-to-Text is used for this data conversion, achieving high accuracy in transcription.

[0123] Step 4:

[0124] The terminal's conversion device transmits the character data obtained from the speech recognition device to a response generation device equipped with a generation AI model. Because the generation AI model uses GPT-4, it generates an appropriate response based on the prompt text. The input consists of character data and the prompt text, and the output is the generated response text. Specifically, the generation AI model performs natural language processing on the obtained character data and selects the most appropriate response.

[0125] Step 5:

[0126] The user receives a response generated by a response generator. Based on this response, which is obtained in voice or text, the user decides on their actions within the store. The input is the generated response text, and the output is the user's actions. This response serves to guide the user so that they can quickly find the products they need.

[0127] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the identification processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform identification processing using the user's emotions.

[0128] This invention employs a configuration that combines a humanoid system for customer service with an emotion engine. The server unit constructs a customer interaction model using existing training data and provides it to the terminal unit. This terminal unit is installed in the store as a humanoid device and interacts with customers using speech recognition and natural language processing.

[0129] The emotion engine analyzes various sensor information, such as voice tone, facial expressions, and body movements, to recognize the user's emotions. The device uses this emotion data to provide responses that reflect the user's emotional state. For example, if the device determines that the user is angry, it will carefully choose its words and respond in a calming manner.

[0130] Users can provide feedback after receiving service, and this feedback is sent to the server. The server analyzes this feedback and uses it to improve the accuracy of the customer service model. Furthermore, sentiment analysis results obtained from the sentiment engine are also used as feedback, contributing to optimal interactions with users.

[0131] This will enable more consistent and high-quality customer service, aiming to further improve customer satisfaction. The present invention is embodied as a system to support efficient personnel utilization and appropriate communication with customers in store operations.

[0132] The following describes the processing flow.

[0133] Step 1:

[0134] The server aggregates customer service data sent from each terminal within the store. This includes information such as voice logs, conversation content, and customer reactions.

[0135] Step 2:

[0136] The server preprocesses the data and generates a dataset organized for analysis. This involves converting audio data to text, and performing noise reduction and data normalization.

[0137] Step 3:

[0138] The server trains a machine learning model using pre-processed data. This model learns appropriate response patterns in response to customer requests and accumulates the knowledge necessary for customer service.

[0139] Step 4:

[0140] The server sends a pre-trained customer service model to the terminal. This model forms the foundation for real-time customer service.

[0141] Step 5:

[0142] When the device detects a customer, it initiates speech recognition and natural language processing to interpret the customer's question. For example, if a question about a product comes in, it retrieves relevant answer data from the model.

[0143] Step 6:

[0144] The device uses an emotion engine to analyze the customer's emotions. It analyzes voice tone, facial expressions, and gestures in real time to identify the user's emotional state.

[0145] Step 7:

[0146] The device adjusts the tone and content of its responses according to the customer's emotions to provide appropriate communication. For example, if a user is confused, it will provide detailed explanations to reassure them.

[0147] Step 8:

[0148] After receiving customer service, users enter feedback into a terminal. This feedback is used to improve the accuracy of the system.

[0149] Step 9:

[0150] The server aggregates feedback and sentiment analysis results sent from the terminals and uses them to improve the customer service model. Based on these results, the model parameters are adjusted to improve the accuracy of responses in the future.

[0151] (Example 2)

[0152] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0153] Modern customer service systems face challenges such as limited interaction with customers and difficulty in responding in a way that considers the individual customer's emotions. In particular, there is a need to appropriately recognize the customer's emotional state and respond accordingly. Furthermore, traditional systems lacked mechanisms to effectively utilize customer feedback to optimize the model, making it difficult to consistently improve service quality.

[0154] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0155] In this invention, the server includes information processing means for collecting training data and constructing a customer response model based on said training data; an emotion analysis device for analyzing voice tone, facial expressions, and body movements to generate emotion data; and an analysis device for aggregating customer feedback and updating the customer response model based on said feedback and emotion data. This enables sophisticated dialogue that takes customer emotions into consideration and consistent improvement in the quality of service.

[0156] "Training data" refers to a set of data collected as foundational information for building a customer service model.

[0157] "Information processing means" refers to a device or system for generating and providing customer response models using training data.

[0158] A "dialogue device" refers to equipment installed for the purpose of interacting with customers, and is a device that generates real-time responses in voice or text using a dialogue model.

[0159] An "emotion analysis device" refers to a device that analyzes a user's emotions based on information such as voice tone, facial expressions, and body movements, and generates emotional data.

[0160] An "analysis device" refers to a device or system used to update and optimize customer service models based on customer feedback and emotional data.

[0161] A "customer response model" is a model designed to generate appropriate responses in customer interactions, and is built from training data.

[0162] "Feedback" refers to information such as evaluations and impressions that customers provide after experiencing a service.

[0163] This invention aims to enhance customer interaction and improve the customer experience in customer service systems. It primarily focuses on three parties: the server, the terminal, and the user, and is implemented as follows.

[0164] The server aggregates training data and uses information processing tools to build a customer response model. This training data includes past dialogue records and user feedback data. Based on this, the server utilizes generative AI models, commonly used in natural language processing, to form a model that acts as a response generation module. The constructed model is then provided to the terminal and used for on-the-spot customer interactions.

[0165] The terminal is a humanoid device installed in stores that directly interacts with customers. It uses well-known voice recognition software, for example, converting and analyzing customer speech via a commonly used natural language processing API. Furthermore, it incorporates an emotion analyzer, using a camera and microphone to analyze voice tone, facial expressions, and body movements to determine the customer's emotions. This allows the dialogue device to generate responses tailored to the customer's emotions, improving the accuracy of the conversation.

[0166] Users provide feedback after interacting with the system. This feedback information is sent to the server via the terminal. The server analyzes this feedback information and data obtained from sentiment analysis to continuously improve the model. This enables the system to provide more appropriate responses that better meet customer needs and emotions in subsequent interactions.

[0167] For example, if a user asks "What are the recommended items?" in a store, the terminal recognizes the question and responds using a stored model, "Today's special is a smoothie. Would you like to try one?" At the same time, it performs emotion analysis based on the user's facial expressions and voice, and can also provide additional input depending on the situation. Furthermore, by utilizing a generative AI model and inputting prompts such as "Show me an example of a response when the user is satisfied," it can explore a wider variety of flexible response patterns.

[0168] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0169] Step 1:

[0170] The server collects training data.

[0171] Input: Past dialogue logs and user feedback data

[0172] Specific operation: The server accesses the storage system and selectively retrieves interaction history and feedback.

[0173] Output: The set of training data required to build the model.

[0174] Step 2:

[0175] The server builds a customer response model based on the collected training data.

[0176] Input: Set of training data

[0177] Specific operation: The server uses natural language processing algorithms to analyze data and train a generative AI model.

[0178] Output: Newly constructed customer service model

[0179] Step 3:

[0180] The server provides the established customer support model to the terminal.

[0181] Input: Customer support model

[0182] Specific operation: The server transfers the model to the terminal via a secure communication protocol.

[0183] Output: Latest customer support model installed on the device

[0184] Step 4:

[0185] The terminal receives voice input from the customer.

[0186] Input: Customer voice

[0187] Specific operation: The device's microphone captures the customer's voice, and speech recognition software converts it to text.

[0188] Output: Customer questions or requests in text format

[0189] Step 5:

[0190] The terminal analyzes customer input and generates a response.

[0191] Input: Customer question or request in text format

[0192] Specific operation: The terminal uses a customer interaction model to analyze text and create an appropriate response. Simultaneously, an emotion analyzer analyzes voice tone and facial expressions, and incorporates the results into the response.

[0193] Output: Response message to present to the customer

[0194] Step 6:

[0195] Users provide feedback after the interaction.

[0196] Input: Feedback information (evaluation, suggestions for improvement, etc.)

[0197] Specific action: The user inputs feedback using the device's interface.

[0198] Output: Feedback data sent to the server

[0199] Step 7:

[0200] The server receives feedback and analyzes it to improve the model.

[0201] Input: Feedback data and sentiment analysis results

[0202] Specific operation: The server analyzes the new feedback data and retrains the customer interaction model by appropriately adjusting the model's parameters.

[0203] Output: Improved customer service model

[0204] (Application Example 2)

[0205] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0206] Traditional customer service systems had the challenge of not being able to accurately recognize customer emotions and provide appropriate responses immediately. This made it difficult to improve customer satisfaction and limited the efficient use of personnel. Furthermore, the models were not sufficiently improved using feedback, creating a need for measures to improve service quality.

[0207] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0208] In this invention, the server includes an information processing device that collects learning data and constructs a dialogue model based on the learning data; a device that performs dialogue with a human using the dialogue model provided by the information processing device; a sensor analysis device that interprets the user's emotional state and adjusts the response based on the interpretation result; and an information analysis device that aggregates evaluations from the user and updates the dialogue model based on the evaluation. This makes it possible to understand the user's emotions in real time, provide appropriate customer service, and improve customer satisfaction.

[0209] An "information processing device" is a device that collects training data and builds a dialogue model based on that data.

[0210] A "device" is a device that interacts with humans using a dialogue model provided by an information processing device.

[0211] A "sensor analysis device" is a device that interprets the user's emotional state and adjusts its response based on that interpretation.

[0212] An "information analysis device" is a device that collects user feedback and updates the dialogue model based on that feedback.

[0213] A "dialogue model" is a model used to understand conversations with users and generate appropriate responses.

[0214] To implement this invention, an information processing device, a device device, a sensor analysis device, and a system using the information analysis device are required. First, the server collects diverse learning data through the information processing device and constructs a dialogue model based on it. This model is a generative AI model that enables natural conversation with humans.

[0215] Next, terminal devices are deployed in stores and service locations and interact with customers using dialogue models acquired from information processing devices. They interpret customer questions using a speech recognition system (e.g., Google Cloud Speech-to-Text) and a natural language processing engine (e.g., OpenAI® GPT-3®). Meanwhile, sensor analysis devices utilize an emotion analysis engine (e.g., Microsoft® Azure® Emotion Recognition) to analyze the customer's emotional state in real time from their voice and facial expressions and reflect this in their responses.

[0216] Furthermore, user feedback is aggregated by an information analysis device. This device adjusts the parameters of the dialogue model based on the collected feedback to improve the model's performance. For example, in a scenario where a customer is unsure which smartphone to buy, the server uses an emotion engine to analyze emotional data indicating anxiety and generates a reassuring response to the device, such as, "There are many options available, let's find the best one together."

[0217] As an example of a prompt, the AI ​​generation model would be input in the format of, "A customer is hesitant about purchasing a smartphone. Please generate a reassuring customer service message." In this way, a system is created that can accurately reflect the user's emotional state, contributing to improved customer satisfaction.

[0218] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0219] Step 1:

[0220] The server uses an information processing device to collect training data and builds a dialogue model based on it. It receives large amounts of text data and past customer interaction records as input, and uses a generative AI model to output a dialogue model suitable for natural language processing tasks. Specifically, it performs data cleansing, converts the data into a format suitable for the model, and then trains the AI ​​model.

[0221] Step 2:

[0222] The terminal device receives a dialogue model provided by the server and utilizes it during customer service. It receives voice data from the customer as input and converts it to text using Google Cloud Speech-to-Text. As output, this text data is passed to a natural language processing engine such as OpenAI GPT-3 to generate appropriate response text. Specifically, the generated text is played back by a speech synthesizer to respond to the user.

[0223] Step 3:

[0224] A sensor analysis device analyzes customer emotions in real time. It collects customer voice tone, facial expressions, and gestures from cameras and microphones as input, and outputs emotion data using Microsoft Azure Emotion Recognition. Specifically, it extracts features from audio and video data and estimates emotions based on them. This output data is returned to the device and used to adjust responses.

[0225] Step 4:

[0226] After interacting with a customer, the user enters feedback into a terminal. The input information includes text and evaluation scores, which are collected by an information analysis device. The feedback is then sent to a server and used to adjust the model's parameters. Specifically, the feedback data is accumulated, analyzed periodically, and the feedback is incorporated into the dialogue model.

[0227] Step 5:

[0228] The server uses feedback data obtained from the information analysis device to adjust and improve the parameters of the dialogue model. It analyzes the user evaluations collected as input to identify factors affecting the model's performance. As output, an improved new dialogue model is generated and provided back to the terminal. Specifically, the model training pipeline is automatically adjusted based on the analysis results, evolving into a more accurate model.

[0229] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0230] Data generation model 58 is a so-called generative AI (Artificial Intelligence). An example of data generation model 58 is ChatGPT (registered trademark) (Internet search).<URL: https: / / openai.com / blog / chatgpt> ), Gemini (registered trademark) (Internet search) <url: https: gemini.google.com ?hl="ja">Examples of generative AI include those described above. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions shown by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0231] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0232] [Second Embodiment]

[0233] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0234] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0235] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0236] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0237] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0238] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0239] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0240] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0241] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0242] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the identification processing unit 290.

[0243] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0244] Next, the identification processing performed by the identification processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0245] The customer service humanoid system in this invention is based on a cooperative operation between a server and a terminal. The server first collects customer service data obtained from the store and uses machine learning to construct a customer response model from this data. This model analyzes customer behavior and generates appropriate responses.

[0246] The server sends this customer interaction model to a humanoid terminal, which then uses the model to interact with the customer. The terminal has speech recognition capabilities, which capture customer inquiries as voice data, convert it into text data through a subsequent natural language processing step, and generate a response through the customer interaction model. For example, if a customer asks about the availability of a product, the terminal will respond with something like, "The product in question is currently out of stock."

[0247] Feedback from customers is sent to the server via their devices. This feedback is analyzed on the server and used for subsequent model updates. Because the feedback contributes to improving customer service quality, it leads to improved overall system performance.

[0248] Thus, the present invention supports customer service operations through a series of cycles: model construction by the server, dialogue execution by the terminal, and analysis of user feedback, aiming to improve efficiency in store operations, reduce labor costs, and enhance customer satisfaction.

[0249] The following describes the processing flow.

[0250] Step 1:

[0251] The server collects customer service data from the store. This data includes voice conversation logs, chat history, customer questions, and staff responses.

[0252] Step 2:

[0253] The server preprocesses the collected data. Specifically, it removes noise from the raw data and converts audio data into text data. Through this process, it creates a well-structured dataset necessary for training machine learning models.

[0254] Step 3:

[0255] The server uses pre-processed data to build a customer interaction model through machine learning algorithms. This model learns to respond appropriately to a variety of customer question patterns.

[0256] Step 4:

[0257] The server sends the customer interaction model it has built to the terminal. This model is a crucial element that supports real-time interaction with the customer on the terminal.

[0258] Step 5:

[0259] When the terminal detects a customer's approach through the humanoid's sensors, it enters conversation mode. The speech recognition system transcribes the customer's speech into text and generates the optimal response using a model provided by the server.

[0260] Step 6:

[0261] After receiving customer service, users provide feedback via their device. This feedback can cover a wide range of topics, including the quality of service, areas for improvement, and specific requests.

[0262] Step 7:

[0263] The server collects and analyzes user feedback. This feedback is used, as needed, to retrain the customer service model, improving its accuracy. This ensures continuous improvement of the entire system.

[0264] (Example 1)

[0265] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0266] Traditional customer service systems have resulted in inconsistent and inefficient service due to the reliance on individual staff members for quality. Furthermore, the inability to effectively utilize user feedback has made it difficult to improve service quality. There is a need to address these challenges to reduce labor costs and improve customer satisfaction in store operations.

[0267] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0268] In this invention, the server includes a processing device means for collecting data and constructing a generative AI model using machine learning techniques; a communication device means for converting speech data into text data using the generative AI model provided by the processing device means, performing natural language processing, and conducting dialogue with customers; and an analysis device means for accumulating user feedback, analyzing that information, and updating the generative AI model. This improves the quality of customer service, enables consistent service delivery, and allows for model improvements that reflect user feedback.

[0269] "Data" refers to a collection of information, specifically elements gathered for the purpose of improving system functionality and analysis.

[0270] "Machine learning techniques" are technologies that learn patterns from data and enable automatic decision-making and prediction.

[0271] A "generative AI model" is an algorithmic structure that is automatically generated based on training data and has the ability to perform tasks according to a specific purpose.

[0272] A "processing device" refers to a device that performs a series of functions for collecting, analyzing, and executing a specific task.

[0273] "Communication device means" refers to a device that transmits and receives data and exchanges information with other devices or systems.

[0274] "Audio data" refers to digital audio information in a format that can be analyzed and converted by a computer.

[0275] "Natural language processing" refers to the technology that enables computers to understand, analyze, and process human language.

[0276] A "user" is an individual or organization that uses a system or service.

[0277] "Feedback" refers to opinions or information provided to improve or modify actions or processes.

[0278] An "analysis device" refers to a device that has the function of analyzing data and outputting the results as information.

[0279] The system in this invention employs a configuration in which multiple devices operate in cooperation, and mainly consists of a server, a terminal, and a user.

[0280] server

[0281] The server collects data from within the store and from other sources. This data includes user behavior logs and conversation records. The server leverages this data and uses hardware and software to build generative AI models. Specifically, it uses Python programs and machine learning libraries. Once the model is built, it is transferred from the server to the terminal and used for future interactions.

[0282] terminal

[0283] The terminal receives the generative AI model provided by the server. The terminal has a voice recognition function to capture the voice data spoken by the user. For example, it includes digital signal processing technology for reducing noise from the voice. The voice is converted into character data through a voice recognition engine, and then natural language processing is performed. In this process, the terminal interprets the customer's question and generates an appropriate response according to the generative AI model. The character data is converted back into voice by a voice synthesis engine.

[0284] User

[0285] The user utilizes the service provided through the terminal. As a specific example, it includes the ordering process in a café. When the user says "Please give me a latte", the terminal interprets the voice and responds with "One latte, understood". The user can provide feedback on the service, and the terminal sends this feedback to the server.

[0286] This feedback is analyzed by the server and utilized to improve the model.

[0287] As an example of the prompt text, it can be considered to input into the generative AI model in the form of "Please generate a response according to the customer's order. Example: When the customer says 'Please give me one latte', how do you respond?".

[0288] With the above configuration, the present invention realizes efficient, consistent, and high-quality customer service, contributing to the performance improvement of the entire system.

[0289] The flow of the specific process in Example 1 will be described using FIG. 11.

[0290] Step 1:

[0291] The server collects data from inside and outside the store using various sensors and networks. This data, including conversation logs and customer behavior patterns, is stored in the server's database. Audio and text data flow into the server as input, and a clean dataset necessary for analysis and learning is constructed as output. Specifically, preprocessing such as noise reduction and data cleaning is performed.

[0292] Step 2:

[0293] The server applies machine learning algorithms to the collected data to build a generative AI model. The clean dataset obtained in the previous step is used as input, and the output is a generative AI model for customer interaction. This model is trained using Python programs and libraries and implemented using frameworks such as TensorFlow.

[0294] Step 3:

[0295] The server sends the completed generative AI model to the terminal. The model is packaged in JSON or other data formats and transferred over the network. The input is a pre-built customer interaction model, and the output is the terminal receiving that model and importing it into its own system.

[0296] Step 4:

[0297] The device utilizes speech recognition technology to receive voice input from the user. The customer's voice is input to the device, and the output is the conversion of the voice into text. Specifically, a speech recognition engine is used, utilizing APIs such as Google Speech-to-Text.

[0298] Step 5:

[0299] The device analyzes the converted text data using natural language processing and generates an appropriate response using a generative AI model. The input is the converted text data. The output is the optimal response to be returned to the user. For example, if the user asks, "Do you have product A?", the device checks the inventory status and provides a response such as, "Product A is currently out of stock."

[0300] Step 6:

[0301] The device uses speech synthesis technology to convert text responses into speech and respond to the user. The generated text response is provided as input, and the response to the user is provided in speech format as output. Specifically, the speech synthesis engine operates, producing natural-sounding speech output.

[0302] Step 7:

[0303] Users input feedback on the provided service into the terminal. The input consists of the user's rating and comments, and the output is the recording of that feedback as digital data. Feedback content is entered through user interaction, using touch panels or voice input.

[0304] Step 8:

[0305] The terminal transfers user feedback data to the server. The input is feedback data recorded on the terminal, and the output is data sent to the server for analysis. Data communication takes place via a network interface.

[0306] Step 9:

[0307] The server analyzes the received feedback and utilizes it to improve and update the generative AI model. As input, the digital data of the feedback is processed by the server, and as output, the parameters of the model are adjusted or new operations are added. This enables higher-quality customer service in subsequent interactions.

[0308] (Application Example 1)

[0309] Next, Application Example 1 will be described. In the following description, the data processing device 12 is referred to as the "server", and the smart glasses 214 are referred to as the "terminal".

[0310] It is necessary to improve customer satisfaction by automating customer service and providing quick and accurate responses to customers' questions in the store. Also, by using speech recognition and natural language processing, it is required to provide efficient customer service without increasing human resources.

[0311] The specific processing by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0312] In this invention, the server includes an information processing device that collects learning data and constructs a customer service model based on the learning data, a speech recognition device that uses the customer service model provided by the information processing device to interact with customers, a data analysis device that aggregates feedback from customers and updates the customer service model based on the feedback, a conversion device that converts voice input from customers into character data using speech recognition means, and a response generation device that generates a response based on the character data obtained by the speech recognition device using a generative AI model. This enables providing appropriate responses in real time to various questions from customers, improving the efficiency of customer service in the store and enhancing customer satisfaction.

[0313] "Learning data" is data that collects information on customers' behaviors and reactions and is used to construct a customer service model.

[0314] A "customer response model" is a set of algorithms and rules that a server constructs to generate appropriate responses to customer questions and requests.

[0315] An "information processing device" is a computer device used to build customer response models using collected data.

[0316] A "voice recognition device" is a device that analyzes a customer's voice input and converts it into text data.

[0317] A "data analysis device" is a device that collects customer feedback and updates customer service models to improve customer satisfaction.

[0318] A "conversion device" is a device used to convert speech data obtained through speech recognition into text data.

[0319] A "generative AI model" is an artificial intelligence model designed to automatically generate appropriate responses based on input data.

[0320] A "response generation device" is a device that uses a generative AI model to create responses to customer questions.

[0321] A "communication system" is a system that combines these devices and functions to efficiently exchange information.

[0322] The system for realizing this invention includes an information processing device, a speech recognition device, a data analysis device, a conversion device, and a response generation device using a generative AI model. These devices work together to provide real-time customer service.

[0323] First, the server collects customer behavior data using various sensors and data collection mechanisms, and stores it as training data in an information processing device. Based on this data, a customer response model is built. The server then transmits the customer response model to a speech recognition device. This device receives voice input from the customer and sends it to a conversion device that converts it into text data using speech recognition means.

[0324] The character data generated by the conversion device is passed to a response generation device equipped with a generative AI model. This response generation device uses the generative AI model to generate an appropriate response based on the given data. This process utilizes Google Cloud Speech-to-Text and GPT-4, achieving advanced natural language processing.

[0325] The user receives the response and provides feedback as needed. The data analysis device aggregates this feedback and sends it back to the server. The server analyzes the feedback data for future model updates and adjusts the customer response model.

[0326] As a concrete example, in a real store, if a user asks a question by voice, such as "Where are the new products?", a voice recognition device converts the voice into text, and a response generation device generates a response such as "The new products are on the left side of the store." This allows the user to find the products smoothly.

[0327] An example of a prompt for a generative AI model is: "The user is asking about the current location of an item. The question is 'Where is the new product?' Please generate an answer considering the store map."

[0328] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0329] Step 1:

[0330] The server uses an information processing device to collect customer behavior data from various sensors and data collection mechanisms. This data is stored as training data, and a customer response model is built. The input is customer behavior data, and the output is the customer response model. This model analyzes the characteristics of the collected data and learns patterns to generate appropriate responses.

[0331] Step 2:

[0332] The server provides the constructed customer interaction model to the speech recognition device. The speech recognition device receives it and prepares itself. The input is the customer interaction model, and the output is the ready-to-use speech recognition device. The speech recognition device has already learned how to accurately process speech data using the received model.

[0333] Step 3:

[0334] Users ask questions in the store using voice. The speech recognition system quickly captures the voice input and converts it into text data using speech recognition technology. The input is voice data, and the output is text data. Google Cloud Speech-to-Text is used for this data conversion, achieving high accuracy in transcription.

[0335] Step 4:

[0336] The terminal's conversion device transmits the character data obtained from the speech recognition device to a response generation device equipped with a generation AI model. Because the generation AI model uses GPT-4, it generates an appropriate response based on the prompt text. The input consists of character data and the prompt text, and the output is the generated response text. Specifically, the generation AI model performs natural language processing on the obtained character data and selects the most appropriate response.

[0337] Step 5:

[0338] The user receives a response generated by a response generator. Based on this response, which is obtained in voice or text, the user decides on their actions within the store. The input is the generated response text, and the output is the user's actions. This response serves to guide the user so that they can quickly find the products they need.

[0339] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the identification processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform identification processing using the user's emotions.

[0340] This invention employs a configuration that combines a humanoid system for customer service with an emotion engine. The server unit constructs a customer interaction model using existing training data and provides it to the terminal unit. This terminal unit is installed in the store as a humanoid device and interacts with customers using speech recognition and natural language processing.

[0341] The emotion engine analyzes various sensor information, such as voice tone, facial expressions, and body movements, to recognize the user's emotions. The device uses this emotion data to provide responses that reflect the user's emotional state. For example, if the device determines that the user is angry, it will carefully choose its words and respond in a calming manner.

[0342] Users can provide feedback after receiving service, and this feedback is sent to the server. The server analyzes this feedback and uses it to improve the accuracy of the customer service model. Furthermore, sentiment analysis results obtained from the sentiment engine are also used as feedback, contributing to optimal interactions with users.

[0343] This will enable more consistent and high-quality customer service, aiming to further improve customer satisfaction. The present invention is embodied as a system to support efficient personnel utilization and appropriate communication with customers in store operations.

[0344] The following describes the processing flow.

[0345] Step 1:

[0346] The server aggregates customer service data sent from each terminal within the store. This data includes information such as voice logs, conversation content, and customer reactions.

[0347] Step 2:

[0348] The server preprocesses the data and generates a dataset organized for analysis. This involves converting audio data to text, and performing noise reduction and data normalization.

[0349] Step 3:

[0350] The server trains a machine learning model using pre-processed data. This model learns appropriate response patterns in response to customer requests and accumulates the knowledge necessary for customer service.

[0351] Step 4:

[0352] The server sends a pre-trained customer interaction model to the terminal. This model forms the foundation for real-time customer support.

[0353] Step 5:

[0354] When the device detects a customer, it initiates speech recognition and natural language processing to interpret the customer's question. For example, if a question about a product comes in, it retrieves relevant answer data from the model.

[0355] Step 6:

[0356] The device uses an emotion engine to analyze the customer's emotions. It analyzes voice tone, facial expressions, and gestures in real time to identify the user's emotional state.

[0357] Step 7:

[0358] The device adjusts the tone and content of its responses according to the customer's emotions to provide appropriate communication. For example, if a user is confused, it will provide detailed explanations to reassure them.

[0359] Step 8:

[0360] After receiving customer service, users enter feedback into a terminal. This feedback is used to improve the accuracy of the system.

[0361] Step 9:

[0362] The server aggregates feedback and sentiment analysis results sent from the terminals and uses them to improve the customer service model. Based on these results, the model parameters are adjusted to improve the accuracy of responses in subsequent interactions.

[0363] (Example 2)

[0364] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0365] Modern customer service systems face challenges such as limited interaction with customers and difficulty in responding in a way that considers the individual customer's emotions. In particular, there is a need to appropriately recognize the customer's emotional state and respond accordingly. Furthermore, traditional systems lacked mechanisms to effectively utilize customer feedback to optimize the model, making it difficult to consistently improve service quality.

[0366] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0367] In this invention, the server includes information processing means for collecting training data and constructing a customer response model based on said training data; an emotion analysis device for analyzing voice tone, facial expressions, and body movements to generate emotion data; and an analysis device for aggregating customer feedback and updating the customer response model based on said feedback and emotion data. This enables sophisticated dialogue that takes customer emotions into consideration and consistent improvement in the quality of service.

[0368] "Training data" refers to a set of data collected as foundational information for building a customer service model.

[0369] "Information processing means" refers to a device or system for generating and providing customer response models using training data.

[0370] A "dialogue device" refers to equipment installed for interacting with customers, which uses a dialogue model to generate real-time responses in voice or text.

[0371] An "emotion analysis device" refers to a device that analyzes a user's emotions based on information such as voice tone, facial expressions, and body movements, and generates emotional data.

[0372] An "analysis device" refers to a device or system used to update and optimize customer service models based on customer feedback and emotional data.

[0373] A "customer response model" is a model designed to generate appropriate responses in customer interactions, and is built from training data.

[0374] "Feedback" refers to information such as evaluations and impressions that customers provide after experiencing a service.

[0375] This invention aims to enhance customer interaction in customer service systems and improve the customer experience. It primarily focuses on three parties: the server, the terminal, and the user, and is implemented as follows.

[0376] The server aggregates training data and uses information processing tools to build a customer response model. This training data includes past dialogue records and user feedback data. Based on this, the server utilizes generative AI models, commonly used in natural language processing, to form a model that acts as a response generation module. The constructed model is then provided to the terminal and used for on-the-spot customer interactions.

[0377] The terminal is a humanoid device installed in stores that directly interacts with customers. It uses well-known voice recognition software, for example, converting and analyzing customer speech via a commonly used natural language processing API. Furthermore, it incorporates an emotion analyzer, using a camera and microphone to analyze voice tone, facial expressions, and body movements to determine the customer's emotions. This allows the dialogue device to generate responses tailored to the customer's emotions, improving the accuracy of the conversation.

[0378] Users provide feedback after interacting with the system. This feedback information is sent to the server via the terminal. The server analyzes this feedback information and data obtained from sentiment analysis to continuously improve the model. This enables the system to provide more appropriate responses that better meet customer needs and emotions in subsequent interactions.

[0379] For example, if a user asks "What are the recommended items?" in a store, the terminal recognizes the question and responds using a stored model, "Today's special is a smoothie. Would you like to try one?" At the same time, it performs emotion analysis based on the user's facial expressions and voice, and can also provide additional input depending on the situation. Furthermore, by utilizing a generative AI model and inputting prompts such as "Show me an example of a response when the user is satisfied," it can explore a wider variety of flexible response patterns.

[0380] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0381] Step 1:

[0382] The server collects training data.

[0383] Input: Past dialogue logs and user feedback data

[0384] Specific operation: The server accesses the storage system and selectively retrieves interaction history and feedback.

[0385] Output: The set of training data required to build the model.

[0386] Step 2:

[0387] The server builds a customer response model based on the collected training data.

[0388] Input: Set of training data

[0389] Specific operation: The server uses natural language processing algorithms to analyze data and train a generative AI model.

[0390] Output: Newly constructed customer service model

[0391] Step 3:

[0392] The server provides the established customer support model to the terminal.

[0393] Input: Customer support model

[0394] Specific operation: The server transfers the model to the terminal via a secure communication protocol.

[0395] Output: Latest customer support model installed on the device

[0396] Step 4:

[0397] The terminal receives voice input from the customer.

[0398] Input: Customer voice

[0399] Specific operation: The device's microphone captures the customer's voice, and speech recognition software converts it to text.

[0400] Output: Customer questions or requests in text format

[0401] Step 5:

[0402] The terminal analyzes customer input and generates a response.

[0403] Input: Customer question or request in text format

[0404] Specific operation: The terminal uses a customer interaction model to analyze text and create an appropriate response. Simultaneously, an emotion analyzer analyzes voice tone and facial expressions, and incorporates the results into the response.

[0405] Output: Response message to present to the customer

[0406] Step 6:

[0407] Users provide feedback after the interaction.

[0408] Input: Feedback information (evaluation, suggestions for improvement, etc.)

[0409] Specific operation: The user inputs feedback using the device's interface.

[0410] Output: Feedback data sent to the server

[0411] Step 7:

[0412] The server receives feedback and analyzes it to improve the model.

[0413] Input: Feedback data and sentiment analysis results

[0414] Specific operation: The server analyzes the new feedback data and retrains the customer interaction model by appropriately adjusting the model's parameters.

[0415] Output: Improved customer service model

[0416] (Application Example 2)

[0417] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0418] Traditional customer service systems had the challenge of not being able to accurately recognize customer emotions and provide appropriate responses immediately. This made it difficult to improve customer satisfaction and limited the efficient use of personnel. Furthermore, the models were not sufficiently improved using feedback, creating a need for measures to improve service quality.

[0419] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0420] In this invention, the server includes an information processing device that collects learning data and constructs a dialogue model based on the learning data; a device that performs dialogue with a human using the dialogue model provided by the information processing device; a sensor analysis device that interprets the user's emotional state and adjusts the response based on the interpretation result; and an information analysis device that aggregates evaluations from the user and updates the dialogue model based on the evaluation. This makes it possible to understand the user's emotions in real time, provide appropriate customer service, and improve customer satisfaction.

[0421] An "information processing device" is a device that collects training data and builds a dialogue model based on that data.

[0422] A "device" is a device that interacts with humans using a dialogue model provided by an information processing device.

[0423] A "sensor analysis device" is a device that interprets the user's emotional state and adjusts its response based on that interpretation.

[0424] An "information analysis device" is a device that collects user feedback and updates the dialogue model based on that feedback.

[0425] A "dialogue model" is a model used to understand conversations with users and generate appropriate responses.

[0426] To implement this invention, an information processing device, a device device, a sensor analysis device, and a system using the information analysis device are required. First, the server collects diverse learning data through the information processing device and constructs a dialogue model based on it. This model is a generative AI model that enables natural conversation with humans.

[0427] Next, terminal devices are deployed in stores and service locations and interact with customers using dialogue models acquired from information processing devices. They interpret customer questions using speech recognition systems (e.g., Google Cloud Speech-to-Text) and natural language processing engines (e.g., OpenAI GPT-3). Meanwhile, sensor analysis devices utilize emotion analysis engines (e.g., Microsoft Azure Emotion Recognition) to analyze the customer's emotional state in real time from their voice and facial expressions and reflect this in their responses.

[0428] Furthermore, user feedback is aggregated by an information analysis device. This device adjusts the parameters of the dialogue model based on the collected feedback to improve the model's performance. For example, in a scenario where a customer is unsure which smartphone to buy, the server uses an emotion engine to analyze emotional data indicating anxiety and generates a reassuring response to the device, such as, "There are many options available, let's find the best one together."

[0429] As an example of a prompt, the AI ​​generation model would be input in the format of, "A customer is hesitant about purchasing a smartphone. Please generate a reassuring customer service message." In this way, a system is created that can accurately reflect the user's emotional state, contributing to improved customer satisfaction.

[0430] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0431] Step 1:

[0432] The server uses an information processing device to collect training data and builds a dialogue model based on it. It receives large amounts of text data and past customer interaction records as input, and uses a generative AI model to output a dialogue model suitable for natural language processing tasks. Specifically, it performs data cleansing, converts the data into a format suitable for the model, and then trains the AI ​​model.

[0433] Step 2:

[0434] The terminal device receives a dialogue model provided by the server and utilizes it during customer service. It receives voice data from the customer as input and converts it to text using Google Cloud Speech-to-Text. As output, this text data is passed to a natural language processing engine such as OpenAI GPT-3 to generate appropriate response text. Specifically, the generated text is played back by a speech synthesizer to respond to the user.

[0435] Step 3:

[0436] A sensor analysis device analyzes customer emotions in real time. It collects customer voice tone, facial expressions, and gestures from cameras and microphones as input, and outputs emotion data using Microsoft Azure Emotion Recognition. Specifically, it extracts features from audio and video data and estimates emotions based on them. This output data is returned to the device and used to adjust responses.

[0437] Step 4:

[0438] After interacting with a customer, the user enters feedback into a terminal. The input information includes text and evaluation scores, which are collected by an information analysis device. The feedback is then sent to a server and used to adjust the model's parameters. Specifically, the feedback data is accumulated, analyzed periodically, and the feedback is incorporated into the dialogue model.

[0439] Step 5:

[0440] The server uses feedback data obtained from the information analysis device to adjust and improve the parameters of the dialogue model. It analyzes the user evaluations collected as input to identify factors affecting the model's performance. As output, an improved new dialogue model is generated and provided back to the terminal. Specifically, the model training pipeline is automatically adjusted based on the analysis results, evolving into a more accurate model.

[0441] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing unit 12. In the data processing unit 12, the specific processing unit 290 acquires the audio data.

[0442] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). One example of data generation model 58 is ChatGPT (Internet search<URL: https: / / openai.com / blog / chatgpt> ), Gemini (Internet search) <url: https: gemini.google.com ?hl="ja">Examples of generative AI include those described above. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions shown by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0443] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0444] [Third Embodiment]

[0445] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0446] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0447] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0448] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0449] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0450] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0451] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0452] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0453] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0454] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the identification processing unit 290.

[0455] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0456] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0457] The customer service humanoid system in this invention is based on a cooperative operation between a server and a terminal. The server first collects customer service data obtained from the store and uses machine learning to construct a customer response model from this data. This model analyzes customer behavior and generates appropriate responses.

[0458] The server sends this customer interaction model to a humanoid terminal, which then uses the model to interact with the customer. The terminal has speech recognition capabilities, which capture customer inquiries as voice data, convert it into text data through a subsequent natural language processing step, and generate a response through the customer interaction model. For example, if a customer asks about the availability of a product, the terminal will respond with something like, "The product in question is currently out of stock."

[0459] Feedback from customers is sent to the server via their devices. This feedback is analyzed on the server and used for subsequent model updates. Because the feedback contributes to improving customer service quality, it leads to improved overall system performance.

[0460] Thus, the present invention supports customer service operations through a series of cycles: model construction by the server, dialogue execution by the terminal, and analysis of user feedback, aiming to improve efficiency in store operations, reduce labor costs, and enhance customer satisfaction.

[0461] The following describes the processing flow.

[0462] Step 1:

[0463] The server collects customer service data from the store. This data includes voice conversation logs, chat history, customer questions, and staff responses.

[0464] Step 2:

[0465] The server preprocesses the collected data. Specifically, it removes noise from the raw data and converts audio data into text data. Through this process, it creates a well-structured dataset necessary for training machine learning models.

[0466] Step 3:

[0467] The server uses pre-processed data to build a customer interaction model through machine learning algorithms. This model learns to respond appropriately to a variety of customer question patterns.

[0468] Step 4:

[0469] The server sends the customer interaction model it has built to the terminal. This model is a crucial element that supports real-time interaction with the customer on the terminal.

[0470] Step 5:

[0471] When the terminal detects a customer's approach through the humanoid's sensors, it enters conversation mode. The speech recognition system transcribes the customer's speech into text and generates the optimal response using a model provided by the server.

[0472] Step 6:

[0473] After receiving customer service, users provide feedback via their device. This feedback can cover a wide range of topics, including the quality of service, areas for improvement, and specific requests.

[0474] Step 7:

[0475] The server collects and analyzes user feedback. This feedback is used, as needed, to retrain the customer service model, improving its accuracy. This ensures continuous improvement of the entire system.

[0476] (Example 1)

[0477] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0478] Traditional customer service systems have resulted in inconsistent and inefficient service due to the reliance on individual staff members for quality. Furthermore, the inability to effectively utilize user feedback has made it difficult to improve service quality. There is a need to address these challenges to reduce labor costs and improve customer satisfaction in store operations.

[0479] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0480] In this invention, the server includes a processing device means for collecting data and constructing a generative AI model using machine learning techniques; a communication device means for converting speech data into text data using the generative AI model provided by the processing device means, performing natural language processing, and conducting dialogue with customers; and an analysis device means for accumulating user feedback, analyzing that information, and updating the generative AI model. This improves the quality of customer service, enables consistent service delivery, and allows for model improvements that reflect user feedback.

[0481] "Data" refers to a collection of information, specifically elements gathered for the purpose of improving system functionality and analysis.

[0482] "Machine learning techniques" are technologies that learn patterns from data and enable automatic decision-making and prediction.

[0483] A "generative AI model" is an algorithmic structure that is automatically generated based on training data and has the ability to perform tasks according to a specific purpose.

[0484] A "processing device" refers to a device that performs a series of functions for collecting, analyzing, and executing a specific task.

[0485] "Communication device means" refers to a device that transmits and receives data and exchanges information with other devices or systems.

[0486] "Audio data" refers to digital audio information in a format that can be analyzed and converted by a computer.

[0487] "Natural language processing" refers to the technology that enables computers to understand, analyze, and process human language.

[0488] A "user" is an individual or organization that uses a system or service.

[0489] "Feedback" refers to opinions or information provided to improve or modify actions or processes.

[0490] An "analysis device" refers to a device that has the function of analyzing data and outputting the results as information.

[0491] The system in this invention employs a configuration in which multiple devices operate in cooperation, and mainly consists of a server, a terminal, and a user.

[0492] server

[0493] The server collects data from within the store and from other sources. This data includes user behavior logs and conversation records. The server leverages this data and uses hardware and software to build generative AI models. Specifically, it uses Python programs and machine learning libraries. Once the model is built, it is transferred from the server to the terminal and used for future interactions.

[0494] terminal

[0495] The device accepts a generative AI model provided by the server. The device is equipped with speech recognition capabilities and captures the user's spoken audio data. This includes, for example, digital signal processing techniques to reduce noise from the audio. The audio is converted into text data via a speech recognition engine, followed by natural language processing. During this process, the device interprets the customer's questions and generates appropriate responses according to the generative AI model. The text data is then converted back into speech by a speech synthesis engine.

[0496] User

[0497] Users utilize services provided through a terminal. A concrete example is the ordering process at a cafe. When a user says, "I'd like a latte," the terminal interprets the voice and responds, "One latte, understood." Users can provide feedback on the service, and the terminal sends this feedback to a server.

[0498] This feedback is analyzed on the server and used to improve the model.

[0499] As an example of a prompt, it could be input into the AI ​​model in the form of, "Generate a response based on a customer's order. For example, how would you respond if the customer said, 'I'd like one latte, please?'"

[0500] With the above configuration, the present invention achieves efficient, consistent, and high-quality customer service, contributing to improved overall system performance.

[0501] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0502] Step 1:

[0503] The server collects data from inside and outside the store using various sensors and networks. This data, including conversation logs and customer behavior patterns, is stored in the server's database. Audio and text data flow into the server as input, and a clean dataset necessary for analysis and learning is constructed as output. Specifically, preprocessing such as noise reduction and data cleaning is performed.

[0504] Step 2:

[0505] The server applies machine learning algorithms to the collected data to build a generative AI model. The clean dataset obtained in the previous step is used as input, and the output is a generative AI model for customer interaction. This model is trained using Python programs and libraries and implemented using frameworks such as TensorFlow.

[0506] Step 3:

[0507] The server sends the completed generative AI model to the terminal. The model is packaged in JSON or other data formats and transferred over the network. The input is a pre-built customer interaction model, and the output is the terminal receiving that model and importing it into its own system.

[0508] Step 4:

[0509] The device utilizes speech recognition technology to receive voice input from the user. The customer's voice is input to the device, and the output is the conversion of the voice into text. Specifically, a speech recognition engine is used, utilizing APIs such as Google Speech-to-Text.

[0510] Step 5:

[0511] The device analyzes the converted text data using natural language processing and generates an appropriate response using a generative AI model. The input is the converted text data. The output is the optimal response to be returned to the user. For example, if the user asks, "Do you have product A?", the device checks the inventory status and provides a response such as, "Product A is currently out of stock."

[0512] Step 6:

[0513] The device uses speech synthesis technology to convert text responses into speech and respond to the user. The generated text response is provided as input, and the response to the user is provided in speech format as output. Specifically, the speech synthesis engine operates, producing natural-sounding speech output.

[0514] Step 7:

[0515] Users input feedback on the provided service into the terminal. The input consists of the user's rating and comments, and the output is the recording of that feedback as digital data. Feedback content is entered through user interaction, using touch panels or voice input.

[0516] Step 8:

[0517] The terminal transfers user feedback data to the server. The input is feedback data recorded on the terminal, and the output is data sent to the server for analysis. Data communication takes place via a network interface.

[0518] Step 9:

[0519] The server analyzes the feedback it receives and uses it to improve and update the generated AI model. The server processes the feedback as input, and the output includes adjustments to the model's parameters or the addition of new behaviors. This results in higher quality customer service in the future.

[0520] (Application Example 1)

[0521] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0522] Automating customer service and responding quickly and accurately to customer inquiries within stores is necessary to improve customer satisfaction. Furthermore, using speech recognition and natural language processing is required to provide efficient customer service without increasing human resources.

[0523] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0524] In this invention, the server includes an information processing device that collects training data and constructs a customer response model based on the training data; a speech recognition device that interacts with customers using the customer response model provided by the information processing device; a data analysis device that aggregates customer feedback and updates the customer response model based on the feedback; a conversion device that converts voice input from customers into text data using speech recognition means; and a response generation device that generates a response based on the text data obtained by the speech recognition device using a generation AI model. This enables the provision of appropriate responses in real time to a variety of customer questions, improving the efficiency of customer service operations in stores and enhancing customer satisfaction.

[0525] "Training data" refers to data collected about customer behavior and responses, which is used to build customer service models.

[0526] A "customer response model" is a set of algorithms and rules that a server constructs to generate appropriate responses to customer questions and requests.

[0527] An "information processing device" is a computer device used to build customer response models using collected data.

[0528] A "voice recognition device" is a device that analyzes a customer's voice input and converts it into text data.

[0529] A "data analysis device" is a device that collects customer feedback and updates customer service models to improve customer satisfaction.

[0530] A "conversion device" is a device used to convert speech data obtained through speech recognition into text data.

[0531] A "generative AI model" is an artificial intelligence model designed to automatically generate appropriate responses based on input data.

[0532] A "response generation device" is a device that uses a generative AI model to create responses to customer questions.

[0533] A "communication system" is a system that combines these devices and functions to efficiently exchange information.

[0534] The system for realizing this invention includes an information processing device, a speech recognition device, a data analysis device, a conversion device, and a response generation device using a generative AI model. These devices work together to provide real-time customer service.

[0535] First, the server collects customer behavior data using various sensors and data collection mechanisms, and stores it as training data in an information processing device. Based on this data, a customer response model is built. The server then transmits the customer response model to a speech recognition device. This device receives voice input from the customer and sends it to a conversion device that converts it into text data using speech recognition means.

[0536] The character data generated by the conversion device is passed to a response generation device equipped with a generative AI model. This response generation device uses the generative AI model to generate an appropriate response based on the given data. This process utilizes Google Cloud Speech-to-Text and GPT-4, achieving advanced natural language processing.

[0537] The user receives the response and provides feedback as needed. The data analysis device aggregates this feedback and sends it back to the server. The server analyzes the feedback data for future model updates and adjusts the customer response model.

[0538] As a concrete example, in a real store, if a user asks a question by voice, such as "Where are the new products?", a voice recognition device converts the voice into text, and a response generation device generates a response such as "The new products are on the left side of the store." This allows the user to find the products smoothly.

[0539] An example of a prompt for a generative AI model is: "The user is asking about the current location of an item. The question is 'Where is the new product?' Please generate an answer considering the store map."

[0540] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0541] Step 1:

[0542] The server uses an information processing device to collect customer behavior data from various sensors and data collection mechanisms. This data is stored as training data, and a customer response model is built. The input is customer behavior data, and the output is the customer response model. This model analyzes the characteristics of the collected data and learns patterns to generate appropriate responses.

[0543] Step 2:

[0544] The server provides the constructed customer interaction model to the speech recognition device. The speech recognition device receives it and prepares itself. The input is the customer interaction model, and the output is the ready-to-use speech recognition device. The speech recognition device has already learned how to accurately process speech data using the received model.

[0545] Step 3:

[0546] Users ask questions in the store using voice. The speech recognition system quickly captures the voice input and converts it into text data using speech recognition technology. The input is voice data, and the output is text data. Google Cloud Speech-to-Text is used for this data conversion, achieving high accuracy in transcription.

[0547] Step 4:

[0548] The terminal's conversion device transmits the character data obtained from the speech recognition device to a response generation device equipped with a generation AI model. Because the generation AI model uses GPT-4, it generates an appropriate response based on the prompt text. The input consists of character data and the prompt text, and the output is the generated response text. Specifically, the generation AI model performs natural language processing on the obtained character data and selects the most appropriate response.

[0549] Step 5:

[0550] The user receives a response generated by a response generator. Based on this response, which is obtained in voice or text, the user decides on their actions within the store. The input is the generated response text, and the output is the user's actions. This response serves to guide the user so that they can quickly find the products they need.

[0551] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the identification processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform identification processing using the user's emotions.

[0552] This invention employs a configuration that combines a humanoid system for customer service with an emotion engine. The server unit constructs a customer interaction model using existing training data and provides it to the terminal unit. This terminal unit is installed in the store as a humanoid device and interacts with customers using speech recognition and natural language processing.

[0553] The emotion engine analyzes various sensor information, such as voice tone, facial expressions, and body movements, to recognize the user's emotions. The device uses this emotion data to provide responses that reflect the user's emotional state. For example, if the device determines that the user is angry, it will carefully choose its words and respond in a calming manner.

[0554] Users can provide feedback after receiving service, and this feedback is sent to the server. The server analyzes this feedback and uses it to improve the accuracy of the customer service model. Furthermore, sentiment analysis results obtained from the sentiment engine are also used as feedback, contributing to optimal interactions with users.

[0555] This will enable more consistent and high-quality customer service, aiming to further improve customer satisfaction. The present invention is embodied as a system to support efficient personnel utilization and appropriate communication with customers in store operations.

[0556] The following describes the processing flow.

[0557] Step 1:

[0558] The server aggregates customer service data sent from each terminal within the store. This data includes information such as voice logs, conversation content, and customer reactions.

[0559] Step 2:

[0560] The server preprocesses the data and generates a dataset organized for analysis. This involves converting audio data to text, and performing noise reduction and data normalization.

[0561] Step 3:

[0562] The server trains a machine learning model using pre-processed data. This model learns appropriate response patterns in response to customer requests and accumulates the knowledge necessary for customer service.

[0563] Step 4:

[0564] The server sends a pre-trained customer interaction model to the terminal. This model forms the foundation for real-time customer support.

[0565] Step 5:

[0566] When the device detects a customer, it initiates speech recognition and natural language processing to interpret the customer's question. For example, if a question about a product comes in, it retrieves relevant answer data from the model.

[0567] Step 6:

[0568] The device uses an emotion engine to analyze the customer's emotions. It analyzes voice tone, facial expressions, and gestures in real time to identify the user's emotional state.

[0569] Step 7:

[0570] The device adjusts the tone and content of its responses according to the customer's emotions to provide appropriate communication. For example, if a user is confused, it will provide detailed explanations to reassure them.

[0571] Step 8:

[0572] After receiving customer service, users enter feedback into a terminal. This feedback is used to improve the accuracy of the system.

[0573] Step 9:

[0574] The server aggregates feedback and sentiment analysis results sent from the terminals and uses them to improve the customer service model. Based on these results, the model parameters are adjusted to improve the accuracy of responses in subsequent interactions.

[0575] (Example 2)

[0576] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0577] Modern customer service systems face challenges such as limited interaction with customers and difficulty in responding in a way that considers the individual customer's emotions. In particular, there is a need to appropriately recognize the customer's emotional state and respond accordingly. Furthermore, traditional systems lacked mechanisms to effectively utilize customer feedback to optimize the model, making it difficult to consistently improve service quality.

[0578] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0579] In this invention, the server includes information processing means for collecting training data and constructing a customer response model based on said training data; an emotion analysis device for analyzing voice tone, facial expressions, and body movements to generate emotion data; and an analysis device for aggregating customer feedback and updating the customer response model based on said feedback and emotion data. This enables sophisticated dialogue that takes customer emotions into consideration and consistent improvement in the quality of service.

[0580] "Training data" refers to a set of data collected as foundational information for building a customer service model.

[0581] "Information processing means" refers to a device or system for generating and providing customer response models using training data.

[0582] A "dialogue device" refers to equipment installed for interacting with customers, which uses a dialogue model to generate real-time responses in voice or text.

[0583] An "emotion analysis device" refers to a device that analyzes a user's emotions based on information such as voice tone, facial expressions, and body movements, and generates emotional data.

[0584] An "analysis device" refers to a device or system used to update and optimize customer service models based on customer feedback and emotional data.

[0585] A "customer response model" is a model designed to generate appropriate responses in customer interactions, and is built from training data.

[0586] "Feedback" refers to information such as evaluations and impressions that customers provide after experiencing a service.

[0587] This invention aims to enhance customer interaction in customer service systems and improve the customer experience. It primarily focuses on three parties: the server, the terminal, and the user, and is implemented as follows.

[0588] The server aggregates training data and uses information processing tools to build a customer response model. This training data includes past dialogue records and user feedback data. Based on this, the server utilizes generative AI models, commonly used in natural language processing, to form a model that acts as a response generation module. The constructed model is then provided to the terminal and used for on-the-spot customer interactions.

[0589] The terminal is a humanoid device installed in stores that directly interacts with customers. It uses well-known voice recognition software, for example, converting and analyzing customer speech via a commonly used natural language processing API. Furthermore, it incorporates an emotion analyzer, using a camera and microphone to analyze voice tone, facial expressions, and body movements to determine the customer's emotions. This allows the dialogue device to generate responses tailored to the customer's emotions, improving the accuracy of the conversation.

[0590] Users provide feedback after interacting with the system. This feedback information is sent to the server via the terminal. The server analyzes this feedback information and data obtained from sentiment analysis to continuously improve the model. This enables the system to provide more appropriate responses that better meet customer needs and emotions in subsequent interactions.

[0591] For example, if a user asks "What are the recommended items?" in a store, the terminal recognizes the question and responds using a stored model, "Today's special is a smoothie. Would you like to try one?" At the same time, it performs emotion analysis based on the user's facial expressions and voice, and can also provide additional input depending on the situation. Furthermore, by utilizing a generative AI model and inputting prompts such as "Show me an example of a response when the user is satisfied," it can explore a wider variety of flexible response patterns.

[0592] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0593] Step 1:

[0594] The server collects training data.

[0595] Input: Past dialogue logs and user feedback data

[0596] Specific operation: The server accesses the storage system and selectively retrieves interaction history and feedback.

[0597] Output: The set of training data required to build the model.

[0598] Step 2:

[0599] The server builds a customer response model based on the collected training data.

[0600] Input: Set of training data

[0601] Specific operation: The server uses natural language processing algorithms to analyze data and train a generative AI model.

[0602] Output: Newly constructed customer service model

[0603] Step 3:

[0604] The server provides the established customer support model to the terminal.

[0605] Input: Customer support model

[0606] Specific operation: The server transfers the model to the terminal via a secure communication protocol.

[0607] Output: Latest customer support model installed on the device

[0608] Step 4:

[0609] The terminal receives voice input from the customer.

[0610] Input: Customer voice

[0611] Specific operation: The device's microphone captures the customer's voice, and speech recognition software converts it to text.

[0612] Output: Customer questions or requests in text format

[0613] Step 5:

[0614] The terminal analyzes customer input and generates a response.

[0615] Input: Customer question or request in text format

[0616] Specific operation: The terminal uses a customer interaction model to analyze text and create an appropriate response. Simultaneously, an emotion analyzer analyzes voice tone and facial expressions, and incorporates the results into the response.

[0617] Output: Response message to present to the customer

[0618] Step 6:

[0619] Users provide feedback after the interaction.

[0620] Input: Feedback information (evaluation, suggestions for improvement, etc.)

[0621] Specific operation: The user inputs feedback using the device's interface.

[0622] Output: Feedback data sent to the server

[0623] Step 7:

[0624] The server receives feedback and analyzes it to improve the model.

[0625] Input: Feedback data and sentiment analysis results

[0626] Specific operation: The server analyzes the new feedback data and retrains the customer interaction model by appropriately adjusting the model's parameters.

[0627] Output: Improved customer service model

[0628] (Application Example 2)

[0629] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0630] Traditional customer service systems had the challenge of not being able to accurately recognize customer emotions and provide appropriate responses immediately. This made it difficult to improve customer satisfaction and limited the efficient use of personnel. Furthermore, the models were not sufficiently improved using feedback, creating a need for measures to improve service quality.

[0631] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0632] In this invention, the server includes an information processing device that collects learning data and constructs a dialogue model based on the learning data; a device that performs dialogue with a human using the dialogue model provided by the information processing device; a sensor analysis device that interprets the user's emotional state and adjusts the response based on the interpretation result; and an information analysis device that aggregates evaluations from the user and updates the dialogue model based on the evaluation. This makes it possible to understand the user's emotions in real time, provide appropriate customer service, and improve customer satisfaction.

[0633] An "information processing device" is a device that collects training data and builds a dialogue model based on that data.

[0634] A "device" is a device that interacts with humans using a dialogue model provided by an information processing device.

[0635] A "sensor analysis device" is a device that interprets the user's emotional state and adjusts its response based on that interpretation.

[0636] An "information analysis device" is a device that collects user feedback and updates the dialogue model based on that feedback.

[0637] A "dialogue model" is a model used to understand conversations with users and generate appropriate responses.

[0638] To implement this invention, an information processing device, a device device, a sensor analysis device, and a system using the information analysis device are required. First, the server collects diverse learning data through the information processing device and constructs a dialogue model based on it. This model is a generative AI model that enables natural conversation with humans.

[0639] Next, terminal devices are deployed in stores and service locations and interact with customers using dialogue models acquired from information processing devices. They interpret customer questions using speech recognition systems (e.g., Google Cloud Speech-to-Text) and natural language processing engines (e.g., OpenAI GPT-3). Meanwhile, sensor analysis devices utilize emotion analysis engines (e.g., Microsoft Azure Emotion Recognition) to analyze the customer's emotional state in real time from their voice and facial expressions and reflect this in their responses.

[0640] Furthermore, user feedback is aggregated by an information analysis device. This device adjusts the parameters of the dialogue model based on the collected feedback to improve the model's performance. For example, in a scenario where a customer is unsure which smartphone to buy, the server uses an emotion engine to analyze emotional data indicating anxiety and generates a reassuring response to the device, such as, "There are many options available, let's find the best one together."

[0641] As an example of a prompt, the AI ​​generation model would be input in the format of, "A customer is hesitant about purchasing a smartphone. Please generate a reassuring customer service message." In this way, a system is created that can accurately reflect the user's emotional state, contributing to improved customer satisfaction.

[0642] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0643] Step 1:

[0644] The server uses an information processing device to collect training data and builds a dialogue model based on it. It receives large amounts of text data and past customer interaction records as input, and uses a generative AI model to output a dialogue model suitable for natural language processing tasks. Specifically, it performs data cleansing, converts the data into a format suitable for the model, and then trains the AI ​​model.

[0645] Step 2:

[0646] The terminal device receives a dialogue model provided by the server and utilizes it during customer service. It receives voice data from the customer as input and converts it to text using Google Cloud Speech-to-Text. As output, this text data is passed to a natural language processing engine such as OpenAI GPT-3 to generate appropriate response text. Specifically, the generated text is played back by a speech synthesizer to respond to the user.

[0647] Step 3:

[0648] A sensor analysis device analyzes customer emotions in real time. It collects customer voice tone, facial expressions, and gestures from cameras and microphones as input, and outputs emotion data using Microsoft Azure Emotion Recognition. Specifically, it extracts features from audio and video data and estimates emotions based on them. This output data is returned to the device and used to adjust responses.

[0649] Step 4:

[0650] After interacting with a customer, the user enters feedback into a terminal. The input information includes text and evaluation scores, which are collected by an information analysis device. The feedback is then sent to a server and used to adjust the model's parameters. Specifically, the feedback data is accumulated, analyzed periodically, and the feedback is incorporated into the dialogue model.

[0651] Step 5:

[0652] The server uses feedback data obtained from the information analysis device to adjust and improve the parameters of the dialogue model. It analyzes the user evaluations collected as input to identify factors affecting the model's performance. As output, an improved new dialogue model is generated and provided back to the terminal. Specifically, the model training pipeline is automatically adjusted based on the analysis results, evolving into a more accurate model.

[0653] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing unit 12. In the data processing unit 12, the specific processing unit 290 acquires the audio data.

[0654] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). One example of data generation model 58 is ChatGPT (Internet search<URL: https: / / openai.com / blog / chatgpt> ), Gemini (Internet search) <url: https: gemini.google.com ?hl="ja">Examples of generative AI include those described above. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions shown by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0655] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0656] [Fourth Embodiment]

[0657] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0658] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0659] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0660] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0661] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0662] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0663] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0664] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0665] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0666] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0667] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the identification processing unit 290.

[0668] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0669] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0670] The customer service humanoid system in this invention is based on a cooperative operation between a server and a terminal. The server first collects customer service data obtained from the store and uses machine learning to construct a customer response model from this data. This model analyzes customer behavior and generates appropriate responses.

[0671] The server sends this customer interaction model to a humanoid terminal, which then uses the model to interact with the customer. The terminal has speech recognition capabilities, which capture customer inquiries as voice data, convert it into text data through a subsequent natural language processing step, and generate a response through the customer interaction model. For example, if a customer asks about the availability of a product, the terminal will respond with something like, "The product in question is currently out of stock."

[0672] Feedback from customers is sent to the server via their devices. This feedback is analyzed on the server and used for subsequent model updates. Because the feedback contributes to improving customer service quality, it leads to improved overall system performance.

[0673] Thus, the present invention supports customer service operations through a series of cycles: model construction by the server, dialogue execution by the terminal, and analysis of user feedback, aiming to improve efficiency in store operations, reduce labor costs, and enhance customer satisfaction.

[0674] The following describes the processing flow.

[0675] Step 1:

[0676] The server collects customer service data from the store. This data includes voice conversation logs, chat history, customer questions, and staff responses.

[0677] Step 2:

[0678] The server preprocesses the collected data. Specifically, it removes noise from the raw data and converts audio data into text data. Through this process, it creates a well-structured dataset necessary for training machine learning models.

[0679] Step 3:

[0680] The server uses pre-processed data to build a customer interaction model through machine learning algorithms. This model learns to respond appropriately to a variety of customer question patterns.

[0681] Step 4:

[0682] The server sends the customer interaction model it has built to the terminal. This model is a crucial element that supports real-time interaction with the customer on the terminal.

[0683] Step 5:

[0684] When the terminal detects a customer's approach through the humanoid's sensors, it enters conversation mode. The speech recognition system transcribes the customer's speech into text and generates the optimal response using a model provided by the server.

[0685] Step 6:

[0686] After receiving customer service, users provide feedback via their device. This feedback can cover a wide range of topics, including the quality of service, areas for improvement, and specific requests.

[0687] Step 7:

[0688] The server collects and analyzes user feedback. This feedback is used, as needed, to retrain the customer service model, improving its accuracy. This ensures continuous improvement of the entire system.

[0689] (Example 1)

[0690] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0691] Traditional customer service systems have resulted in inconsistent and inefficient service due to the reliance on individual staff members for quality. Furthermore, the inability to effectively utilize user feedback has made it difficult to improve service quality. There is a need to address these challenges to reduce labor costs and improve customer satisfaction in store operations.

[0692] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0693] In this invention, the server includes a processing device means for collecting data and constructing a generative AI model using machine learning techniques; a communication device means for converting speech data into text data using the generative AI model provided by the processing device means, performing natural language processing, and conducting dialogue with customers; and an analysis device means for accumulating user feedback, analyzing that information, and updating the generative AI model. This improves the quality of customer service, enables consistent service delivery, and allows for model improvements that reflect user feedback.

[0694] "Data" refers to a collection of information, specifically elements gathered for the purpose of improving system functionality and analysis.

[0695] "Machine learning techniques" are technologies that learn patterns from data and enable automatic decision-making and prediction.

[0696] A "generative AI model" is an algorithmic structure that is automatically generated based on training data and has the ability to perform tasks according to a specific purpose.

[0697] A "processing device" refers to a device that performs a series of functions for collecting, analyzing, and executing a specific task.

[0698] "Communication device means" refers to a device that transmits and receives data and exchanges information with other devices or systems.

[0699] "Audio data" refers to digital audio information in a format that can be analyzed and converted by a computer.

[0700] "Natural language processing" refers to the technology that enables computers to understand, analyze, and process human language.

[0701] A "user" is an individual or organization that uses a system or service.

[0702] "Feedback" refers to opinions or information provided to improve or modify actions or processes.

[0703] An "analysis device" refers to a device that has the function of analyzing data and outputting the results as information.

[0704] The system in this invention employs a configuration in which multiple devices operate in cooperation, and mainly consists of a server, a terminal, and a user.

[0705] server

[0706] The server collects data from within the store and from other sources. This data includes user behavior logs and conversation records. The server leverages this data and uses hardware and software to build generative AI models. Specifically, it uses Python programs and machine learning libraries. Once the model is built, it is transferred from the server to the terminal and used for future interactions.

[0707] terminal

[0708] The device accepts a generative AI model provided by the server. The device is equipped with speech recognition capabilities and captures the user's spoken audio data. This includes, for example, digital signal processing techniques to reduce noise from the audio. The audio is converted into text data via a speech recognition engine, followed by natural language processing. During this process, the device interprets the customer's questions and generates appropriate responses according to the generative AI model. The text data is then converted back into speech by a speech synthesis engine.

[0709] User

[0710] Users utilize services provided through a terminal. A concrete example is the ordering process at a cafe. When a user says, "I'd like a latte," the terminal interprets the voice and responds, "One latte, understood." Users can provide feedback on the service, and the terminal sends this feedback to a server.

[0711] This feedback is analyzed on the server and used to improve the model.

[0712] As an example of a prompt, it could be input into the AI ​​model in the form of, "Generate a response based on a customer's order. For example, how would you respond if the customer said, 'I'd like one latte, please?'"

[0713] With the above configuration, the present invention achieves efficient, consistent, and high-quality customer service, contributing to improved overall system performance.

[0714] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0715] Step 1:

[0716] The server collects data from inside and outside the store using various sensors and networks. This data, including conversation logs and customer behavior patterns, is stored in the server's database. Audio and text data flow into the server as input, and a clean dataset necessary for analysis and learning is constructed as output. Specifically, preprocessing such as noise reduction and data cleaning is performed.

[0717] Step 2:

[0718] The server applies machine learning algorithms to the collected data to build a generative AI model. The clean dataset obtained in the previous step is used as input, and the output is a generative AI model for customer interaction. This model is trained using Python programs and libraries and implemented using frameworks such as TensorFlow.

[0719] Step 3:

[0720] The server sends the completed generative AI model to the terminal. The model is packaged in JSON or other data formats and transferred over the network. The input is a pre-built customer interaction model, and the output is the terminal receiving that model and importing it into its own system.

[0721] Step 4:

[0722] The device utilizes speech recognition technology to receive voice input from the user. The customer's voice is input to the device, and the output is the conversion of the voice into text. Specifically, a speech recognition engine is used, utilizing APIs such as Google Speech-to-Text.

[0723] Step 5:

[0724] The device analyzes the converted text data using natural language processing and generates an appropriate response using a generative AI model. The input is the converted text data. The output is the optimal response to be returned to the user. For example, if the user asks, "Do you have product A?", the device checks the inventory status and provides a response such as, "Product A is currently out of stock."

[0725] Step 6:

[0726] The device uses speech synthesis technology to convert text responses into speech and respond to the user. The generated text response is provided as input, and the response to the user is provided in speech format as output. Specifically, the speech synthesis engine operates, producing natural-sounding speech output.

[0727] Step 7:

[0728] Users input feedback on the provided service into the terminal. The input consists of the user's rating and comments, and the output is the recording of that feedback as digital data. Feedback content is entered through user interaction, using touch panels or voice input.

[0729] Step 8:

[0730] The terminal transfers user feedback data to the server. The input is feedback data recorded on the terminal, and the output is data sent to the server for analysis. Data communication takes place via a network interface.

[0731] Step 9:

[0732] The server analyzes the feedback it receives and uses it to improve and update the generated AI model. The server processes the feedback as input, and the output includes adjustments to the model's parameters or the addition of new behaviors. This results in higher quality customer service in the future.

[0733] (Application Example 1)

[0734] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0735] Automating customer service and responding quickly and accurately to customer inquiries within stores is necessary to improve customer satisfaction. Furthermore, using speech recognition and natural language processing is required to provide efficient customer service without increasing human resources.

[0736] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0737] In this invention, the server includes an information processing device that collects training data and constructs a customer response model based on the training data; a speech recognition device that interacts with customers using the customer response model provided by the information processing device; a data analysis device that aggregates customer feedback and updates the customer response model based on the feedback; a conversion device that converts voice input from customers into text data using speech recognition means; and a response generation device that generates a response based on the text data obtained by the speech recognition device using a generation AI model. This enables the provision of appropriate responses in real time to a variety of customer questions, improving the efficiency of customer service operations in stores and enhancing customer satisfaction.

[0738] "Training data" refers to data collected about customer behavior and responses, which is used to build customer service models.

[0739] A "customer response model" is a set of algorithms and rules that a server constructs to generate appropriate responses to customer questions and requests.

[0740] An "information processing device" is a computer device used to build customer response models using collected data.

[0741] A "voice recognition device" is a device that analyzes a customer's voice input and converts it into text data.

[0742] A "data analysis device" is a device that collects customer feedback and updates customer service models to improve customer satisfaction.

[0743] A "conversion device" is a device used to convert speech data obtained through speech recognition into text data.

[0744] A "generative AI model" is an artificial intelligence model designed to automatically generate appropriate responses based on input data.

[0745] A "response generation device" is a device that uses a generative AI model to create responses to customer questions.

[0746] A "communication system" is a system that combines these devices and functions to efficiently exchange information.

[0747] The system for realizing this invention includes an information processing device, a speech recognition device, a data analysis device, a conversion device, and a response generation device using a generative AI model. These devices work together to provide real-time customer service.

[0748] First, the server collects customer behavior data using various sensors and data collection mechanisms, and stores it as training data in an information processing device. Based on this data, a customer response model is built. The server then transmits the customer response model to a speech recognition device. This device receives voice input from the customer and sends it to a conversion device that converts it into text data using speech recognition means.

[0749] The character data generated by the conversion device is passed to a response generation device equipped with a generative AI model. This response generation device uses the generative AI model to generate an appropriate response based on the given data. This process utilizes Google Cloud Speech-to-Text and GPT-4, achieving advanced natural language processing.

[0750] The user receives the response and provides feedback as needed. The data analysis device aggregates this feedback and sends it back to the server. The server analyzes the feedback data for future model updates and adjusts the customer response model.

[0751] As a concrete example, in a real store, if a user asks a question by voice, such as "Where are the new products?", a voice recognition device converts the voice into text, and a response generation device generates a response such as "The new products are on the left side of the store." This allows the user to find the products smoothly.

[0752] An example of a prompt for a generative AI model is: "The user is asking about the current location of an item. The question is 'Where is the new product?' Please generate an answer considering the store map."

[0753] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0754] Step 1:

[0755] The server uses an information processing device to collect customer behavior data from various sensors and data collection mechanisms. This data is stored as training data, and a customer response model is built. The input is customer behavior data, and the output is the customer response model. This model analyzes the characteristics of the collected data and learns patterns to generate appropriate responses.

[0756] Step 2:

[0757] The server provides the constructed customer interaction model to the speech recognition device. The speech recognition device receives it and prepares itself. The input is the customer interaction model, and the output is the ready-to-use speech recognition device. The speech recognition device has already learned how to accurately process speech data using the received model.

[0758] Step 3:

[0759] Users ask questions in the store using voice. The speech recognition system quickly captures the voice input and converts it into text data using speech recognition technology. The input is voice data, and the output is text data. Google Cloud Speech-to-Text is used for this data conversion, achieving high accuracy in transcription.

[0760] Step 4:

[0761] The terminal's conversion device transmits the character data obtained from the speech recognition device to a response generation device equipped with a generation AI model. Because the generation AI model uses GPT-4, it generates an appropriate response based on the prompt text. The input consists of character data and the prompt text, and the output is the generated response text. Specifically, the generation AI model performs natural language processing on the obtained character data and selects the most appropriate response.

[0762] Step 5:

[0763] The user receives a response generated by a response generator. Based on this response, which is obtained in voice or text, the user decides on their actions within the store. The input is the generated response text, and the output is the user's actions. This response serves to guide the user so that they can quickly find the products they need.

[0764] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the identification processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform identification processing using the user's emotions.

[0765] This invention employs a configuration that combines a humanoid system for customer service with an emotion engine. The server unit constructs a customer interaction model using existing training data and provides it to the terminal unit. This terminal unit is installed in the store as a humanoid device and interacts with customers using speech recognition and natural language processing.

[0766] The emotion engine analyzes various sensor information, such as voice tone, facial expressions, and body movements, to recognize the user's emotions. The device uses this emotion data to provide responses that reflect the user's emotional state. For example, if the device determines that the user is angry, it will carefully choose its words and respond in a calming manner.

[0767] Users can provide feedback after receiving service, and this feedback is sent to the server. The server analyzes this feedback and uses it to improve the accuracy of the customer service model. Furthermore, sentiment analysis results obtained from the sentiment engine are also used as feedback, contributing to optimal interactions with users.

[0768] This will enable more consistent and high-quality customer service, aiming to further improve customer satisfaction. The present invention is embodied as a system to support efficient personnel utilization and appropriate communication with customers in store operations.

[0769] The following describes the processing flow.

[0770] Step 1:

[0771] The server aggregates customer service data sent from each terminal within the store. This data includes information such as voice logs, conversation content, and customer reactions.

[0772] Step 2:

[0773] The server preprocesses the data and generates a dataset organized for analysis. This involves converting audio data to text, and performing noise reduction and data normalization.

[0774] Step 3:

[0775] The server trains a machine learning model using pre-processed data. This model learns appropriate response patterns in response to customer requests and accumulates the knowledge necessary for customer service.

[0776] Step 4:

[0777] The server sends a pre-trained customer interaction model to the terminal. This model forms the foundation for real-time customer support.

[0778] Step 5:

[0779] When the device detects a customer, it initiates speech recognition and natural language processing to interpret the customer's question. For example, if a question about a product comes in, it retrieves relevant answer data from the model.

[0780] Step 6:

[0781] The device uses an emotion engine to analyze the customer's emotions. It analyzes voice tone, facial expressions, and gestures in real time to identify the user's emotional state.

[0782] Step 7:

[0783] The device adjusts the tone and content of its responses according to the customer's emotions to provide appropriate communication. For example, if a user is confused, it will provide detailed explanations to reassure them.

[0784] Step 8:

[0785] After receiving customer service, users enter feedback into a terminal. This feedback is used to improve the accuracy of the system.

[0786] Step 9:

[0787] The server aggregates feedback and sentiment analysis results sent from the terminals and uses them to improve the customer service model. Based on these results, the model parameters are adjusted to improve the accuracy of responses in subsequent interactions.

[0788] (Example 2)

[0789] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0790] Modern customer service systems face challenges such as limited interaction with customers and difficulty in responding in a way that considers the individual customer's emotions. In particular, there is a need to appropriately recognize the customer's emotional state and respond accordingly. Furthermore, traditional systems lacked mechanisms to effectively utilize customer feedback to optimize the model, making it difficult to consistently improve service quality.

[0791] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0792] In this invention, the server includes information processing means for collecting training data and constructing a customer response model based on said training data; an emotion analysis device for analyzing voice tone, facial expressions, and body movements to generate emotion data; and an analysis device for aggregating customer feedback and updating the customer response model based on said feedback and emotion data. This enables sophisticated dialogue that takes customer emotions into consideration and consistent improvement in the quality of service.

[0793] "Training data" refers to a set of data collected as foundational information for building a customer service model.

[0794] "Information processing means" refers to a device or system for generating and providing customer response models using training data.

[0795] A "dialogue device" refers to equipment installed for interacting with customers, which uses a dialogue model to generate real-time responses in voice or text.

[0796] An "emotion analysis device" refers to a device that analyzes a user's emotions based on information such as voice tone, facial expressions, and body movements, and generates emotional data.

[0797] An "analysis device" refers to a device or system used to update and optimize customer service models based on customer feedback and emotional data.

[0798] A "customer response model" is a model designed to generate appropriate responses in customer interactions, and is built from training data.

[0799] "Feedback" refers to information such as evaluations and impressions that customers provide after experiencing a service.

[0800] This invention aims to enhance customer interaction in customer service systems and improve the customer experience. It primarily focuses on three parties: the server, the terminal, and the user, and is implemented as follows.

[0801] The server aggregates training data and uses information processing tools to build a customer response model. This training data includes past dialogue records and user feedback data. Based on this, the server utilizes generative AI models, commonly used in natural language processing, to form a model that acts as a response generation module. The constructed model is then provided to the terminal and used for on-the-spot customer interactions.

[0802] The terminal is a humanoid device installed in stores that directly interacts with customers. It uses well-known voice recognition software, for example, converting and analyzing customer speech via a commonly used natural language processing API. Furthermore, it incorporates an emotion analyzer, using a camera and microphone to analyze voice tone, facial expressions, and body movements to determine the customer's emotions. This allows the dialogue device to generate responses tailored to the customer's emotions, improving the accuracy of the conversation.

[0803] Users provide feedback after interacting with the system. This feedback information is sent to the server via the terminal. The server analyzes this feedback information and data obtained from sentiment analysis to continuously improve the model. This enables the system to provide more appropriate responses that better meet customer needs and emotions in subsequent interactions.

[0804] For example, if a user asks "What are the recommended items?" in a store, the terminal recognizes the question and responds using a stored model, "Today's special is a smoothie. Would you like to try one?" At the same time, it performs emotion analysis based on the user's facial expressions and voice, and can also provide additional input depending on the situation. Furthermore, by utilizing a generative AI model and inputting prompts such as "Show me an example of a response when the user is satisfied," it can explore a wider variety of flexible response patterns.

[0805] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0806] Step 1:

[0807] The server collects training data.

[0808] Input: Past dialogue logs and user feedback data

[0809] Specific operation: The server accesses the storage system and selectively retrieves interaction history and feedback.

[0810] Output: The set of training data required to build the model.

[0811] Step 2:

[0812] The server builds a customer response model based on the collected training data.

[0813] Input: Set of training data

[0814] Specific operation: The server uses natural language processing algorithms to analyze data and train a generative AI model.

[0815] Output: Newly constructed customer service model

[0816] Step 3:

[0817] The server provides the established customer support model to the terminal.

[0818] Input: Customer support model

[0819] Specific operation: The server transfers the model to the terminal via a secure communication protocol.

[0820] Output: Latest customer support model installed on the device

[0821] Step 4:

[0822] The terminal receives voice input from the customer.

[0823] Input: Customer voice

[0824] Specific operation: The device's microphone captures the customer's voice, and speech recognition software converts it to text.

[0825] Output: Customer questions or requests in text format

[0826] Step 5:

[0827] The terminal analyzes customer input and generates a response.

[0828] Input: Customer question or request in text format

[0829] Specific operation: The terminal uses a customer interaction model to analyze text and create an appropriate response. Simultaneously, an emotion analyzer analyzes voice tone and facial expressions, and incorporates the results into the response.

[0830] Output: Response message to present to the customer

[0831] Step 6:

[0832] Users provide feedback after the interaction.

[0833] Input: Feedback information (evaluation, suggestions for improvement, etc.)

[0834] Specific operation: The user inputs feedback using the device's interface.

[0835] Output: Feedback data sent to the server

[0836] Step 7:

[0837] The server receives feedback and analyzes it to improve the model.

[0838] Input: Feedback data and sentiment analysis results

[0839] Specific operation: The server analyzes the new feedback data and retrains the customer interaction model by appropriately adjusting the model's parameters.

[0840] Output: Improved customer service model

[0841] (Application Example 2)

[0842] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0843] Traditional customer service systems had the challenge of not being able to accurately recognize customer emotions and provide appropriate responses immediately. This made it difficult to improve customer satisfaction and limited the efficient use of personnel. Furthermore, the models were not sufficiently improved using feedback, creating a need for measures to improve service quality.

[0844] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0845] In this invention, the server includes an information processing device that collects learning data and constructs a dialogue model based on the learning data; a device that performs dialogue with a human using the dialogue model provided by the information processing device; a sensor analysis device that interprets the user's emotional state and adjusts the response based on the interpretation result; and an information analysis device that aggregates evaluations from the user and updates the dialogue model based on the evaluation. This makes it possible to understand the user's emotions in real time, provide appropriate customer service, and improve customer satisfaction.

[0846] An "information processing device" is a device that collects training data and builds a dialogue model based on that data.

[0847] A "device" is a device that interacts with humans using a dialogue model provided by an information processing device.

[0848] A "sensor analysis device" is a device that interprets the user's emotional state and adjusts its response based on that interpretation.

[0849] An "information analysis device" is a device that collects user feedback and updates the dialogue model based on that feedback.

[0850] A "dialogue model" is a model used to understand conversations with users and generate appropriate responses.

[0851] To implement this invention, an information processing device, a device device, a sensor analysis device, and a system using the information analysis device are required. First, the server collects diverse learning data through the information processing device and constructs a dialogue model based on it. This model is a generative AI model that enables natural conversation with humans.

[0852] Next, terminal devices are deployed in stores and service locations and interact with customers using dialogue models acquired from information processing devices. They interpret customer questions using speech recognition systems (e.g., Google Cloud Speech-to-Text) and natural language processing engines (e.g., OpenAI GPT-3). Meanwhile, sensor analysis devices utilize emotion analysis engines (e.g., Microsoft Azure Emotion Recognition) to analyze the customer's emotional state in real time from their voice and facial expressions and reflect this in their responses.

[0853] Furthermore, user feedback is aggregated by an information analysis device. This device adjusts the parameters of the dialogue model based on the collected feedback to improve the model's performance. For example, in a scenario where a customer is unsure which smartphone to buy, the server uses an emotion engine to analyze emotional data indicating anxiety and generates a reassuring response to the device, such as, "There are many options available, let's find the best one together."

[0854] As an example of a prompt, the AI ​​generation model would be input in the format of, "A customer is hesitant about purchasing a smartphone. Please generate a reassuring customer service message." In this way, a system is created that can accurately reflect the user's emotional state, contributing to improved customer satisfaction.

[0855] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0856] Step 1:

[0857] The server uses an information processing device to collect training data and builds a dialogue model based on it. It receives large amounts of text data and past customer interaction records as input, and uses a generative AI model to output a dialogue model suitable for natural language processing tasks. Specifically, it performs data cleansing, converts the data into a format suitable for the model, and then trains the AI ​​model.

[0858] Step 2:

[0859] The terminal device receives a dialogue model provided by the server and utilizes it during customer service. It receives voice data from the customer as input and converts it to text using Google Cloud Speech-to-Text. As output, this text data is passed to a natural language processing engine such as OpenAI GPT-3 to generate appropriate response text. Specifically, the generated text is played back by a speech synthesizer to respond to the user.

[0860] Step 3:

[0861] A sensor analysis device analyzes customer emotions in real time. It collects customer voice tone, facial expressions, and gestures from cameras and microphones as input, and outputs emotion data using Microsoft Azure Emotion Recognition. Specifically, it extracts features from audio and video data and estimates emotions based on them. This output data is returned to the device and used to adjust responses.

[0862] Step 4:

[0863] After interacting with a customer, the user enters feedback into a terminal. The input information includes text and evaluation scores, which are collected by an information analysis device. The feedback is then sent to a server and used to adjust the model's parameters. Specifically, the feedback data is accumulated, analyzed periodically, and the feedback is incorporated into the dialogue model.

[0864] Step 5:

[0865] The server uses feedback data obtained from the information analysis device to adjust and improve the parameters of the dialogue model. It analyzes the user evaluations collected as input to identify factors affecting the model's performance. As output, an improved new dialogue model is generated and provided back to the terminal. Specifically, the model training pipeline is automatically adjusted based on the analysis results, evolving into a more accurate model.

[0866] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing unit 12. In the data processing unit 12, the specific processing unit 290 acquires the audio data.

[0867] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). One example of data generation model 58 is ChatGPT (Internet search<URL: https: / / openai.com / blog / chatgpt> ), Gemini (Internet search) <url: https: gemini.google.com ?hl="ja">Examples of generative AI include those described above. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions shown by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0868] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0869] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping. Specifically, the emotion identification model 59 may determine the user's emotion according to a specific mapping, which is an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the identification processing unit 290 may perform identification processing using the robot's emotion.

[0870] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0871] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0872] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further you go from the outside of the Emotion Map 400, the more visible (expressed in actions) your emotions become.

[0873] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https: / / ci.nii.ac.jp / naid / 500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.

[0874] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0875] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values ​​representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values ​​representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.

[0876] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0877] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0878] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-temporary storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-temporary storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0879] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0880] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0881] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0882] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0883] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0884] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0885] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0886] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0887] The following is further disclosed regarding the embodiments described above.

[0888] (Claim 1)

[0889] A server means for collecting training data and building a customer response model based on said training data,

[0890] A terminal means that interacts with customers using a customer interaction model provided by the server means,

[0891] An analytical means for aggregating customer feedback and updating the customer response model based on that feedback,

[0892] A system that includes this.

[0893] (Claim 2)

[0894] The system according to claim 1, wherein the terminal means interprets customer questions using speech recognition and natural language processing.

[0895] (Claim 3)

[0896] The system according to claim 1, wherein the analysis means adjusts the model parameters based on the feedback analysis results.

[0897] "Example 1"

[0898] (Claim 1)

[0899] A processing device that collects data and constructs a generative AI model using machine learning techniques,

[0900] A communication device means that uses a generation AI model provided by the processing device to convert voice data into text data, performs natural language processing, and conducts dialogue with the customer.

[0901] An analysis device means that collects user feedback, analyzes that information, and updates the generated AI model,

[0902] A system that includes this.

[0903] (Claim 2)

[0904] The system according to claim 1, wherein the communication device means determines the user's request using voice recognition and natural language processing.

[0905] (Claim 3)

[0906] The system according to claim 1, wherein the analysis device means adjusts the characteristics of the model based on the analysis results of the feedback.

[0907] "Application Example 1"

[0908] (Claim 1)

[0909] An information processing device that collects training data and constructs a customer response model based on said training data,

[0910] A speech recognition device that interacts with customers using a customer interaction model provided by the information processing device,

[0911] A data analysis device that collects customer feedback and updates the customer response model based on that feedback,

[0912] A conversion device that uses speech recognition means to convert voice input from customers into text data,

[0913] A response generation device that generates a response based on text data obtained by a speech recognition device using a generative AI model,

[0914] A communication system that includes this.

[0915] (Claim 2)

[0916] The communication system according to claim 1, wherein the response generation device generates a response in a physical store with location information added based on a customer's question.

[0917] (Claim 3)

[0918] The communication system according to claim 1, wherein the data analysis device adjusts the parameters of the generated AI model based on the feedback analysis results.

[0919] "Example 2 of combining an emotion engine"

[0920] (Claim 1)

[0921] Information processing means for collecting training data and constructing a customer response model based on said training data,

[0922] A dialogue device that interacts with customers using a customer interaction model provided by the information processing means,

[0923] An emotion analysis device that analyzes voice tone, facial expressions, and body movements to generate emotion data,

[0924] An analysis device that collects customer feedback and updates the customer response model based on said feedback and sentiment data,

[0925] A system that includes this.

[0926] (Claim 2)

[0927] The system according to claim 1, wherein the dialogue device interprets customer questions using speech recognition and natural language processing and generates a response using emotion data obtained from an emotion analysis device.

[0928] (Claim 3)

[0929] The system according to claim 1, wherein the analysis device adjusts the model parameters based on the feedback analysis results and emotion data.

[0930] "Application example 2 when combining with an emotional engine"

[0931] (Claim 1)

[0932] An information processing device that collects training data and constructs a dialogue model based on said training data,

[0933] A device that performs dialogue with a human using a dialogue model provided by the information processing device,

[0934] A sensor analysis device that interprets the user's emotional state and adjusts its response based on the interpretation results,

[0935] An information analysis device that collects user feedback and updates the dialogue model based on that feedback,

[0936] A system that includes this.

[0937] (Claim 2)

[0938] The system according to claim 1, wherein the device interprets the user's question using speech recognition and natural language processing.

[0939] (Claim 3)

[0940] The system according to claim 1, wherein the information analysis device adjusts the model parameters based on the evaluation analysis results. [Explanation of symbols]

[0941] 10, 210, 310, 410 Data Processing Systems 12 Data Processing Devices 14 Smart Devices 214 Smart Glasses 314 Headset-type terminal 414 Robots< / url:> < / url:> < / url:> < / url:>

Claims

1. An information processing device that collects training data and constructs a customer response model based on said training data, A speech recognition device that interacts with customers using a customer interaction model provided by the information processing device, A data analysis device that collects customer feedback and updates the customer response model based on that feedback, A conversion device that uses speech recognition means to convert voice input from customers into text data, A response generation device that generates a response based on text data obtained by a speech recognition device using a generative AI model, A communication system that includes this.

2. The communication system according to claim 1, wherein the response generation device generates a response in a physical store with location information added based on a customer's question.

3. The communication system according to claim 1, wherein the data analysis device adjusts the parameters of the generated AI model based on the feedback analysis results.