System

The system addresses the challenge of real-time customer emotion analysis in sales presentations by using facial and voice data to provide immediate feedback and data-driven suggestions, enhancing sales efficiency and satisfaction.

JP2026105372A · Pending · Publication Date: 2026-06-26 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-12-16
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Conventional sales presentations face challenges in accurately grasping customer emotions and psychological reactions in real time, leading to insufficient customer need extraction and suboptimal communication, which affects sales efficiency and satisfaction.

Method used

A system that collects customer facial expressions and voice data using cameras and microphones, analyzes this data in real time using emotion recognition and natural language processing models, and provides immediate feedback to sales representatives, while also generating suggestions for improving future presentations based on past data analysis.

Benefits of technology

Enhances sales efficiency and customer satisfaction by enabling real-time emotional state analysis and feedback, optimizing communication strategies, and improving presentation quality through data-driven insights.

✦ Generated by Eureka AI based on patent content.

Figure 2026105372000001_ABST

Abstract

[Problem] To provide a system. [Solution] A system including: information gathering means for acquiring customer facial expressions and voice information; analysis means for determining the customer's emotional state in real time based on the collected information; presentation means for providing information to sales staff based on the determination result; and generation means for creating proposals for improving future customer service using the information determined by the analysis means.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance as a response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] In conventional sales presentations, it has been difficult to grasp customers' emotions and psychological reactions accurately in real time, so salespersons have not been able to fully draw out customers' true needs or optimize their communication. In addition, new or newly reassigned salespersons need to improve their customer-handling skills, but support for this has been insufficient, which may reduce sales efficiency and customer satisfaction.

Means for Solving the Problems

[0005] This invention accurately grasps the customer's emotional state by collecting data on customer facial expressions and voice using data collection means and evaluating this data in real time using analysis means. Furthermore, based on the evaluation results, it provides real-time feedback to sales representatives through feedback provision means, supporting the optimization of communication. Additionally, by analyzing the acquired data with generation means and generating optimized suggestions for the next presentation, it can improve the efficiency of sales activities and customer satisfaction. Moreover, by using prediction means that analyze past data to provide customer-specific approaches in advance, it enables sales representatives to prepare presentations more effectively.

[0006] "Data collection means" refers to a device or method that has the function of acquiring information on a customer's facial expressions and voice, and transmitting the raw data for analysis to a server or database.

[0007] "Analysis means" refers to models and software used to evaluate emotional states and psychological responses in real time using collected customer data.

[0008] "Feedback provision means" refers to a system that provides immediate advice and information to sales representatives based on analysis results, thereby promoting improved communication.

[0009] "Generation means" refers to a system for generating improvement suggestions and optimized approaches for the next sales presentation based on insights gained from the analyzed data.

[0010] "Prediction means" refers to a method or device that analyzes past presentation data and, based on that analysis, proposes an effective approach for a specific customer in advance.

[0011] A "natural language processing model" refers to algorithms and technologies used to analyze human language and understand its meaning and intent.

[0012] An "emotion recognition model" refers to technologies and algorithms that automatically estimate and recognize a customer's emotions from their voice and facial expressions.

Brief Description of the Drawings

[0013]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] A diagram showing an emotion map in which multiple emotions are mapped.
[Figure 10] A diagram showing an emotion map in which multiple emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2, which incorporates an emotion engine.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2, which incorporates an emotion engine.

Embodiments for Carrying Out the Invention

[0014] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0015] First, the terms used in the following description will be explained.

[0016] In the following embodiments, a numbered processor (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.

[0017] In the following embodiments, a numbered RAM (Random Access Memory) is a memory in which information is temporarily stored and is used as a work memory by the processor.

[0018] In the following embodiments, a numbered storage is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0019] In the following embodiments, a numbered communication interface (I/F) is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), or Bluetooth (registered trademark).

[0020] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when three or more things are linked by "and/or."

[0021] [First Embodiment]

[0022] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0023] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0024] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0025] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0026] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0027] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0028] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0029] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0030] As shown in Figure 2, in the data processing device 12, specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0031] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0032] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used by the data processing system 10 in conjunction with the specific processing program 56. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0033] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0034] This invention is a system for building an AI agent to optimize customer service, aiming to analyze customer emotions and reactions in real time and provide feedback to sales representatives. Specific embodiments for carrying out the invention are shown below.

[0035] First, the terminal collects data on the customer's facial expressions and voice. This is done using cameras and microphones installed in the conference room. The collected data is automatically sent to the server.

[0036] The server uses emotion recognition models and natural language processing models to analyze the received data. For example, it identifies emotional patterns from the customer's facial expressions and tone of voice, and uses this to estimate the customer's current psychological state.

[0037] Based on the analysis results, the server generates feedback and sends it to the terminal. The terminal displays this feedback to the sales representative in real time. For example, if it is highly likely that a customer has questions or concerns, a message such as "It's time to ask questions" will be displayed on the screen to inform the sales representative.
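The trigger described in [0037] can be illustrated as a simple threshold rule. The sketch below is a minimal example, assuming the analysis step yields a hypothetical `question_likelihood` score between 0 and 1; the `EmotionEstimate` type and the 0.7 cut-off are illustrative assumptions, not values from the disclosure.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical analysis result; field names and the 0.0-1.0 scale are
# assumptions for illustration, not part of the disclosure.
@dataclass
class EmotionEstimate:
    question_likelihood: float  # estimated probability the customer has questions or concerns
    interest: float             # estimated interest level


# Assumed cut-off; a real system would tune this on labeled data.
QUESTION_THRESHOLD = 0.7


def feedback_for(estimate: EmotionEstimate) -> Optional[str]:
    """Return a message for the sales representative's screen, or None if no prompt is needed."""
    if estimate.question_likelihood >= QUESTION_THRESHOLD:
        return "It's time to ask questions"
    return None
```

Returning `None` when the threshold is not met keeps the terminal from showing a message on every analysis cycle.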

[0038] After the presentation ends, the server performs a detailed analysis of all the collected data. It identifies areas for improvement in the next presentation and provides users with specific suggestions through the generation means. For example, feedback might include, "Strengthening the explanation of technical features can increase customer interest."

[0039] Furthermore, the server uses prediction means based on past presentation data to proactively suggest effective approaches for specific customers. In this way, sales representatives can prepare their presentations more effectively.

[0040] Embodiments of this invention provide real-time feedback and data-driven insights to streamline sales activities and improve customer satisfaction.

[0041] The following describes the processing flow.

[0042] Step 1:

[0043] As the presentation begins, the terminal activates the conference room's camera and microphone and continuously collects the customer's facial expression and audio data. The collected data is instantly transmitted to the server.
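Step 1 does not specify how captured data travels from the terminal to the server. One possible message format is sketched below, assuming each camera frame and audio clip is sent as a base64-encoded JSON message; `CaptureChunk`, `make_chunk`, and `parse_chunk` are hypothetical names, not part of the disclosure.

```python
import base64
import json
import time
from dataclasses import dataclass, asdict


@dataclass
class CaptureChunk:
    session_id: str      # hypothetical identifier for one presentation
    timestamp_ms: int    # capture time on the terminal, in milliseconds
    frame_jpeg_b64: str  # one camera frame, base64-encoded
    audio_pcm_b64: str   # a short audio clip, base64-encoded


def make_chunk(session_id: str, frame: bytes, audio: bytes) -> str:
    """Terminal side: package one frame and one audio clip as a JSON message."""
    chunk = CaptureChunk(
        session_id=session_id,
        timestamp_ms=int(time.time() * 1000),
        frame_jpeg_b64=base64.b64encode(frame).decode("ascii"),
        audio_pcm_b64=base64.b64encode(audio).decode("ascii"),
    )
    return json.dumps(asdict(chunk))


def parse_chunk(message: str) -> tuple[str, bytes, bytes]:
    """Server side: recover the session id and the raw frame and audio bytes."""
    data = json.loads(message)
    return (
        data["session_id"],
        base64.b64decode(data["frame_jpeg_b64"]),
        base64.b64decode(data["audio_pcm_b64"]),
    )
```

Base64 inflates binary payloads by about a third; a streaming protocol with binary frames would be a reasonable alternative if bandwidth matters.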

[0044] Step 2:

[0045] The server analyzes the received data in real time using emotion recognition models and natural language processing models. Specifically, muscle movements and changes in gaze are analyzed from the facial expression data, and voice tone and intonation are analyzed from the audio data, to estimate the customer's emotions and psychological state.
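Step 2 combines facial and audio cues but does not say how. A common approach is late fusion, where each modality produces its own per-emotion probabilities and the two are combined with weights. The sketch assumes that approach; the emotion labels and the 0.6 facial weight are illustrative assumptions.

```python
# Emotion labels are assumptions for illustration.
EMOTIONS = ("interest", "confusion", "anxiety", "neutral")


def fuse(face_probs: dict[str, float], voice_probs: dict[str, float],
         face_weight: float = 0.6) -> dict[str, float]:
    """Weighted late fusion of two per-emotion probability distributions."""
    voice_weight = 1.0 - face_weight
    fused = {
        e: face_weight * face_probs.get(e, 0.0) + voice_weight * voice_probs.get(e, 0.0)
        for e in EMOTIONS
    }
    total = sum(fused.values())
    if total == 0:
        return fused
    # Renormalize so the fused scores sum to 1 even if a model omitted some labels.
    return {e: v / total for e, v in fused.items()}
```

With a face model reporting mostly confusion and a voice model leaning toward interest, the fused result still favors confusion because the facial weight is larger.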

[0046] Step 3:

[0047] Based on the analysis results, the server generates specific feedback on actions the sales representative should take. For example, if the customer shows a questioning expression, it generates feedback such as, "You should add an additional explanation on this slide."

[0048] Step 4:

[0049] The terminal displays the feedback received from the server on the sales representative's screen in real time. This allows the sales representative to communicate with the customer in a way that matches the customer's immediate reactions.

[0050] Step 5:

[0051] After the presentation ends, the server re-analyzes all the collected data in detail and generates suggestions for improvement for the next presentation. For example, it might provide specific advice such as, "Strengthening the explanation on slide 5 will improve comprehension."
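The post-presentation analysis in Step 5 can be illustrated by aggregating per-slide scores. This minimal sketch assumes the real-time analysis logged a hypothetical confusion score (0 to 1) each time a slide was on screen, and reports the slide with the highest average; the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def weakest_slide(observations: list[tuple[int, float]]) -> int:
    """Return the slide number with the highest average confusion score.

    observations: (slide_number, confusion_score) pairs logged during the talk.
    """
    by_slide: dict[int, list[float]] = defaultdict(list)
    for slide, score in observations:
        by_slide[slide].append(score)
    if not by_slide:
        raise ValueError("no observations")
    return max(by_slide, key=lambda s: mean(by_slide[s]))


def improvement_suggestion(observations: list[tuple[int, float]]) -> str:
    """Turn the weakest slide into a suggestion in the style of [0051]."""
    slide = weakest_slide(observations)
    return f"Strengthening the explanation on slide {slide} will improve comprehension."
```

Averaging per slide, rather than taking the single highest reading, keeps one brief frown from dominating the result.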

[0052] Step 6:

[0053] Users review reports provided by the server and prepare to incorporate them into their next presentation strategy. This improves the quality of sales activities.

[0054] (Example 1)

[0055] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0056] In customer interactions, it is crucial to understand the customer's emotional state in real time and respond appropriately without delay. However, traditional methods require significant time and effort to analyze a customer's emotional state accurately, making real-time feedback difficult. Furthermore, past information is hard to exploit fully, making it challenging to take the optimal approach for a specific customer.

[0057] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0058] In this invention, the server includes means for acquiring information, means for analyzing emotions in real time based on the acquired information, means for providing interaction to the person in charge based on the judgment result, and means for visually displaying the information on-site. This makes it possible to analyze the emotional state of customers in real time and provide prompt and appropriate feedback. Furthermore, by utilizing prediction means based on past information, it becomes possible to present the optimal approach method for a specific customer in advance.

[0059] "Means for acquiring information" refers to devices and systems that collect unstructured data such as customers' facial expressions and voices.

[0060] "Means for analyzing emotions" refers to processes and devices that use emotion recognition and natural language processing technologies to determine emotional states based on the collected information.

[0061] "Means for providing interaction" refers to a function that generates and provides feedback to the person in charge to encourage appropriate action based on the analysis results.

[0062] "Means for visually displaying information on-site" refers to devices and programs that display information visually in real time via terminals, providing visual feedback.

[0063] "Prediction means" refers to algorithms and technologies that analyze past data and propose the optimal approach for a particular customer in advance.

[0064] This invention is a system for optimizing customer service, which analyzes the customer's emotional state in real time and provides appropriate feedback to the person in charge. Specific embodiments for carrying out this invention are described below.

[0065] The terminal collects data on customers' facial expressions and voices using high-precision devices installed in the conference room, such as high-resolution cameras and high-performance microphones. General-purpose cameras and microphones can be used for this purpose. The terminal immediately transmits this unstructured data to the server.

[0066] The server receives the collected data and analyzes the customer's emotional state using an emotion recognition model and a natural language processing model. Existing technologies such as TENSORFLOW® and OpenCV can be used in this process. Using these technologies, the server recognizes the customer's facial expressions and tone of voice and then identifies their emotional patterns.

[0067] Based on the analyzed data, the server generates feedback to provide interaction with the person in charge. This feedback is customized according to the user's needs and the customer's psychological state. For example, if the customer seems likely to ask a question, a visual alert such as "It's time to ask a question" is generated.

[0068] The terminal displays the generated feedback to the person in charge in real time, prompting a quick response. This can improve customer satisfaction and increase the success rate of business deals.

[0069] Furthermore, users can utilize the predictive capabilities provided by the server. The server analyzes past presentation data to predict and proactively present the most effective approach for a particular customer.
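The prediction means in [0069] is not specified further. As a hedged illustration, the sketch below ranks past approaches for one customer by historical success rate; the `(customer_id, approach, succeeded)` record layout and the `min_trials` guard are assumptions, and a production system might use a trained model instead.

```python
from collections import defaultdict
from typing import Iterable, Optional


def best_approach(history: Iterable[tuple[str, str, bool]], customer_id: str,
                  min_trials: int = 2) -> Optional[str]:
    """Pick the approach with the highest past success rate for one customer.

    history: (customer_id, approach, succeeded) records from past presentations.
    Approaches tried fewer than min_trials times are ignored as too uncertain.
    """
    wins: dict[str, int] = defaultdict(int)
    trials: dict[str, int] = defaultdict(int)
    for cid, approach, succeeded in history:
        if cid != customer_id:
            continue
        trials[approach] += 1
        wins[approach] += int(succeeded)
    candidates = [a for a in trials if trials[a] >= min_trials]
    if not candidates:
        return None  # not enough history to make a recommendation
    return max(candidates, key=lambda a: wins[a] / trials[a])
```

Returning `None` for customers with little history lets the terminal fall back to a default approach instead of recommending one from a single lucky meeting.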

[0070] For example, if the system receives a prompt message instructing the AI model to "analyze customer emotions based on facial expression data and generate specific suggestions for improving sales strategies," the system can provide the user with specific advice on what to do in real time.

[0071] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0072] Step 1:

[0073] The terminal uses its installed camera and microphone to capture the customer's facial expressions and voice with high precision during the meeting. This input data is collected in real time as unstructured data. The terminal then transmits this data to the server.

[0074] Step 2:

[0075] The server receives the unstructured data sent from the terminal and analyzes it using emotion recognition models and natural language processing models. The input data is labeled based on the customer's facial features and voice tone. For example, in facial analysis the server uses machine learning algorithms to identify expressions and emotional states such as smiles and anxiety. The analysis outputs data showing the customer's psychological state and reactions.

[0076] Step 3:

[0077] The server generates appropriate feedback messages based on the analysis results. Using a generative AI model, it determines interactions appropriate to the customer's emotional state and creates feedback using prompts and other elements. These prompts include specific content such as, "The customer's interest is increasing; please add more details." The generated feedback message is then sent from the server to the terminal.
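Step 3 builds feedback from prompts. The disclosure does not name a generative model or API, so the sketch below only assembles a prompt string from analysis results; the function name, its parameters, and the prompt wording are hypothetical, and the call to an actual model is deliberately left out.

```python
def build_feedback_prompt(emotion: str, confidence: float, slide: int, transcript: str) -> str:
    """Assemble a prompt asking a generative model for one short coaching message.

    All inputs are assumed outputs of the analysis step (hypothetical schema).
    """
    return (
        "You are assisting a sales representative during a live presentation.\n"
        f"Current slide: {slide}\n"
        f"Estimated customer emotion: {emotion} (confidence {confidence:.2f})\n"
        f"Customer's latest remark: \"{transcript}\"\n"
        "Reply with one short instruction the representative can act on now, "
        "for example: \"The customer's interest is increasing; please add more details.\""
    )
```

Keeping prompt assembly in a pure function makes it easy to test separately from whichever model the server uses.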

[0078] Step 4:

[0079] The terminal displays feedback messages received from the server on the sales representative's display in real time. For example, sales representatives can adjust their interactions with customers based on feedback shown as a pop-up on their tablet screen. This allows for quick and appropriate responses to customers.

[0080] Step 5:

[0081] The server re-analyzes all data after the meeting to identify areas for improvement in the next presentation. It uses current and past presentation data as input, analyzing it with statistical models and machine learning techniques. For example, it identifies areas where questions are concentrated and topics of interest, outputting information that suggests strategies for the next presentation.
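Identifying "areas where questions are concentrated" can be as simple as counting questions per topic across current and past sessions. A minimal sketch, assuming each customer question has already been tagged with a topic label; the function name and the tagging step are hypothetical.

```python
from collections import Counter
from collections.abc import Iterable


def question_hotspots(current: Iterable[str], past: Iterable[str] = (),
                      top_n: int = 2) -> list[tuple[str, int]]:
    """Return the topics that drew the most customer questions, most frequent first.

    current/past: topic labels, one per question, from this and earlier presentations.
    """
    counts = Counter(current) + Counter(past)
    return counts.most_common(top_n)
```

Combining both sessions surfaces topics that keep drawing questions, which is a stronger signal for the next presentation than a single meeting.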

[0082] Step 6:

[0083] Users can receive next-step approach suggestions based on predictions generated by the server. A predictive model combining historical data and current analysis results provides information to proactively prepare effective messages and actions for specific customers. For example, feedback may be output pointing out areas that need further explanation, facilitating preparation for the next presentation.

[0084] (Application Example 1)

[0085] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0086] Traditional in-store customer service presents challenges, such as the difficulty sales staff have in understanding customer emotions and interests in real time, making it difficult to suggest services and products at the appropriate time. Furthermore, the lack of mechanisms to effectively utilize past interaction data to improve future interactions limits the improvement in customer satisfaction.

[0087] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0088] In this invention, the server includes information gathering means for acquiring customer facial expressions and voice information, analysis means for determining the customer's emotional state in real time based on the collected information, and presentation means for providing information to sales staff based on the determination results. This enables sales staff to immediately grasp the customer's emotions and take appropriate action. Furthermore, by utilizing generation means for creating suggestions to improve future customer service, efficient and effective service delivery is realized.
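The four means in [0088] can be pictured as interchangeable components wired into one loop. This sketch assumes each means is a plain Python callable; the class name, method names, and the string-valued emotional state are hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CustomerServiceSystem:
    gather: Callable[[], dict]            # information gathering means (camera/microphone stub)
    analyze: Callable[[dict], str]        # analysis means: raw data -> emotional state
    present: Callable[[str], None]        # presentation means: push state to the staff device
    generate: Callable[[list[str]], str]  # generation means: logged states -> improvement proposal
    history: list[str] = field(default_factory=list)

    def step(self) -> str:
        """Run one real-time cycle: gather, analyze, present."""
        state = self.analyze(self.gather())
        self.history.append(state)
        self.present(state)
        return state

    def wrap_up(self) -> str:
        """After the interaction, turn the logged states into a proposal."""
        return self.generate(self.history)
```

Keeping each means as a callable lets real camera capture or a trained model replace a stub without changing the loop itself.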

[0089] "Information gathering means" refers to a combination of hardware and software used to acquire customer facial expressions and voice information and to use this information for analysis.

[0090] "Analysis means" refers to software models and algorithms used to determine a customer's emotional state in real time based on collected information.

[0091] A "presentation means" refers to a device or interface for presenting the judgment results derived from the analysis means to sales staff in real time.

[0092] "Generation means" refers to a system or program that creates suggestions for improving future customer service based on past and present customer information.

[0093] The system implementing this invention collects and analyzes customer facial expressions and voice information in real time, enabling sales staff to respond immediately. The central server of the system integrates multiple means for analyzing the customer's emotional state.

[0094] The server first acquires customer facial expression and voice data using the information gathering means. Information is gathered using cameras and microphones installed in the physical store, and the collected data is transmitted to the server and stored. As the analysis means, natural language processing models and emotion recognition models are implemented on the server to analyze the received data immediately and determine the customer's emotional state.

[0095] Furthermore, the server uses the presentation means to display the analysis results in real time on the sales staff's smartphones or smart glasses. This allows the staff to adapt their sales approach immediately to the customer's current situation.

[0096] Furthermore, the system aims to improve service quality by using the generation means to analyze past customer interaction data and propose future customer interaction strategies.

[0097] For example, if a customer visits a store showing interest in a new product, the server could automatically display feedback on the staff member's device such as, "You seem very interested; please explain the details." This would allow the staff to respond appropriately and increase customer satisfaction.

[0098] An example of a prompt for a generative AI model might be: "If a customer shows interest in a particular product, analyze their emotional state and provide appropriate feedback."

[0099] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0100] Step 1:

[0101] The terminal uses the store's cameras and microphones to collect customer facial expressions and audio data in real time. This input includes images of the customer's face and audio clips. The terminal sends this data to a server.

[0102] Step 2:

[0103] The server processes the received facial expression and audio data using the analysis means. This analysis includes using an emotion recognition model to identify emotional patterns from the customer's facial expressions. Based on the input data, the server determines the customer's emotional state (e.g., interest, suspicion) and outputs an emotional state category.
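Step 2 ends by outputting a single emotional-state category. A minimal sketch of that last step, assuming the emotion recognition model returns per-emotion scores; the `min_confidence` fallback to `"uncertain"` is an added assumption meant to avoid acting on weak evidence.

```python
def emotion_category(scores: dict[str, float], min_confidence: float = 0.5) -> str:
    """Map per-emotion scores to a single category label.

    Returns "uncertain" when no emotion reaches min_confidence, so the
    server does not push feedback to staff on weak evidence.
    """
    label, score = max(scores.items(), key=lambda kv: kv[1])
    return label if score >= min_confidence else "uncertain"
```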

[0104] Step 3:

[0105] The server uses a natural language processing model to analyze what the customer is saying. This process involves converting the audio data into text, followed by semantic analysis. Further emotions and intentions are inferred from the customer's tone of voice and word choices, and this information is output in text format.
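Step 3 infers intentions from word choice after speech-to-text. The disclosure relies on a natural language processing model; the sketch below substitutes a much simpler keyword matcher purely to show the input and output shape. The cue lists are hypothetical, and speech-to-text is assumed to have already produced the transcript.

```python
import re

# Hypothetical cue lists; a real system would use a trained NLP model instead.
INTENT_CUES = {
    "price_concern": ("expensive", "price", "cost", "budget"),
    "purchase_interest": ("buy", "order", "when can", "available"),
    "doubt": ("really", "not sure", "doubt", "but"),
}


def infer_intents(transcript: str) -> list[str]:
    """Return intents whose cue words or phrases appear as whole words in the transcript."""
    text = transcript.lower()
    found = []
    for intent, cues in INTENT_CUES.items():
        if any(re.search(r"\b" + re.escape(cue) + r"\b", text) for cue in cues):
            found.append(intent)
    return found
```

The word-boundary anchors keep "but" from matching inside words such as "button"; a keyword list cannot handle negation or sarcasm, which is one reason the disclosure calls for an NLP model.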

[0106] Step 4:

[0107] The server generates specific feedback for sales staff based on the analysis results. Using a generative AI model, it calculates the optimal response from past data and outputs messages such as, "The customer is showing interest, please begin providing a detailed explanation."

[0108] Step 5:

[0109] The server sends the generated feedback to the terminal via the presentation means. The terminal displays this feedback in real time on the smartphone or smart glasses worn by the sales staff. An example of a prompt message is: "If the customer shows interest in a particular product, analyze their emotional state and provide appropriate feedback."

[0110] Step 6:

[0111] Sales staff, who are the users of the terminal, receive feedback from the terminal and take appropriate action based on it. This allows personalized service to be provided to customers, improving their satisfaction.

[0112] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0113] This invention is a system that analyzes the emotional state of customers and users and provides optimal real-time feedback to sales representatives. In particular, by combining it with an emotion engine, it aims to recognize the user's own emotions and optimize two-way communication. The following describes specific embodiments for carrying out the invention.

[0114] The terminal first collects customer facial and voice data. This data is acquired using cameras and microphones installed in the conference room and then transmitted to a server. Simultaneously, the terminal uses an emotion engine to capture the user's facial expressions and voice tone to understand their emotional state and generate data for analysis.

[0115] The server analyzes received customer and user data using emotion recognition models and natural language processing models. It reads emotions and psychological states from customer data and evaluates the sales representative's own performance from user data. This allows for the generation of feedback based on the emotions of both parties.

[0116] Based on the analysis results, the server generates and sends feedback to the terminal. This feedback provides sales representatives with specific actions to facilitate communication with customers. For example, if a customer appears anxious, the terminal screen will display instructions such as, "Please provide additional information here." Conversely, if the user is judged to be stressed, advice such as, "Take a deep breath and relax," will also be displayed.

[0117] Once the presentation is complete, the server reviews all relevant data again and generates specific improvement suggestions for the next presentation. Users can use this information to improve their presentation skills.

[0118] This invention provides a system that maximizes the effectiveness of sales activities and improves customer and user satisfaction by analyzing the emotions of both customers and users.

[0119] The following describes the processing flow.

[0120] Step 1:

[0121] At the start of the presentation, the device activates the camera and microphone installed in the conference room to collect customer facial and audio data. Simultaneously, it uses an emotion engine to capture the user's facial expressions and voice tone.

[0122] Step 2:

[0123] The server analyzes the customer and user data received in real time from the terminal using emotion recognition models and natural language processing models. It evaluates the customer's emotions and psychological state, as well as the user's psychological state and level of tension.
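Step 2 does not specify how the facial and vocal channels are combined. One common option is late fusion: each modality's model outputs a probability distribution over the same emotion labels, and the server takes a weighted average. The sketch below assumes exactly that; the labels and the 0.6/0.4 weighting are illustrative values, not values from this disclosure.

```python
def fuse_emotions(face_probs: dict[str, float],
                  voice_probs: dict[str, float],
                  face_weight: float = 0.6) -> dict[str, float]:
    """Late fusion: weighted average of per-modality emotion probabilities."""
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be in [0, 1]")
    labels = set(face_probs) | set(voice_probs)
    fused = {
        label: face_weight * face_probs.get(label, 0.0)
        + (1.0 - face_weight) * voice_probs.get(label, 0.0)
        for label in labels
    }
    total = sum(fused.values())
    # Renormalize so the fused scores sum to 1 even if an input was not normalized.
    return {label: p / total for label, p in fused.items()} if total > 0 else fused


def dominant_emotion(probs: dict[str, float]) -> str:
    """Return the label with the highest fused probability."""
    return max(probs, key=probs.get)


face = {"interested": 0.7, "anxious": 0.2, "neutral": 0.1}
voice = {"interested": 0.3, "anxious": 0.5, "neutral": 0.2}
fused = fuse_emotions(face, voice)
print(dominant_emotion(fused))  # interested
```

Late fusion lets each model be trained and replaced independently; fusing earlier, at the feature level, is an alternative when paired face and voice training data exist.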

[0124] Step 3:

[0125] Based on the analysis results, the server generates feedback optimized for each customer and user. If the customer shows interest, it generates an instruction such as "Please explain in more detail on the next slide," and if the user shows signs of tension, it generates advice to help the user relax.

[0126] Step 4:

[0127] The terminal receives the feedback from the server and displays it in real time on the screen of the user (the sales representative). Based on this feedback, the user can dynamically adjust their approach to the customer.

[0128] Step 5:

[0129] After the presentation ends, the server uses all the saved data to generate specific improvement suggestions for the next presentation. For example, it might suggest, "Improving the introduction will make it easier to capture the audience's attention."
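To single out a weak part of the presentation such as the introduction, the saved data could be grouped by presentation segment. A minimal sketch, assuming the server has stored engagement scores in [0, 1] tagged with the segment (for example, the slide) being shown at the time; the record layout, segment names and the `weakest_segment` function are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def weakest_segment(samples: list[tuple[str, float]]) -> tuple[str, float]:
    """Return the segment with the lowest mean engagement score.

    samples: (segment_name, engagement_score) pairs recorded during the session.
    Raises ValueError if samples is empty.
    """
    by_segment: dict[str, list[float]] = defaultdict(list)
    for segment, score in samples:
        by_segment[segment].append(score)
    averages = {seg: mean(scores) for seg, scores in by_segment.items()}
    worst = min(averages, key=averages.get)
    return worst, averages[worst]


session = [("introduction", 0.3), ("introduction", 0.4),
           ("pricing", 0.8), ("demo", 0.7), ("demo", 0.9)]
segment, score = weakest_segment(session)
print(f"Improve the {segment} (mean engagement {score:.2f})")
```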

[0130] Step 6:

[0131] Users review reports provided by the server and use them to prepare for their next presentation. This process is expected to improve users' presentation skills and effectiveness.

[0132] (Example 2)

[0133] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0134] To ensure smooth communication during information provision and negotiation processes in business activities, it is crucial to accurately understand the emotional state of buyers and users and provide appropriate feedback. However, conventional technologies have limitations in their ability to analyze the emotions of users and buyers in real time and bidirectionally, and to generate effective feedback and optimization suggestions based on that analysis.

[0135] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0136] In this invention, the server includes information gathering means for analyzing the emotional state of users and buyers, analysis means for evaluating the emotions and psychological state of buyers in real time based on the collected information, and feedback providing means for presenting feedback to users based on the evaluation results and generated analysis data. This enables users to accurately grasp the emotional state of buyers and take appropriate action in real time.

[0137] A "user" is an entity that operates and uses the system to provide information to customers.

[0138] A "buyer" is someone who receives information and is likely to make purchasing decisions regarding companies and products.

[0139] "Emotional state" refers to the psychological state or emotions perceived through an individual's facial expressions and voice.

[0140] "Information gathering means" refers to devices and functions for acquiring data such as the facial expressions and voices of users and buyers.

[0141] "Analysis means" refers to processes and technologies used to evaluate the emotional state of buyers and users based on the collected data.

[0142] "Feedback providing means" refers to a function that provides users with appropriate actions and improvement suggestions based on the analysis results.

[0143] "Generation means" refers to a function that uses the analysis results to generate improvement suggestions useful for future information provision activities.

[0144] A "generated language processing model" is a data processing technique that has been pre-built for the purpose of analyzing natural language.

[0145] An "emotion recognition model" is an algorithm and technology used to identify an individual's emotions and psychological state from collected data.

[0146] This invention is a system for optimizing communication between users and customers in business activities, providing real-time feedback using emotion recognition technology and language processing technology.

[0147] The terminal uses cameras and microphones installed in the conference room to collect facial and voice data from customers. This data is converted into a digital format and transmitted to a server via a secure communication protocol. The terminal is equipped with an emotion engine that generates analyzable data, including the user's reactions and tone of voice.

[0148] The server applies an emotion recognition model and a generated language processing model to the received data to analyze the buyer's emotional and psychological state. Based on the analysis, it generates specific feedback on how the user should change their approach. This feedback is returned to the device in real time and displayed on the user's screen as appropriate actions and advice.

[0149] For example, if a buyer expresses concern about the information, the server generates and sends feedback such as, "Emphasize the product's safety here." If the user is nervous, advice such as, "Calm down and speak slowly," is provided.

[0150] Furthermore, after the meeting concludes, the server generates specific suggestions for improving the quality of the next presentation based on all the analyzed data. This allows users to effectively improve their own performance.

[0151] Examples of prompts include, "What additional information should I provide to alleviate the customer's anxiety?" and "Please tell me some specific ways to calm myself down when I'm feeling nervous."
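The example prompts suggest that the server turns each analysis result into a natural-language request to the generative model. A sketch of such a template, assuming the analysis yields a buyer emotion label and a user nervousness flag; `build_prompt` and the template wording are hypothetical, and the actual call to the model is left out because the disclosure does not fix a particular API.

```python
def build_prompt(buyer_emotion: str, user_nervous: bool, topic: str) -> str:
    """Compose a prompt for the data generation model from one analysis result."""
    lines = [
        "You are assisting a salesperson during a live presentation.",
        f"Current topic: {topic}.",
        f"The buyer currently appears {buyer_emotion}.",
    ]
    # Reuse the example prompts from [0151] when the matching condition holds.
    if buyer_emotion == "anxious":
        lines.append("What additional information should I provide to "
                     "alleviate the customer's anxiety?")
    if user_nervous:
        lines.append("Please tell me some specific ways to calm myself down "
                     "when I'm feeling nervous.")
    lines.append("Answer with at most two short, actionable sentences.")
    return "\n".join(lines)


print(build_prompt("anxious", True, "product safety"))
```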

[0152] This system allows users to facilitate communication with buyers and provide optimal information.

[0153] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0154] Step 1:

[0155] The terminal collects facial and voice data from buyers via cameras and microphones installed in the conference room. This input data is converted into numerical data and stored in a database. Preprocessing, such as noise reduction and feature point extraction for faces and voices, is performed to improve data accuracy. The output is analyzable digital data.
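Step 1 names noise reduction and feature extraction without fixing the methods. For the audio channel, a minimal sketch could smooth the samples with a causal moving average (a simple low-pass filter) and then compute two classic frame-level features, root-mean-square (RMS) energy and zero-crossing rate (ZCR). The window and frame sizes are illustrative assumptions; a production system would more likely use spectral features.

```python
import math


def moving_average(signal: list[float], window: int = 3) -> list[float]:
    """Simple noise reduction: average each sample with up to window-1 preceding samples."""
    out = []
    for i in range(len(signal)):
        start = max(0, i - window + 1)
        chunk = signal[start:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def frame_features(signal: list[float], frame_len: int = 4) -> list[dict[str, float]]:
    """Split the signal into non-overlapping frames (frame_len >= 2) and compute RMS and ZCR."""
    features = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Count sign changes between adjacent samples, treating 0 as non-negative.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        features.append({"rms": rms, "zcr": crossings / (frame_len - 1)})
    return features


raw = [0.0, 1.0, -1.0, 1.0, 0.5, 0.5, 0.5, 0.5]
print(frame_features(moving_average(raw)))
```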

[0156] Step 2:

[0157] The device uses an emotion engine to generate data for analyzing the user's facial expressions and voice tone. This step involves reaction capture and voice tone analysis to quantify the user's emotional state. The input is information obtained from the device's camera and microphone, and the output is numerical data indicating the user's psychological state.
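One way to turn the user's voice tone into a number, as Step 2 describes, is to compare live pitch measurements against the user's own calm baseline. A sketch under that assumption: pitch values in hertz come from some upstream pitch tracker (not shown), and the score is the z-score of the current mean pitch, scaled and clipped to [0, 1]. The scaling constant 3.0 and the `tension_score` function are hypothetical.

```python
from statistics import mean, stdev


def tension_score(baseline_pitch: list[float], current_pitch: list[float],
                  scale: float = 3.0) -> float:
    """Map the rise in mean pitch over a calm baseline to a 0-1 tension score.

    baseline_pitch needs at least two values so that a spread can be estimated.
    """
    mu = mean(baseline_pitch)
    sigma = stdev(baseline_pitch)
    if sigma == 0:
        return 0.0  # guard against a degenerate (constant) baseline
    z = (mean(current_pitch) - mu) / sigma
    # Only raised pitch counts as tension here; clip into [0, 1].
    return min(max(z / scale, 0.0), 1.0)


print(tension_score([118.0, 120.0, 122.0], [123.0]))
```

Normalizing per user matters because absolute pitch differs widely between speakers; a single global threshold would misjudge naturally high or low voices.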

[0158] Step 3:

[0159] The server receives the buyer and user data transmitted from the terminal and analyzes it using an emotion recognition model and a generated language processing model. The input is the numerical data produced in Steps 1 and 2, and the computation involves identifying emotional states and evaluating psychological states. The output is the emotion analysis result.

[0160] Step 4:

[0161] The server generates feedback based on the emotional state of the buyer and user, using the analysis results. It utilizes a generative AI model to determine a communication strategy appropriate to the buyer's reaction. The input is the analysis results from step 3, and the output is the generated feedback instructions.

[0162] Step 5:

[0163] The terminal receives feedback sent from the server and displays it to the user in real time. Specifically, it displays feedback messages to help the user respond immediately. The input is the feedback from step 4, and the output is the action item displayed on the user's screen.
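Step 5 requires an agreed message format between the server and the terminal. A minimal sketch of one possible JSON payload using only the standard library; the field names (`session_id`, `target`, `message`, `priority`, `issued_at`) are hypothetical and are not defined in this disclosure.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FeedbackMessage:
    session_id: str
    target: str       # assumed categories: "buyer_response" or "user_state"
    message: str      # text shown on the user's screen
    priority: int     # higher values could be displayed more prominently
    issued_at: float  # server timestamp in seconds since the epoch


def encode(msg: FeedbackMessage) -> str:
    """Server side: serialize a feedback message for transmission."""
    return json.dumps(asdict(msg))


def decode(payload: str) -> FeedbackMessage:
    """Terminal side: parse a received payload back into a message."""
    return FeedbackMessage(**json.loads(payload))


msg = FeedbackMessage("s-001", "user_state", "Calm down and speak slowly.", 2, 1700000000.0)
print(decode(encode(msg)).message)
```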

[0164] Step 6:

[0165] After the meeting ends, the server reanalyzes all relevant data and generates specific improvement suggestions for the next presentation. The input is the entire presentation data, and the computation includes a comparative analysis against past data. The output is a set of improvement points to apply next time.
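The comparative analysis with past data in Step 6 could, for example, compare each segment's score in this meeting with the same segment's scores in earlier meetings, and flag segments that fall well below the historical mean. A sketch under that assumption; the one-standard-deviation threshold, the data layout and `flag_regressions` are illustrative choices.

```python
from statistics import mean, stdev


def flag_regressions(current: dict[str, float],
                     history: dict[str, list[float]],
                     threshold: float = 1.0) -> list[str]:
    """Return segments whose current score is more than `threshold`
    standard deviations below their historical mean."""
    flagged = []
    for segment, score in current.items():
        past = history.get(segment, [])
        if len(past) < 2:
            continue  # not enough history to estimate a spread
        sigma = stdev(past)
        if sigma == 0:
            continue  # identical past scores give no spread to compare against
        if (score - mean(past)) / sigma < -threshold:
            flagged.append(segment)
    return sorted(flagged)


history = {"introduction": [0.6, 0.7, 0.8], "pricing": [0.5, 0.6, 0.7]}
current = {"introduction": 0.4, "pricing": 0.65}
print(flag_regressions(current, history))  # ['introduction']
```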

[0166] (Application Example 2)

[0167] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0168] In modern face-to-face sales, accurately understanding customer emotions and providing appropriate service is essential, but it is a difficult challenge for salespeople to instantly understand customer emotions and take appropriate action. Furthermore, there is a lack of mechanisms for salespeople to receive real-time feedback necessary to improve their customer service skills, making it time-consuming for individuals to improve their skills.

[0169] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0170] In this invention, the server includes data collection means for analyzing the customer's emotional state, analysis means for evaluating the customer's emotions in real time based on the collected data, feedback provision means for providing feedback to the salesperson based on the evaluation results, and presentation means for providing the customer's emotion analysis results as visual feedback to a glasses-type device worn by the salesperson. This enables accurate customer service in line with the customer's emotions and supports the improvement of the salesperson's customer service skills.

[0171] "Customer emotional state" refers to the psychological or emotional state that can be extracted from a customer's facial expressions, tone of voice, choice of words, and other factors.

[0172] "Data collection means" refers to devices and technologies used to collect the information necessary to understand the emotional state of customers.

[0173] "Analysis means" refers to computational or processing techniques used to evaluate customer emotions based on collected data.

[0174] "Feedback provision means" refers to devices or systems that provide information to sales staff based on analysis results to encourage appropriate responses and actions.

[0175] "Generation means" refers to devices or technologies that have the function of generating suggestions for improving future face-to-face sales activities based on data acquired by analysis means.

[0176] "Presentation means" refers to technology or equipment for visually displaying analysis results on a device worn by a salesperson.

[0177] A "glasses-type device" refers to an electronic device in the form of glasses that salespeople wear, allowing them to visually confirm analysis results and feedback.

[0178] The system for implementing this invention aims to analyze customer emotions and provide real-time feedback to sales staff.

[0179] The server first receives customer facial expression and voice data collected using cameras and microphones. This data is initially acquired by devices such as smart glasses and smartphones. The server analyzes the customer's emotional state from this data using emotion recognition technology built on TensorFlow together with Google Cloud's natural language processing API.

[0180] Real-time feedback is generated from the analysis results and sent immediately to the glasses-type device worn by the salesperson, for example using the Twilio API. The feedback contains information that allows the salesperson to adjust their actions on the spot. For example, if the customer appears confused, an instruction such as "Encourage questions in a calm tone" might appear on the glasses.

[0181] As a concrete example, consider a clothing store. When a customer shows a troubled expression while trying on clothes, the system analyzes that expression and provides feedback to the sales staff, such as, "Please ask about the fit of the size." This can improve the quality of customer service.

[0182] An example of a prompt to a generative AI model is, "When a customer appears tired, suggest ways to help them relax and enjoy their shopping experience."

[0183] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0184] Step 1:

[0185] The terminal collects customer facial and voice data in real time via its camera and microphone, using its sensors to acquire the data with high precision. The input is the customer's facial expressions and voice, and the output is digital data ready to be sent to the server.

[0186] Step 2:

[0187] The server receives facial expression and voice data transmitted from the terminal and analyzes it using an emotion recognition model. The input is digital data from the terminal, and the output is the analysis result indicating the customer's emotional state. In this process, TensorFlow is used to extract data features and determine the customer's psychological state.

[0188] Step 3:

[0189] The server generates feedback based on the analysis results and sends it to the salesperson's glasses-type device. The input is the analysis results indicating the customer's emotional state, and the output is specific action suggestions for the salesperson. The feedback is presented visually using the Twilio API, providing the salesperson with appropriate instructions.

[0190] Step 4:

[0191] The user adjusts their communication with the customer based on feedback received through a glasses-type device. The input is feedback information displayed on the glasses, and the output is the salesperson's actions during the actual interaction. The user responds flexibly according to the instructions to improve customer satisfaction.

[0192] Step 5:

[0193] The server collects performance data related to user interactions and uses it to improve feedback for future interactions. The input is data on the results of the interaction, and the output is training data to improve the accuracy of future feedback. This learning process continuously improves the accuracy of the feedback.
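Step 5 states that interaction results become training data that improves later feedback, but it does not name a learning method. As a deliberately simple stand-in, the server could count, per customer emotion, how often each feedback message led to a good outcome, and prefer the message with the best smoothed success rate. This counting approach and the `FeedbackLearner` class are assumptions made for illustration only.

```python
from collections import defaultdict


class FeedbackLearner:
    """Track how often each feedback message led to a good interaction outcome."""

    def __init__(self) -> None:
        # (emotion, message) -> [successes, trials]
        self.stats: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])

    def record(self, emotion: str, message: str, success: bool) -> None:
        """Store the outcome of one interaction in which `message` was shown."""
        entry = self.stats[(emotion, message)]
        entry[0] += int(success)
        entry[1] += 1

    def success_rate(self, emotion: str, message: str) -> float:
        """Laplace-smoothed success rate; untried messages start at 0.5."""
        successes, trials = self.stats.get((emotion, message), (0, 0))
        return (successes + 1) / (trials + 2)

    def best(self, emotion: str, candidates: list[str]) -> str:
        """Pick the candidate message with the highest smoothed success rate."""
        return max(candidates, key=lambda m: self.success_rate(emotion, m))


learner = FeedbackLearner()
learner.record("confused", "Encourage questions in a calm tone", True)
learner.record("confused", "Encourage questions in a calm tone", True)
learner.record("confused", "Repeat the last point", False)
print(learner.best("confused", ["Encourage questions in a calm tone", "Repeat the last point"]))
```

Smoothing prevents a message that succeeded once from immediately outranking one with a long, mostly positive record.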

[0194] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0195] The data generation model 58 is so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0196] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0197] [Second Embodiment]

[0198] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0199] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0200] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0201] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0202] The microphone 238 receives instructions from the user 20 in the form of speech. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0203] The camera 42 is a small digital camera equipped with an optical system including a lens, an aperture, and a shutter, and with an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor. It captures images of the area around the user 20 (for example, an imaging range corresponding to the field of view of a typical person with normal eyesight).

[0204] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
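The disclosure only states that the exchange is performed securely. In practice the transport would normally be protected with TLS; as an additional illustration, the sketch below adds message authentication with HMAC-SHA256 from Python's standard library so that either side can reject tampered payloads. The shared key shown is a placeholder assumption, not a key-management design.

```python
import hashlib
import hmac


def sign(payload: bytes, key: bytes) -> str:
    """Compute a hex HMAC-SHA256 tag for a payload."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(payload: bytes, tag: str, key: bytes) -> bool:
    """Check a received tag using a constant-time comparison."""
    return hmac.compare_digest(sign(payload, key), tag)


key = b"placeholder-shared-secret"  # hypothetical; provision and rotate securely in practice
payload = b'{"message": "Time to ask questions"}'
tag = sign(payload, key)
print(verify(payload, tag, key))         # True
print(verify(payload + b" ", tag, key))  # False
```

HMAC provides integrity and authenticity but not confidentiality; it complements, rather than replaces, an encrypted channel between the communication interfaces 44 and 26.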

[0205] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0206] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0207] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0208] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0209] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0210] This invention is a system for building an AI agent to optimize customer service, aiming to analyze customer emotions and reactions in real time and provide feedback to sales representatives. Specific embodiments for carrying out the invention are shown below.

[0211] First, the terminal collects data on the customer's facial expressions and voice. This is done using cameras and microphones installed in the conference room. The collected data is automatically sent to the server.

[0212] The server uses emotion recognition models and natural language processing models to analyze the received data. For example, it identifies emotional patterns from the customer's facial expressions and tone of voice, and uses this to estimate the customer's current psychological state.

[0213] Based on the analysis results, the server generates feedback and sends it to the terminal. The terminal displays this feedback to the sales representative in real time. For example, if it is highly likely that a customer has questions or concerns, a message such as "It's time to ask questions" will be displayed on the screen to inform the sales representative.

[0214] After the presentation ends, the server performs a detailed analysis of all the collected data. This identifies areas for improvement in the next presentation and provides users with specific suggestions through a generation mechanism. For example, feedback might include, "Strengthening the explanation of technical features can increase customer interest."

[0215] Furthermore, the server uses predictive tools based on past presentation data to proactively suggest effective approaches for specific customers. In this way, sales representatives can prepare their presentations more effectively.

[0216] Embodiments of this invention provide real-time feedback and data-driven insights to streamline sales activities and improve customer satisfaction.

[0217] The following describes the processing flow.

[0218] Step 1:

[0219] As the presentation begins, the device activates the conference room's camera and microphone, continuously collecting customer facial expressions and audio data. The collected data is instantly transmitted to the server.

[0220] Step 2:

[0221] The server analyzes the received data in real time using emotion recognition models and natural language processing models. Specifically, facial muscle movements and changes in gaze are analyzed from the facial expression data, and voice tone and intonation are analyzed from the audio data, to estimate the customer's emotions and psychological state.

[0222] Step 3:

[0223] Based on the analysis results, the server generates feedback describing specific actions the sales representative should take. For example, if the customer shows a questioning expression, it generates feedback such as, "You should add an additional explanation on this slide."

[0224] Step 4:

[0225] The terminal displays the feedback received from the server on the sales representative's screen in real time. This allows the sales representative to communicate with the customer in a way that matches the customer's immediate reactions.

[0226] Step 5:

[0227] After the presentation ends, the server re-analyzes all the collected data in detail and generates suggestions for improvement for the next presentation. For example, it might provide specific advice such as, "Strengthening the explanation on slide 5 will improve comprehension."

[0228] Step 6:

[0229] Users review reports provided by the server and prepare to incorporate them into their next presentation strategy. This improves the quality of sales activities.

[0230] (Example 1)

[0231] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0232] In customer interactions, it is crucial to understand the customer's emotional state in real time and respond appropriately without delay. However, traditional methods require significant time and effort to accurately analyze a customer's emotional state, making real-time feedback difficult. Furthermore, past information is hard to utilize fully, making it challenging to take the optimal approach for a specific customer.

[0233] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0234] In this invention, the server includes means for acquiring information, means for analyzing emotions in real time based on the acquired information, means for providing interaction to the person in charge based on the judgment result, and means for visually displaying the information on-site. This makes it possible to analyze the emotional state of customers in real time and provide prompt and appropriate feedback. Furthermore, by utilizing prediction means based on past information, it becomes possible to present the optimal approach method for a specific customer in advance.

[0235] "Means of acquiring information" refers to devices and systems that collect unstructured data such as customers' facial expressions and voices.

[0236] "Analysis means" refers to processes and devices that use emotion recognition and natural language processing technologies to determine emotional states based on the collected information.

[0237] "Means of providing interaction" refers to a function that generates and provides feedback to the person in charge to encourage appropriate action based on the analyzed results.

[0238] "Means of visually displaying information on-site" refers to devices and programs that visually display information in real time via terminals and provide visual feedback.

[0239] "Predictive tools" refer to algorithms and technologies that analyze past data and propose the optimal approach for a particular customer in advance.

[0240] This invention is a system for optimizing customer service, which analyzes the customer's emotional state in real time and provides appropriate feedback to the person in charge. Specific embodiments for carrying out this invention are described below.

[0241] The terminal collects data on customers' facial expressions and voices using high-precision devices installed in the conference room, such as high-resolution cameras and high-performance microphones; commercially available general-purpose cameras and microphones may be used. The terminal immediately transmits this unstructured data to the server.

[0242] The server receives the collected data and analyzes the customer's emotional state using emotion recognition models and natural language processing models. Existing technologies such as TensorFlow and OpenCV can be used for this process. Using these technologies, the server identifies the customer's facial expressions and tone of voice, and then identifies their emotional patterns.

[0243] Based on the analyzed data, the server generates feedback to provide interaction with the person in charge. This feedback is customized according to the user's needs and the customer's psychological state. For example, if the customer seems likely to ask a question, a visual alert such as "It's time to ask a question" is generated.
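Generating the visual alert directly from each frame's estimate would make it flicker on and off. A sketch of one way to stabilize it: raise the alert only after the estimated probability that the customer is about to ask a question has stayed at or above an upper threshold for several consecutive updates, and clear it once the probability drops below a lower threshold (hysteresis). The thresholds, the frame count and the `QuestionAlert` class are hypothetical tuning choices.

```python
class QuestionAlert:
    """Hysteresis plus a consecutive-update requirement for the visual alert."""

    def __init__(self, on_threshold: float = 0.7, off_threshold: float = 0.5,
                 frames_required: int = 3) -> None:
        self.on_threshold = on_threshold
        self.off_threshold = off_threshold
        self.frames_required = frames_required
        self.streak = 0      # consecutive updates at or above on_threshold
        self.active = False  # whether "It's time to ask a question" is shown

    def update(self, probability: float) -> bool:
        """Feed one new estimate; return whether the alert should be shown."""
        if self.active:
            if probability < self.off_threshold:
                self.active = False
                self.streak = 0
        else:
            self.streak = self.streak + 1 if probability >= self.on_threshold else 0
            if self.streak >= self.frames_required:
                self.active = True
        return self.active


alert = QuestionAlert()
print([alert.update(p) for p in [0.8, 0.9, 0.75, 0.6, 0.4]])  # [False, False, True, True, False]
```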

[0244] The terminal displays the generated feedback to the person in charge in real time, prompting a quick response. This can improve customer satisfaction and increase the success rate of business deals.

[0245] Furthermore, users can utilize the predictive capabilities provided by the server. The server analyzes past presentation data to predict and proactively present the most effective approach for a particular customer.

[0246] For example, if the system receives a prompt message instructing the AI model to "analyze customer emotions based on facial expression data and generate specific suggestions for improving sales strategies," the system can provide the user with specific advice on what to do in real time.

[0247] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0248] Step 1:

[0249] The terminal uses its installed camera and microphone to capture the customer's facial expressions and voice with high precision during the meeting. This input data is collected in real time as unstructured data. The terminal then transmits this data to the server.

[0250] Step 2:

[0251] The server receives the unstructured data sent from the terminal and analyzes it using emotion recognition models and natural language processing models. The input data is labeled based on the customer's facial features and voice tone. For facial analysis, for example, the server uses machine learning algorithms to identify expressions and emotional states such as smiling and anxiety. The analysis outputs data that shows the customer's psychological state and reactions.

[0252] Step 3:

[0253] The server generates appropriate feedback messages based on the analysis results. Using a generative AI model, it determines interactions appropriate to the customer's emotional state and creates feedback from prompts and other inputs. These prompts include specific content such as, "The customer's interest is increasing; please add more details." The generated feedback message is then sent from the server to the terminal.

[0254] Step 4:

[0255] The terminal visualizes the feedback messages received from the server on the sales representative's display in real time. Sales representatives can then adjust their interactions with customers based on the feedback displayed, for example, as a pop-up on their tablet screen. This allows for quick and appropriate responses to customers.

[0256] Step 5:

[0257] The server re-analyzes all data after the meeting to identify areas for improvement in the next presentation. It uses current and past presentation data as input, analyzing it with statistical models and machine learning techniques. For example, it identifies areas where questions are concentrated and topics of interest, outputting information that suggests strategies for the next presentation.
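Identifying where questions concentrate could be as simple as counting the customer questions logged against each slide and reporting the most frequent ones. A minimal sketch, assuming the server keeps a log of (slide number, question text) pairs; the log format and `question_hotspots` are hypothetical.

```python
from collections import Counter


def question_hotspots(question_log: list[tuple[int, str]],
                      top_n: int = 2) -> list[tuple[int, int]]:
    """Return (slide_number, question_count) for the slides with the most questions."""
    counts = Counter(slide for slide, _question in question_log)
    # Ties keep the order in which slides were first encountered in the log.
    return counts.most_common(top_n)


log = [(5, "How is the data encrypted?"), (5, "Who can access it?"),
       (2, "What does it cost?"), (5, "Is there an audit log?"),
       (7, "When can it ship?")]
print(question_hotspots(log))
```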

[0258] Step 6:

[0259] Users can receive next-step approach suggestions based on predictions generated by the server. A predictive model combining historical data and current analysis results provides information to proactively prepare effective messages and actions for specific customers. For example, feedback may be output pointing out areas that need further explanation, facilitating preparation for the next presentation.
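The predictive model in Step 6 is not specified. As a simple baseline, the server could score each candidate approach by its past outcomes with the specific customer, fall back to outcomes across all customers when no customer-specific history exists, and suggest the best-scoring approach. A sketch under those assumptions; the record layout, the approach names and `suggest_approach` are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Optional


def suggest_approach(history: list[tuple[str, str, float]],
                     customer: str) -> Optional[str]:
    """history: (customer_id, approach, outcome_score) records from past meetings."""
    per_customer: dict[str, list[float]] = defaultdict(list)
    overall: dict[str, list[float]] = defaultdict(list)
    for cust, approach, score in history:
        overall[approach].append(score)
        if cust == customer:
            per_customer[approach].append(score)
    # Prefer this customer's own history; otherwise use all customers.
    pool = per_customer or overall
    if not pool:
        return None
    return max(pool, key=lambda a: mean(pool[a]))


records = [("acme", "technical deep dive", 0.9), ("acme", "cost comparison", 0.4),
           ("globex", "cost comparison", 0.9), ("initech", "cost comparison", 1.0),
           ("initech", "technical deep dive", 0.2)]
print(suggest_approach(records, "acme"))      # technical deep dive
print(suggest_approach(records, "umbrella"))  # cost comparison
```

Falling back to population-wide outcomes gives new customers a reasonable default, while customers with history get suggestions tailored to their own past reactions.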

[0260] (Application Example 1)

[0261] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0262] Traditional in-store customer service presents challenges, such as the difficulty sales staff have in understanding customer emotions and interests in real time, making it difficult to suggest services and products at the appropriate time. Furthermore, the lack of mechanisms to effectively utilize past interaction data to improve future interactions limits the improvement in customer satisfaction.

[0263] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0264] In this invention, the server includes information gathering means for acquiring customer facial expressions and voice information, analysis means for determining the customer's emotional state in real time based on the collected information, and presentation means for providing information to sales staff based on the determination results. This enables sales staff to immediately grasp the customer's emotions and take appropriate action. Furthermore, by utilizing generation means for creating suggestions to improve future customer service, efficient and effective service delivery is realized.

[0265] "Information gathering means" refers to a combination of hardware and software used to acquire customer facial expressions and voice information and to use this information for analysis.

[0266] "Analysis means" refers to software models and algorithms used to determine a customer's emotional state in real time based on the collected information.

[0267] A "presentation means" refers to a device or interface for presenting the judgment results derived from the analysis means to sales staff in real time.

[0268] "Generation means" refers to a system or program that creates suggestions for improving future customer service based on past and present customer information.

[0269] The system implementing this invention collects and analyzes customer facial expressions and voice information in real time, enabling sales staff to respond immediately. The central server of the system integrates multiple means for analyzing the customer's emotional state.

[0270] The server first acquires the customer's facial expression and voice data through the information gathering means. The data is captured by cameras and microphones installed in the physical store, then transmitted to the server and stored. As the analysis means, natural language processing models and emotion recognition models are implemented on the server; they immediately analyze the received data and determine the customer's emotional state.

[0271] Furthermore, the server uses the presentation means to display the analysis results in real time on the sales staff's smartphones or smart glasses. This display allows the staff to adapt their sales approach immediately to the customer's current situation.

[0272] In addition, the generation means analyzes past customer interaction data and proposes strategies for future interactions, with the aim of improving service quality.

[0273] For example, if a customer visits a store showing interest in a new product, the server could automatically display feedback on the staff member's device such as, "You seem very interested; please explain the details." This would allow the staff to respond appropriately and increase customer satisfaction.

[0274] An example of a prompt for a generative AI model might be: "If a customer shows interest in a particular product, analyze their emotional state and provide appropriate feedback."

[0275] The flow of the specific process in Application Example 1 will be described using FIG. 12.

[0276] Step 1:

[0277] The terminal collects the facial expressions and voice data of customers in real time using the cameras and microphones in the store. This input includes images of the customers' faces and voice clips. The terminal sends this data to the server.

[0278] Step 2:

[0279] The server processes the received facial expression data and voice data using analysis means. This analysis includes using an emotion recognition model to identify emotion patterns from the customers' facial expressions. Based on the input data, the server determines the emotional state of the customers (e.g., interest, doubt, etc.), and the category of the emotional state is output.
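
The classification step that turns model output into an emotional-state category could look like the minimal sketch below. It assumes the emotion recognition model returns one raw score (logit) per label; the label names, the 0.5 confidence threshold, and the "uncertain" fallback are illustrative assumptions, not values from this disclosure.

```python
import math


def softmax(logits):
    """Convert raw per-label scores into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - peak) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: v / total for label, v in exps.items()}


def classify_emotion(logits, threshold=0.5):
    """Return (category, probability); report "uncertain" below the threshold.

    `logits` must contain at least one label.
    """
    probs = softmax(logits)
    label = max(probs, key=probs.get)
    category = label if probs[label] >= threshold else "uncertain"
    return category, probs[label]


print(classify_emotion({"interest": 2.0, "doubt": 0.0, "neutral": 0.0}))
```

The threshold keeps low-confidence frames from triggering feedback to sales staff.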

[0280] Step 3:

[0281] The server analyzes the content of the customers' speech using a natural language processing model. In this process, the voice data is converted into text for semantic analysis. The customers' emotions and intentions are further inferred from their tone of voice and word choice, and this information is output as text.
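
The text side of this step can be illustrated with a deliberately simple stand-in. The sketch below assumes the speech has already been transcribed and infers intents by keyword matching; the intent names and keyword sets are hypothetical, and a trained natural language processing model, as described above, would replace this lookup in practice. Tone of voice is not handled here.

```python
import re

# Hypothetical intent vocabulary; a trained NLP model would replace this table.
INTENT_KEYWORDS = {
    "price_concern": {"price", "cost", "expensive", "budget"},
    "wants_detail": {"how", "detail", "details", "explain"},
    "positive": {"great", "nice", "like", "interesting"},
}


def infer_intents(transcript):
    """Return the sorted list of intents whose keywords appear in the transcript."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    return sorted(intent for intent, keywords in INTENT_KEYWORDS.items()
                  if words & keywords)


print(infer_intents("That looks interesting, but is it expensive?"))
```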

[0282] Step 4:

[0283] The server generates specific feedback for the sales staff based on the analysis results. A generative AI model determines the optimal response from past data and outputs a message such as "Since the customer shows interest, please start a detailed explanation."
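
One way to determine an optimal response from past data is to pick, for the current emotional state, the action with the best historical success rate. The sketch below assumes past interactions are stored as `(emotion, action, succeeded)` tuples, which is a hypothetical schema, and the default and action strings are illustrative; it uses Laplace smoothing so that an action tried only once is not over-trusted. In the described system, a generative AI model would phrase or refine the resulting message.

```python
from collections import defaultdict


def best_action(history, emotion, default="Keep listening and confirm the customer's needs."):
    """Choose the action with the highest smoothed success rate for `emotion`.

    history: iterable of (emotion, action, succeeded) tuples from past interactions.
    """
    stats = defaultdict(lambda: [0, 0])  # action -> [successes, trials]
    for past_emotion, action, succeeded in history:
        if past_emotion == emotion:
            stats[action][0] += int(succeeded)
            stats[action][1] += 1
    if not stats:
        return default  # no history for this emotion yet
    # Laplace-smoothed success rate: (successes + 1) / (trials + 2)
    return max(stats, key=lambda a: (stats[a][0] + 1) / (stats[a][1] + 2))


history = [("interest", "Please start a detailed explanation.", True)] * 3 + [
    ("interest", "Offer a brochure.", True),
    ("interest", "Offer a brochure.", False),
    ("doubt", "Ask what concerns the customer has.", True),
]
print(best_action(history, "interest"))
```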

[0284] Step 5:

[0285] The server transmits the generated feedback to the terminal via the presentation means. The terminal displays this feedback in real time on the display of the sales staff's smartphone or smart glasses. An example prompt used in this step is "If the customer shows interest in a specific product, please analyze their emotional state and provide appropriate feedback."

[0286] Step 6:

[0287] The sales staff, who are the users, receive the feedback from the terminal and take appropriate actions based on it. As a result, a personalized service is provided to the customer, improving satisfaction.

[0288] Furthermore, an emotion engine for estimating the user's emotions may be combined. That is, the specific processing unit 290 may estimate the user's emotions using the emotion identification model 59 and perform specific processing using the user's emotions.

[0289] This invention is a system that analyzes the emotional states of both customers and users and provides optimal feedback to sales staff in real time. In particular, by incorporating an emotion engine, it also recognizes the user's own emotions and optimizes two-way communication. Specific forms for implementing the invention are shown below.

[0290] The terminal first collects the customer's facial expression data and voice data. This data is acquired using cameras and microphones installed in the meeting room and then transmitted to the server. At the same time, the terminal uses the emotion engine to capture the user's facial expressions and tone of voice, estimates the user's emotional state, and generates analysis data.

[0291] The server analyzes received customer and user data using emotion recognition models and natural language processing models. It reads emotions and psychological states from customer data and evaluates the sales representative's own performance from user data. This allows for the generation of feedback based on the emotions of both parties.

[0292] Based on the analysis results, the server generates and sends feedback to the terminal. This feedback provides sales representatives with specific actions to facilitate communication with customers. For example, if a customer appears anxious, the terminal screen will display instructions such as, "Please provide additional information here." Conversely, if the user is judged to be stressed, advice such as, "Take a deep breath and relax," will also be displayed.
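
The two-sided feedback in this paragraph amounts to combining one rule for the customer's state with one for the user's state. The sketch below assumes the analysis already yields a customer emotion label and a user stress score between 0 and 1; the labels, the 0.7 threshold, and `dual_feedback` are hypothetical, while the message texts are taken from the examples in this description.

```python
def dual_feedback(customer_emotion, user_stress, stress_threshold=0.7):
    """Return feedback messages for the sales representative (customer first, then self-care)."""
    messages = []
    if customer_emotion == "anxious":
        messages.append("Please provide additional information here.")
    elif customer_emotion == "interested":
        messages.append("Please explain the details in the next slide.")
    if user_stress >= stress_threshold:  # stress score assumed to lie in [0, 1]
        messages.append("Take a deep breath and relax.")
    return messages


print(dual_feedback("anxious", 0.9))
```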

[0293] Once the presentation is complete, the server reviews all relevant data again and generates specific improvement suggestions for the next presentation. Users can use this information to improve their presentation skills.

[0294] This invention provides a system that maximizes the effectiveness of sales activities and improves customer and user satisfaction by analyzing the emotions of both customers and users.

[0295] The following describes the processing flow.

[0296] Step 1:

[0297] At the start of the presentation, the terminal activates the camera and microphone installed in the conference room to collect the customer's facial and audio data. Simultaneously, it uses the emotion engine to capture the user's facial expressions and tone of voice.

[0298] Step 2:

[0299] The server analyzes the customer and user data received in real time from the terminal using an emotion recognition model and a natural language processing model respectively. It evaluates the customer's emotions and psychological state, and also analyzes the user's psychological state and degree of tension.

[0300] Step 3:

[0301] Based on the analysis results, the server generates separate, optimized feedback concerning the customer and concerning the user. If the customer shows interest, it generates an instruction such as "Please explain the details in the next slide," and if the user shows tension, it generates advice for relaxing.

[0302] Step 4:

[0303] The terminal receives the feedback from the server and displays it in real time on the screen of the user who is the salesperson. Based on this feedback, the user can dynamically adjust the approach to the customer.

[0304] Step 5:

[0305] After the presentation ends, the server uses all the saved data to generate specific improvement suggestions for the next presentation. For example, suggestions such as "Improving the introduction will make it easier to attract interest" are made.

[0306] Step 6:

[0307] The user reviews the report provided by the server and uses it for preparing the next presentation. Through this process, an improvement in the user's presentation skills and effectiveness can be expected.

[0308] (Example 2)

[0309] Next, Example 2 will be described. In the following description, the data processing device 12 is referred to as the "server", and the smart glasses 214 are referred to as the "terminal".

[0310] To ensure smooth communication during information provision and negotiation processes in business activities, it is crucial to accurately understand the emotional state of buyers and users and provide appropriate feedback. However, conventional technologies have limitations in their ability to analyze the emotions of users and buyers in real time and bidirectionally, and to generate effective feedback and optimization suggestions based on that analysis.

[0311] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0312] In this invention, the server includes information gathering means for analyzing the emotional state of users and buyers, analysis means for evaluating the emotions and psychological state of buyers in real time based on the collected information, and feedback providing means for presenting feedback to users based on the evaluation results and generated analysis data. This enables users to accurately grasp the emotional state of buyers and take appropriate action in real time.

[0313] A "user" is an entity that operates and uses the system to provide information to customers.

[0314] A "buyer" is someone who receives information and is likely to make purchasing decisions regarding companies and products.

[0315] "Emotional state" refers to the psychological state or emotions perceived through an individual's facial expressions and voice.

[0316] "Information gathering means" refers to devices and functions for acquiring data such as the facial expressions and voices of users and buyers.

[0317] "Analysis means" refers to processes and technologies used to evaluate the emotional state of buyers and users based on the collected data.

[0318] "Feedback providing means" refers to a function that provides users with appropriate actions and improvement suggestions based on the analysis results.

[0319] "Generation means" refers to a function that uses the analysis results to generate improvement suggestions useful for future information provision activities.

[0320] A "generative language processing model" is a data processing technique built in advance for the purpose of analyzing natural language.

[0321] An "emotion recognition model" is an algorithm and technology used to identify an individual's emotions and psychological state from collected data.

[0322] This invention is a system for optimizing communication between users and customers in business activities, providing real-time feedback using emotion recognition technology and language processing technology.

[0323] The terminal uses cameras and microphones installed in the conference room to collect facial and voice data from customers. This data is converted into a digital format and transmitted to a server via a secure communication protocol. The terminal is equipped with an emotion engine that generates analyzable data, including the user's reactions and tone of voice.
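
The description states only that data travels over "a secure communication protocol." As one hedged illustration, the sketch below adds an HMAC-SHA256 integrity tag to each JSON payload using Python's standard `hmac` module, so the server can reject tampered data. This is an assumption layered on top of, not a replacement for, a transport such as TLS, which would provide confidentiality; the shared key and payload fields are hypothetical.

```python
import hashlib
import hmac
import json


def sign_payload(payload, key):
    """Serialize `payload` deterministically and attach an HMAC-SHA256 tag."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "hmac": tag}


def verify_payload(message, key):
    """Return the decoded payload, or raise ValueError if the tag does not match."""
    expected = hmac.new(key, message["body"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, message["hmac"]):
        raise ValueError("payload failed integrity check")
    return json.loads(message["body"])


key = b"hypothetical-shared-key"
msg = sign_payload({"frame_id": 1, "voice_rms": 0.12}, key)
print(verify_payload(msg, key))
```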

[0324] The server applies the emotion recognition model and the generative language processing model to the received data to analyze the buyer's emotional and psychological state. Based on the analysis, it generates specific feedback on how the user should change their approach. This feedback is returned to the terminal in real time and displayed on the user's screen as recommended actions and advice.

[0325] For example, if a buyer expresses concern about the information, the server generates and sends feedback such as, "Emphasize the product's safety here." If the user is nervous, advice such as, "Calm down and speak more slowly," is provided.

[0326] Furthermore, after the meeting concludes, the server generates specific suggestions for improving the quality of the next presentation based on all the analyzed data. This allows users to effectively improve their own performance.

[0327] Examples of prompts include, "What additional information should I provide to alleviate the customer's anxiety?" and "Please tell me some specific ways to calm myself down when I'm feeling nervous."

[0328] This system allows users to facilitate communication with buyers and provide optimal information.

[0329] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0330] Step 1:

[0331] The terminal collects facial and voice data from buyers via cameras and microphones installed in the conference room. This input data is converted into numerical data and stored in a database. Preprocessing, such as noise reduction and feature point extraction for faces and voices, is performed to improve data accuracy. The output is analyzable digital data.
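
The preprocessing in this step can be sketched for the audio channel as follows. This is a minimal sketch under stated assumptions: a trailing moving average stands in for noise reduction, and root-mean-square (RMS) energy and zero-crossing rate stand in for the extracted features; the description does not specify which filters or features are used. Samples are assumed to be floats in the range -1 to 1.

```python
import math


def moving_average(samples, window=5):
    """Smooth a signal; each output is the mean of up to `window` trailing samples."""
    out, acc = [], 0.0
    for i, x in enumerate(samples):
        acc += x
        if i >= window:
            acc -= samples[i - window]  # drop the sample that left the window
        out.append(acc / min(i + 1, window))
    return out


def frame_features(frame):
    """Return RMS energy and zero-crossing rate for one non-empty audio frame."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (len(frame) - 1) if len(frame) > 1 else 0.0
    return {"rms": rms, "zcr": zcr}


smoothed = moving_average([0.0, 0.2, 0.4, 0.2, 0.0, -0.2], window=3)
print(frame_features(smoothed))
```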

[0332] Step 2:

[0333] The terminal uses the emotion engine to generate data for analyzing the user's facial expressions and tone of voice. This step involves capturing the user's reactions and analyzing voice tone to quantify the user's emotional state. The input is information obtained from the terminal's camera and microphone, and the output is numerical data indicating the user's psychological state.
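
Quantifying tension is easier to interpret relative to each user's own calm baseline. The sketch below assumes the emotion engine already outputs per-frame voice features such as pitch and speaking rate (the feature names are hypothetical) and reports the average z-score of the current values against a baseline recorded when the user was calm; a positive score means the user is above their usual level.

```python
from statistics import mean, pstdev


def tension_score(current, baseline):
    """Average z-score of current features against the user's calm baseline.

    current:  dict mapping feature name -> current value
    baseline: dict mapping the same feature names -> list of calm-state values
    """
    z_scores = []
    for name, value in current.items():
        history = baseline[name]
        sd = pstdev(history) or 1.0  # avoid dividing by zero for a flat baseline
        z_scores.append((value - mean(history)) / sd)
    return mean(z_scores)


baseline = {"pitch_hz": [118, 120, 122], "words_per_min": [140, 150, 160]}
print(tension_score({"pitch_hz": 124, "words_per_min": 170}, baseline))
```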

[0334] Step 3:

[0335] The server receives the buyer and user data transmitted from the terminal and analyzes it using the emotion recognition model and the generative language processing model. The input is the numerical data produced in Steps 1 and 2, and the processing identifies emotional states and evaluates psychological states. The output is the emotion analysis result.

[0336] Step 4:

[0337] The server generates feedback based on the emotional state of the buyer and user, using the analysis results. It utilizes a generative AI model to determine a communication strategy appropriate to the buyer's reaction. The input is the analysis results from step 3, and the output is the generated feedback instructions.

[0338] Step 5:

[0339] The terminal receives feedback sent from the server and displays it to the user in real time. Specifically, it displays feedback messages to help the user respond immediately. The input is the feedback from step 4, and the output is the action item displayed on the user's screen.

[0340] Step 6:

[0341] After the meeting ends, the server reanalyzes all relevant data and generates specific improvement suggestions for the next presentation. The input is the entire presentation data, and the processing includes a comparative analysis against past data. The output is a set of improvement points for the next presentation.
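
The comparative analysis against past data can be illustrated with a simple per-metric comparison. The sketch below assumes each presentation has been summarized as a dictionary of scores between 0 and 1 (the metric names and the 0.1 margin are hypothetical) and flags every metric on which the current presentation fell clearly below the past average.

```python
from statistics import mean


def improvement_points(current, past_sessions, margin=0.1):
    """List metrics where `current` scored more than `margin` below the past average."""
    points = []
    for metric, value in current.items():
        history = [session[metric] for session in past_sessions if metric in session]
        if history and value < mean(history) - margin:
            points.append(f"{metric}: {value:.2f} vs. past average {mean(history):.2f}")
    return points


current = {"buyer_interest": 0.40, "clarity": 0.80}
past = [{"buyer_interest": 0.70, "clarity": 0.75},
        {"buyer_interest": 0.60, "clarity": 0.80}]
print(improvement_points(current, past))
```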

[0342] (Application Example 2)

[0343] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart glasses 214 as the "terminal".

[0344] In modern face-to-face sales, accurately understanding customer emotions and providing appropriate service is essential, but it is a difficult challenge for salespeople to instantly understand customer emotions and take appropriate action. Furthermore, there is a lack of mechanisms for salespeople to receive real-time feedback necessary to improve their customer service skills, making it time-consuming for individuals to improve their skills.

[0345] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0346] In this invention, the server includes data collection means for analyzing the customer's emotional state, analysis means for evaluating the customer's emotions in real time based on the collected data, feedback provision means for providing feedback to the salesperson based on the evaluation results, and presentation means for providing the customer's emotion analysis results as visual feedback to a glasses-type device worn by the salesperson. This enables accurate customer service in line with the customer's emotions and supports the improvement of the salesperson's customer service skills.

[0347] "Customer emotional state" refers to the psychological or emotional state that can be extracted from a customer's facial expressions, tone of voice, choice of words, and other factors.

[0348] "Data collection means" refers to devices and technologies used to collect the information necessary to understand the emotional state of customers.

[0349] "Analysis means" refers to computational or processing techniques used to evaluate customer emotions based on collected data.

[0350] "Feedback provision means" refers to devices or systems that provide information to sales staff based on analysis results to encourage appropriate responses and actions.

[0351] "Generation means" refers to devices or technologies that have the function of generating suggestions for improving future face-to-face sales activities based on data acquired by analysis means.

[0352] "Presentation means" refers to technology or equipment for visually displaying analysis results on a device worn by a salesperson.

[0353] A "glasses-type device" refers to an electronic device in the form of glasses that salespeople wear, allowing them to visually confirm analysis results and feedback.

[0354] The system for implementing this invention aims to analyze customer emotions and provide real-time feedback to sales staff.

[0355] The server first receives facial expression and voice data from customers collected using cameras and microphones. This data is acquired in the initial stages by devices such as smart glasses and smartphones. The server uses this data to analyze the customer's emotional state using emotion recognition technology powered by TensorFlow and Google Cloud's natural language processing APIs.

[0356] The analysis results are generated as real-time feedback and sent immediately to a glasses-type device worn by the salesperson, for example, using the Twilio API. The feedback provided includes information that allows the salesperson to instantly adjust their actions. For example, if the customer appears confused, instructions such as "Encourage questions in a calm tone" might appear on the glasses.

[0357] As a concrete example, let's consider a clothing store setting. When a customer makes a troubled expression while trying on clothes, the system analyzes that expression and provides feedback to the sales staff, such as, "Please ask about the fit of the size." This can improve the quality of customer service.

[0358] An example of a prompt for a generative AI model is, "When a customer appears tired, suggest ways to help them relax and enjoy their shopping experience."

[0359] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0360] Step 1:

[0361] The terminal collects customer facial and voice data in real time via its camera and microphone. The input is the customer's facial expressions and voice, and the output is that data in digital form, ready to be sent to the server. The terminal uses its sensors to acquire the data with high precision.

[0362] Step 2:

[0363] The server receives facial expression and voice data transmitted from the terminal and analyzes it using an emotion recognition model. The input is digital data from the terminal, and the output is the analysis result indicating the customer's emotional state. In this process, TensorFlow is used to extract data features and determine the customer's psychological state.

[0364] Step 3:

[0365] The server generates feedback based on the analysis results and sends it to the salesperson's glasses-type device. The input is the analysis results indicating the customer's emotional state, and the output is specific action suggestions for the salesperson. The feedback is presented visually using the Twilio API, providing the salesperson with appropriate instructions.

[0366] Step 4:

[0367] The user adjusts their communication with the customer based on feedback received through a glasses-type device. The input is feedback information displayed on the glasses, and the output is the salesperson's actions during the actual interaction. The user responds flexibly according to the instructions to improve customer satisfaction.

[0368] Step 5:

[0369] The server collects performance data related to user interactions and uses it to improve feedback for future interactions. The input is data on the results of the interaction, and the output is training data to improve the accuracy of future feedback. This learning process continuously improves the accuracy of the feedback.
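
Turning interaction results into training data might look like the sketch below. The record schema (`emotion`, `feedback`, `outcome`), the outcome values treated as positive, and the JSON Lines output format are all assumptions made for illustration; the description does not specify how the training data is structured.

```python
import json

# Hypothetical outcome labels that count as a successful interaction.
POSITIVE_OUTCOMES = {"purchased", "satisfied"}


def to_training_examples(interactions):
    """Convert logged interactions into JSON Lines records with a 0/1 label."""
    lines = []
    for item in interactions:
        record = {
            "input": {"emotion": item["emotion"], "feedback": item["feedback"]},
            "label": 1 if item["outcome"] in POSITIVE_OUTCOMES else 0,
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


log = [
    {"emotion": "confused", "feedback": "Encourage questions in a calm tone",
     "outcome": "satisfied"},
    {"emotion": "troubled", "feedback": "Please ask about the fit of the size",
     "outcome": "left_store"},
]
print(to_training_examples(log))
```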

[0370] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0371] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions in the prompt and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
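
The prompt-plus-inference-data input described in this paragraph can be represented by a small container type. The sketch below is hypothetical: `ModelRequest`, `InferenceItem`, and the modality names are illustrative, and no call to ChatGPT, Gemini, or any other real API is shown or implied.

```python
from dataclasses import dataclass, field

SUPPORTED_MODALITIES = {"audio", "text", "image"}


@dataclass
class InferenceItem:
    modality: str   # one of SUPPORTED_MODALITIES
    payload: bytes  # raw audio, UTF-8 text, or encoded image bytes


@dataclass
class ModelRequest:
    prompt: str  # the instruction, e.g. "Summarize the customer's concerns."
    items: list = field(default_factory=list)

    def add(self, modality, payload):
        """Attach one piece of inference data; returns self so calls can be chained."""
        if modality not in SUPPORTED_MODALITIES:
            raise ValueError(f"unsupported modality: {modality}")
        self.items.append(InferenceItem(modality, payload))
        return self


request = (ModelRequest("Classify the customer's emotional state.")
           .add("text", "Is this within our budget?".encode())
           .add("audio", b"\x00\x01\x02"))
print(len(request.items), [item.modality for item in request.items])
```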

[0372] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0373] [Third Embodiment]

[0374] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0375] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0376] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0377] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0378] The microphone 238 picks up the voice of the user 20, including instructions from the user 20. The microphone 238 converts the captured voice into audio data and outputs it to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0379] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0380] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0381] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0382] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0383] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0384] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0385] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0386] This invention is a system for building an AI agent to optimize customer service, aiming to analyze customer emotions and reactions in real time and provide feedback to sales representatives. Specific embodiments for carrying out the invention are shown below.

[0387] First, the terminal collects data on the customer's facial expressions and voice. This is done using cameras and microphones installed in the conference room. The collected data is automatically sent to the server.

[0388] The server uses emotion recognition models and natural language processing models to analyze the received data. For example, it identifies emotional patterns from the customer's facial expressions and tone of voice, and uses this to estimate the customer's current psychological state.

[0389] Based on the analysis results, the server generates feedback and sends it to the terminal. The terminal displays this feedback to the sales representative in real time. For example, if it is highly likely that a customer has questions or concerns, a message such as "It's time to ask questions" will be displayed on the screen to inform the sales representative.

[0390] After the presentation ends, the server performs a detailed analysis of all the collected data. This identifies areas for improvement in the next presentation, and the generation means provides users with specific suggestions. For example, feedback might include, "Strengthening the explanation of technical features can increase customer interest."

[0391] Furthermore, the server uses prediction means based on past presentation data to proactively suggest effective approaches for specific customers. In this way, sales representatives can prepare their presentations more effectively.

[0392] Embodiments of this invention provide real-time feedback and data-driven insights to streamline sales activities and improve customer satisfaction.

[0393] The following describes the processing flow.

[0394] Step 1:

[0395] When the presentation begins, the terminal activates the conference room's camera and microphone and continuously collects the customer's facial expression and audio data. The collected data is transmitted to the server immediately.

[0396] Step 2:

[0397] The server analyzes the received data in real time using emotion recognition models and natural language processing models. Specifically, facial muscle movements and changes in gaze are analyzed from the facial expression data, and voice tone and intonation are analyzed from the audio data, to estimate the customer's emotions and psychological state.
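
Intonation analysis of the kind mentioned here is commonly built on a pitch track. The sketch below is an illustrative assumption, not the disclosed method: it estimates the fundamental frequency of each voiced frame by plain autocorrelation, searching lags that correspond to 75 to 400 Hz, and summarizes intonation as the standard deviation of pitch across voiced frames. Plain autocorrelation can make octave errors on real speech; more robust estimators such as YIN are typically used in practice.

```python
import math
from statistics import pstdev


def estimate_pitch(frame, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of a non-empty frame; 0.0 if no peak."""
    n = len(frame)
    centre = sum(frame) / n
    x = [s - centre for s in frame]  # remove any DC offset
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else 0.0


def intonation_spread(pitches):
    """Standard deviation of pitch over voiced frames (pitch > 0)."""
    voiced = [p for p in pitches if p > 0]
    return pstdev(voiced) if len(voiced) > 1 else 0.0


rate = 8000
tone = [math.sin(2 * math.pi * 200 * i / rate) for i in range(400)]  # 50 ms at 200 Hz
print(estimate_pitch(tone, rate))
```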

[0398] Step 3:

[0399] Based on the analysis results, the server generates specific feedback on the action the sales representative should take. For example, if the customer shows a questioning expression, it generates feedback such as, "You should add an additional explanation on this slide."

[0400] Step 4:

[0401] The terminal displays feedback received from the server on the sales representative's screen in real time. This allows the sales representative to communicate with the customer in a way that is appropriate to their immediate response.

[0402] Step 5:

[0403] After the presentation ends, the server re-analyzes all the collected data in detail and generates suggestions for improvement for the next presentation. For example, it might provide specific advice such as, "Strengthening the explanation on slide 5 will improve comprehension."

[0404] Step 6:

[0405] Users review the reports provided by the server and incorporate the suggestions into their next presentation strategy. This improves the quality of sales activities.

[0406] (Example 1)

[0407] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0408] In customer interactions, it is crucial to understand their emotional state in real time and respond appropriately immediately. However, traditional methods require significant time and effort to accurately analyze a customer's emotional state, making real-time feedback difficult. Furthermore, it is difficult to fully utilize past information, making it challenging to take the optimal approach for specific customers.

[0409] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0410] In this invention, the server includes means for acquiring information, means for analyzing emotions in real time based on the acquired information, means for providing interaction to the person in charge (the sales representative) based on the judgment result, and means for visually displaying the information on-site. This makes it possible to analyze the emotional state of customers in real time and provide prompt, appropriate feedback. Furthermore, by utilizing prediction means based on past information, the optimal approach for a specific customer can be presented in advance.

[0411] "Means of acquiring information" refers to devices and systems that collect unstructured data such as customers' facial expressions and voices.

[0412] "Analysis means" refers to processes and devices that use emotion recognition and natural language processing technologies to determine emotional states based on the collected information.

[0413] "Means of providing interaction" refers to a function that generates and provides feedback to the person in charge to encourage appropriate action based on the analyzed results.

[0414] "Means of visually displaying information on-site" refers to devices and programs that visually display information in real time via terminals and provide visual feedback.

[0415] "Prediction means" refers to algorithms and technologies that analyze past data and propose the optimal approach for a particular customer in advance.

[0416] This invention is a system for optimizing customer service, which analyzes the customer's emotional state in real time and provides appropriate feedback to the person in charge. Specific embodiments for carrying out this invention are described below.

[0417] The terminal collects data on customers' facial expressions and voices using high-precision devices installed in the conference room, such as high-resolution cameras and high-performance microphones. Commercially available general-purpose cameras and microphones can be used for this purpose. The terminal immediately transmits this unstructured data to the server.

[0418] The server receives the collected data and analyzes the customer's emotional state using emotion recognition models and natural language processing models. Existing technologies such as TensorFlow and OpenCV can be used for this process. With these technologies, the server recognizes the customer's facial expressions and tone of voice and then identifies their emotional patterns.

[0419] Based on the analyzed data, the server generates feedback that provides interaction to the person in charge. This feedback is customized according to the user's needs and the customer's psychological state. For example, if the customer seems likely to ask a question, a visual alert such as "It's time to ask a question" is generated.

[0420] The terminal displays the generated feedback to the person in charge in real time, prompting a quick response. This can improve customer satisfaction and increase the success rate of business deals.

[0421] Furthermore, users can utilize the predictive capabilities provided by the server. The server analyzes past presentation data to predict and proactively present the most effective approach for a particular customer.

[0422] For example, if the system receives a prompt message instructing the AI model to "analyze customer emotions based on facial expression data and generate specific suggestions for improving sales strategies," the system can provide the user with specific advice on what to do in real time.

[0423] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0424] Step 1:

[0425] The terminal uses its installed camera and microphone to capture the customer's facial expressions and voice with high precision during the meeting. This input data is collected in real time as unstructured data. The terminal then transmits this data to the server.

[0426] Step 2:

[0427] The server receives the unstructured data sent from the terminal and analyzes it using emotion recognition models and natural language processing models. The input data is labeled based on the customer's facial features and voice tone. For example, for facial analysis the server uses machine learning algorithms to identify expressions such as smiles and emotional states such as anxiety. The analysis outputs data indicating the customer's psychological state and reactions.

[0428] Step 3:

[0429] The server generates appropriate feedback messages based on the analysis results. Using a generative AI model, it determines interactions appropriate to the customer's emotional state and creates feedback from prompts and other inputs. These prompts include specific content such as, "The customer's interest is increasing; please add more details." The generated feedback message is then sent from the server to the terminal.
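The step of turning an analysis result into a prompt for the generative AI model can be sketched in Python as follows. This is an illustration under stated assumptions: the `AnalysisResult` fields, the function name `build_feedback_prompt`, and the prompt wording are hypothetical, and the call to the model itself is omitted because the source does not name a specific API.

```python
from dataclasses import dataclass


@dataclass
class AnalysisResult:
    """Hypothetical container for one real-time analysis result."""
    emotion: str       # e.g. "interest", "confusion"
    confidence: float  # 0.0 to 1.0
    slide: int         # slide displayed when the reaction was observed


def build_feedback_prompt(result: AnalysisResult) -> str:
    """Turn an analysis result into an instruction for a generative AI model."""
    return (
        "You are assisting a sales representative during a live presentation.\n"
        f"The customer currently appears to show {result.emotion} "
        f"(confidence {result.confidence:.0%}) while slide {result.slide} is displayed.\n"
        "Reply with one short, concrete action the representative should take now."
    )


prompt = build_feedback_prompt(AnalysisResult("interest", 0.82, 5))
print(prompt)
```

The returned string would be passed to whichever generative model the server uses, and the model's reply would become the feedback message sent to the terminal.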

[0430] Step 4:

[0431] The terminal visualizes the feedback messages received from the server on the sales representative's display in real time. Sales representatives can then adjust their interactions with customers based on feedback displayed, for example, as a pop-up on their tablet screen. This allows for quick and appropriate responses to customers.

[0432] Step 5:

[0433] The server re-analyzes all data after the meeting to identify areas for improvement in the next presentation. It uses current and past presentation data as input, analyzing it with statistical models and machine learning techniques. For example, it identifies areas where questions are concentrated and topics of interest, outputting information that suggests strategies for the next presentation.
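Identifying "areas where questions are concentrated" can be as simple as counting question events per slide in the meeting log. The Python sketch below assumes a hypothetical log format of `(slide_number, event_type)` tuples; the function name `question_hotspots` is also hypothetical, and a real system would likely combine such counts with the statistical and machine learning analysis described above.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def question_hotspots(events: Iterable[Tuple[int, str]],
                      top_n: int = 3) -> List[Tuple[int, int]]:
    """Return (slide, question_count) pairs for the slides with the most questions.

    `events` holds hypothetical (slide_number, event_type) tuples logged during
    the meeting. Ties are broken by slide number so the output is deterministic.
    """
    counts = Counter(slide for slide, kind in events if kind == "question")
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


log = [(2, "question"), (5, "question"), (5, "smile"), (5, "question"),
       (7, "question"), (2, "question"), (5, "question")]
print(question_hotspots(log, top_n=2))  # [(5, 3), (2, 2)]
```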

[0434] Step 6:

[0435] Users can receive next-step approach suggestions based on predictions generated by the server. A predictive model combining historical data and current analysis results provides information to proactively prepare effective messages and actions for specific customers. For example, feedback may be output pointing out areas that need further explanation, facilitating preparation for the next presentation.
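One simple way to realize such a predictive model is a nearest-neighbor lookup: find the past customers most similar to the current one and recommend the approach that worked best for them. The following Python sketch swaps in k-nearest neighbors as an illustrative technique; the source does not specify the model. The two-element feature vectors (for example, an interest score and a price-sensitivity score), the approach labels, and the function name `recommend_approach` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical past record: (customer feature vector, approach used, deal closed?)
Record = Tuple[Sequence[float], str, bool]


def recommend_approach(current: Sequence[float], history: List[Record], k: int = 3) -> str:
    """Recommend the approach with the best success rate among the k most similar past customers."""
    if not history:
        raise ValueError("history is empty")
    # Similarity = Euclidean distance between feature vectors (same length assumed).
    nearest = sorted(history, key=lambda rec: math.dist(current, rec[0]))[:k]
    wins: Dict[str, int] = defaultdict(int)
    tries: Dict[str, int] = defaultdict(int)
    for _, approach, success in nearest:
        tries[approach] += 1
        wins[approach] += int(success)
    # Highest success rate; ties go to the approach tried more often, then alphabetically.
    return max(sorted(tries), key=lambda a: (wins[a] / tries[a], tries[a]))


history = [
    ((0.9, 0.1), "detailed_demo", True),
    ((0.8, 0.2), "detailed_demo", True),
    ((0.85, 0.15), "price_focus", False),
    ((0.1, 0.9), "price_focus", True),
]
print(recommend_approach((0.88, 0.12), history))
```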

[0436] (Application Example 1)

[0437] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0438] Traditional in-store customer service presents challenges, such as the difficulty sales staff have in understanding customer emotions and interests in real time, making it difficult to suggest services and products at the appropriate time. Furthermore, the lack of mechanisms to effectively utilize past interaction data to improve future interactions limits the improvement in customer satisfaction.

[0439] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0440] In this invention, the server includes information gathering means for acquiring customer facial expressions and voice information, analysis means for determining the customer's emotional state in real time based on the collected information, and presentation means for providing information to sales staff based on the determination results. This enables sales staff to immediately grasp the customer's emotions and take appropriate action. Furthermore, by utilizing generation means for creating suggestions to improve future customer service, efficient and effective service delivery is realized.

[0441] "Information gathering means" refers to a combination of hardware and software used to acquire customer facial expressions and voice information and to use this information for analysis.

[0442] "Analysis means" refers to software models and algorithms used to determine a customer's emotional state in real time based on the collected information.

[0443] A "presentation means" refers to a device or interface for presenting the judgment results derived from the analysis means to sales staff in real time.

[0444] "Generation means" refers to a system or program that creates suggestions for improving future customer service based on past and present customer information.

[0445] The system implementing this invention collects and analyzes customer facial expressions and voice information in real time, enabling sales staff to respond immediately. The central server of the system integrates multiple means for analyzing the customer's emotional state.

[0446] The server first acquires customer facial expression and voice data using the information gathering means. This information is gathered with cameras and microphones installed in the physical store. The collected data is transmitted to the server and stored. As the analysis means, natural language processing models and emotion recognition models are implemented on the server to analyze the received data immediately and determine the customer's emotional state.

[0447] Furthermore, the server uses the presentation means to display the analysis results in real time on the sales staff's smartphones or smart glasses. This display allows the staff to adapt their sales strategy immediately to the customer's current situation.

[0448] Furthermore, the generation means analyzes past customer interaction data and proposes strategies for future customer interactions, improving the quality of service.

[0449] For example, if a customer visits a store showing interest in a new product, the server could automatically display feedback on the staff member's device such as, "You seem very interested; please explain the details." This would allow the staff to respond appropriately and increase customer satisfaction.

[0450] An example of a prompt for a generative AI model might be: "If a customer shows interest in a particular product, analyze their emotional state and provide appropriate feedback."

[0451] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0452] Step 1:

[0453] The terminal uses the store's cameras and microphones to collect customer facial expressions and audio data in real time. This input includes images of the customer's face and audio clips. The terminal sends this data to a server.

[0454] Step 2:

[0455] The server processes the received facial expression and audio data using the analysis means. This analysis includes using an emotion recognition model to identify emotional patterns in the customer's facial expressions. Based on the input data, the server determines the customer's emotional state (e.g., interest, suspicion) and outputs an emotional state category.
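Because per-frame emotion labels from a real-time model tend to flicker, a stable emotional state category is often derived by smoothing over a short time window. The Python sketch below uses a sliding-window majority vote as one such smoothing technique; it is an assumption, not a step stated in the source, and the class name `EmotionSmoother`, the window size, and the vote threshold are hypothetical.

```python
from collections import Counter, deque
from typing import Deque, Optional


class EmotionSmoother:
    """Majority vote over the last `window` per-frame labels to suppress flicker."""

    def __init__(self, window: int = 5, min_votes: int = 3) -> None:
        if min_votes > window:
            raise ValueError("min_votes cannot exceed window")
        self.frames: Deque[str] = deque(maxlen=window)  # oldest frame drops out automatically
        self.min_votes = min_votes

    def update(self, label: str) -> Optional[str]:
        """Add one frame's label; return the stable category, or None if none has enough votes."""
        self.frames.append(label)
        top, votes = Counter(self.frames).most_common(1)[0]
        return top if votes >= self.min_votes else None


smoother = EmotionSmoother(window=5, min_votes=3)
stream = ["interest", "suspicion", "interest", "interest",
          "suspicion", "suspicion", "suspicion"]
outputs = [smoother.update(label) for label in stream]
print(outputs)
```

Returning `None` until a label is sufficiently dominant keeps the terminal from showing feedback based on a single noisy frame.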

[0456] Step 3:

[0457] The server uses a natural language processing model to analyze what the customer is saying. This process converts the audio data into text and then performs semantic analysis. Additional emotions and intentions are inferred from the customer's tone of voice and word choice, and this information is output in text format.
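The word-choice part of this step can be illustrated with a deliberately simple Python sketch that counts cue phrases in the speech-to-text transcript. A lexicon lookup is a stand-in for the natural language processing model the source describes; the cue categories, the phrases, and the function name `detect_cues` are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical cue lexicon; a production system would use a trained NLP model instead.
CUES: Dict[str, List[str]] = {
    "interest": ["interesting", "tell me more", "how does", "really"],
    "concern": ["expensive", "worried", "not sure", "risk"],
    "purchase_intent": ["how much", "when can", "buy", "order"],
}


def detect_cues(transcript: str) -> Dict[str, int]:
    """Count whole-word lexicon hits per category in a speech-to-text transcript."""
    text = transcript.lower()
    counts: Dict[str, int] = {}
    for category, phrases in CUES.items():
        hits = sum(len(re.findall(r"\b" + re.escape(p) + r"\b", text)) for p in phrases)
        if hits:
            counts[category] = hits
    return counts


result = detect_cues("That's interesting, but it looks expensive. How much is the basic plan?")
print(result)
```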

[0458] Step 4:

[0459] The server generates specific feedback for sales staff based on the analysis results. Using a generative AI model, it calculates the optimal response from past data and outputs messages such as, "The customer is showing interest, please begin providing a detailed explanation."

[0460] Step 5:

[0461] The server sends the generated feedback to the terminal via the presentation means. The terminal displays this feedback in real time on the smartphone or smart glasses used by the sales staff. Examples of prompt messages include: "If the customer shows interest in a particular product, analyze their emotional state and provide appropriate feedback."

[0462] Step 6:

[0463] Sales staff, who are the users of the terminal, receive feedback from it and take appropriate action based on that feedback. This enables personalized service for customers and improves their satisfaction.

[0464] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing taking those emotions into account.

[0465] This invention is a system that analyzes the emotional state of customers and users and provides optimal real-time feedback to sales representatives. In particular, by combining it with an emotion engine, it aims to recognize the user's own emotions and optimize two-way communication. The following describes specific embodiments for carrying out the invention.

[0466] The terminal first collects customer facial and voice data. This data is acquired using cameras and microphones installed in the conference room and then transmitted to a server. Simultaneously, the terminal uses an emotion engine to capture the user's facial expressions and voice tone to understand their emotional state and generate data for analysis.

[0467] The server analyzes received customer and user data using emotion recognition models and natural language processing models. It reads emotions and psychological states from customer data and evaluates the sales representative's own performance from user data. This allows for the generation of feedback based on the emotions of both parties.

[0468] Based on the analysis results, the server generates and sends feedback to the terminal. This feedback provides sales representatives with specific actions to facilitate communication with customers. For example, if a customer appears anxious, the terminal screen will display instructions such as, "Please provide additional information here." Conversely, if the user is judged to be stressed, advice such as, "Take a deep breath and relax," will also be displayed.
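A rule-based version of this two-party feedback can be sketched in Python as follows. The first two customer messages and the relaxation advice are taken from the examples in this description; the "confused" rule, the 0 to 1 stress score, the 0.7 threshold, and the function name `two_party_feedback` are hypothetical. In the described system, a generative AI model could replace the fixed rule table.

```python
from typing import List


def two_party_feedback(customer_emotion: str, user_stress: float,
                       stress_threshold: float = 0.7) -> List[str]:
    """Combine customer-side and representative-side analysis into on-screen messages."""
    customer_rules = {
        "anxious": "Please provide additional information here.",
        "interested": "Please explain in more detail on the next slide.",
        "confused": "Pause and invite questions.",  # hypothetical rule
    }
    messages: List[str] = []
    if customer_emotion in customer_rules:
        messages.append(customer_rules[customer_emotion])
    # user_stress is a hypothetical 0-1 score produced by the emotion engine.
    if user_stress >= stress_threshold:
        messages.append("Take a deep breath and relax.")
    return messages


print(two_party_feedback("anxious", 0.8))
```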

[0469] Once the presentation is complete, the server reviews all relevant data again and generates specific improvement suggestions for the next presentation. Users can use this information to improve their presentation skills.

[0470] This invention provides a system that maximizes the effectiveness of sales activities and improves customer and user satisfaction by analyzing the emotions of both customers and users.

[0471] The following describes the processing flow.

[0472] Step 1:

[0473] At the start of the presentation, the terminal activates the camera and microphone installed in the conference room to collect customer facial and audio data. Simultaneously, it uses an emotion engine to capture the user's facial expressions and voice tone.

[0474] Step 2:

[0475] The server analyzes customer and user data received in real time from terminals using emotion recognition models and natural language processing models, respectively. It evaluates the customer's emotions and psychological state, as well as the user's psychological state and level of tension.

[0476] Step 3:

[0477] Based on the analysis results, the server generates feedback optimized for each customer and user. If the customer shows interest, it generates instructions such as "Please explain in more detail on the next slide," and if the user shows signs of tension, it generates advice to help the user relax.

[0478] Step 4:

[0479] The terminal receives feedback from the server and displays it in real time on the screen of the user (the sales representative). Based on this feedback, the user can dynamically adjust their approach to customers.

[0480] Step 5:

[0481] After the presentation ends, the server uses all the saved data to generate specific improvement suggestions for the next presentation. For example, it might suggest, "Improving the introduction will make it easier to capture the audience's attention."

[0482] Step 6:

[0483] Users review reports provided by the server and use them to prepare for their next presentation. This process is expected to improve users' presentation skills and effectiveness.

[0484] (Example 2)

[0485] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0486] To ensure smooth communication during information provision and negotiation processes in business activities, it is crucial to accurately understand the emotional state of buyers and users and provide appropriate feedback. However, conventional technologies have limitations in their ability to analyze the emotions of users and buyers in real time and bidirectionally, and to generate effective feedback and optimization suggestions based on that analysis.

[0487] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0488] In this invention, the server includes information gathering means for analyzing the emotional state of users and buyers, analysis means for evaluating the emotions and psychological state of buyers in real time based on the collected information, and feedback providing means for presenting feedback to users based on the evaluation results and generated analysis data. This enables users to accurately grasp the emotional state of buyers and take appropriate action in real time.

[0489] A "user" is an entity that operates and uses the system to provide information to customers.

[0490] A "buyer" is someone who receives information and is likely to make purchasing decisions regarding companies and products.

[0491] "Emotional state" refers to the psychological state or emotions perceived through an individual's facial expressions and voice.

[0492] "Information gathering means" refers to devices and functions for acquiring data such as the facial expressions and voices of users and buyers.

[0493] "Analysis means" refers to processes and technologies used to evaluate the emotional state of buyers and users based on the collected data.

[0494] "Feedback providing means" refers to a function that provides users with appropriate actions and improvement suggestions based on the analysis results.

[0495] "Generation means" refers to a function that uses the analysis results to generate improvement suggestions useful for future information provision activities.

[0496] A "generated language processing model" is a data processing technique that has been pre-built for the purpose of analyzing natural language.

[0497] An "emotion recognition model" is an algorithm and technology used to identify an individual's emotions and psychological state from collected data.

[0498] This invention is a system for optimizing communication between users and customers in business activities, providing real-time feedback using emotion recognition technology and language processing technology.

[0499] The terminal uses cameras and microphones installed in the conference room to collect facial and voice data from customers. This data is converted into a digital format and transmitted to a server via a secure communication protocol. The terminal is equipped with an emotion engine that generates analyzable data, including the user's reactions and tone of voice.

[0500] The server applies an emotion recognition model and a generated language processing model to the received data to analyze the buyer's emotional and psychological state. Based on the analysis, it generates specific feedback on how the user should change their approach. This feedback is returned to the terminal in real time and displayed on the user's screen as appropriate actions and advice.

[0501] For example, if a buyer expresses concern about the information, the server generates and sends feedback such as, "Emphasize the product's safety here." If the user is nervous, advice such as, "Calm down and speak more slowly," is provided.

[0502] Furthermore, after the meeting concludes, the server generates specific suggestions for improving the quality of the next presentation based on all the analyzed data. This allows users to effectively improve their own performance.

[0503] Examples of prompts include, "What additional information should I provide to alleviate the customer's anxiety?" and "Please tell me some specific ways to calm myself down when I'm feeling nervous."

[0504] This system allows users to facilitate communication with buyers and provide optimal information.

[0505] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0506] Step 1:

[0507] The terminal collects facial and voice data from buyers via cameras and microphones installed in the conference room. This input data is converted into numerical data and stored in a database. Preprocessing, such as noise reduction and feature point extraction for faces and voices, is performed to improve data accuracy. The output is analyzable digital data.
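For the audio side of this preprocessing, a minimal Python sketch is shown below: the signal is split into fixed-length frames, each frame's RMS energy is computed, and frames at or below an estimated noise floor are zeroed. An energy-based noise gate is a simple stand-in chosen for illustration; the source does not specify the noise reduction method. The function names and the 400-sample frame length (25 ms at an assumed 16 kHz sampling rate) are hypothetical, and facial feature point extraction is not covered.

```python
import math
from typing import List, Sequence


def frame_rms(samples: Sequence[float], frame_len: int = 400) -> List[float]:
    """Split audio into non-overlapping frames and return each frame's RMS energy.

    A trailing partial frame shorter than `frame_len` is dropped.
    """
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def noise_gate(energies: Sequence[float], noise_floor: float) -> List[float]:
    """Zero out frames whose energy does not exceed the estimated noise floor."""
    return [e if e > noise_floor else 0.0 for e in energies]


# One quiet frame (background hiss) followed by one loud frame (speech-like signal).
samples = [0.01] * 400 + [0.5, -0.5] * 200
energies = frame_rms(samples)
print(noise_gate(energies, noise_floor=0.02))
```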

[0508] Step 2:

[0509] The terminal uses an emotion engine to generate data for analyzing the user's facial expressions and voice tone. This step involves capturing the user's reactions and analyzing voice tone to quantify the user's emotional state. The input is information obtained from the terminal's camera and microphone, and the output is numerical data indicating the user's psychological state.
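As an illustration of what "numerical data indicating the user's psychological state" could look like, the Python sketch below derives a 0 to 1 tension score from two voice features measured against the user's own calm baseline: average pitch elevation and speaking rate. The features, the 50% saturation cap, equal weighting, and the function names are all hypothetical; the source does not define how tension is quantified.

```python
import statistics
from typing import Sequence


def _clip_excess(excess: float, cap: float = 0.5) -> float:
    """Map an excess over baseline (0.0 = at baseline) onto 0-1, saturating at `cap`."""
    return min(max(excess / cap, 0.0), 1.0)


def tension_score(pitch_hz: Sequence[float], baseline_hz: float,
                  words_per_min: float, baseline_wpm: float) -> float:
    """Return a hypothetical 0-1 tension estimate from pitch elevation and speaking rate.

    Pitch values of 0 mark unvoiced frames and are ignored. If no voiced frames
    exist, no speech was observed and 0.0 is returned.
    """
    if baseline_hz <= 0 or baseline_wpm <= 0:
        raise ValueError("baselines must be positive")
    voiced = [p for p in pitch_hz if p > 0]
    if not voiced:
        return 0.0
    pitch_excess = statistics.mean(voiced) / baseline_hz - 1.0
    rate_excess = words_per_min / baseline_wpm - 1.0
    return (_clip_excess(pitch_excess) + _clip_excess(rate_excess)) / 2


score = tension_score([0, 150, 165, 0, 180], baseline_hz=150, words_per_min=180, baseline_wpm=150)
print(round(score, 2))
```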

[0510] Step 3:

[0511] The server receives the buyer and user data transmitted from the terminal and analyzes it using an emotion recognition model and a generated language processing model. The input is the numerical data produced in Steps 1 and 2, and the processing involves identifying emotional states and evaluating psychological states. The output is the emotion analysis result.

[0512] Step 4:

[0513] The server generates feedback based on the emotional state of the buyer and user, using the analysis results. It utilizes a generative AI model to determine a communication strategy appropriate to the buyer's reaction. The input is the analysis results from step 3, and the output is the generated feedback instructions.

[0514] Step 5:

[0515] The terminal receives feedback sent from the server and displays it to the user in real time. Specifically, it displays feedback messages to help the user respond immediately. The input is the feedback from step 4, and the output is the action item displayed on the user's screen.

[0516] Step 6:

[0517] After the meeting ends, the server reanalyzes all relevant data and generates specific improvement suggestions for the next presentation. The input is the entire presentation data, and the processing includes comparative analysis with past data. The output is a set of points for improvement that can be used next time.
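The comparative analysis with past data can be sketched in Python with a per-metric z-score: any metric (for example, average engagement per slide) that falls well below its historical mean is flagged as a point for improvement. The z-score test, the -1.0 threshold, the metric names, and the function name `flag_regressions` are assumptions for illustration, not the method stated in the source.

```python
import statistics
from typing import Dict, List


def flag_regressions(current: Dict[str, float], past: Dict[str, List[float]],
                     z_limit: float = -1.0) -> List[str]:
    """List metrics that fell well below their historical mean.

    A metric is flagged when its z-score against past presentations is <= z_limit.
    Metrics with fewer than two past values, or zero historical spread, are skipped.
    """
    flagged: List[str] = []
    for name, value in current.items():
        history = past.get(name, [])
        if len(history) < 2:
            continue
        sd = statistics.stdev(history)
        if sd == 0:
            continue
        if (value - statistics.mean(history)) / sd <= z_limit:
            flagged.append(name)
    return flagged


past = {"slide_3": [0.6, 0.7, 0.65], "slide_5": [0.5, 0.55, 0.6], "slide_8": [0.9]}
current = {"slide_3": 0.66, "slide_5": 0.4, "slide_8": 0.1}
print(flag_regressions(current, past))
```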

[0518] (Application Example 2)

[0519] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0520] In modern face-to-face sales, accurately understanding customer emotions and providing appropriate service is essential, but it is a difficult challenge for salespeople to instantly understand customer emotions and take appropriate action. Furthermore, there is a lack of mechanisms for salespeople to receive real-time feedback necessary to improve their customer service skills, making it time-consuming for individuals to improve their skills.

[0521] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0522] In this invention, the server includes data collection means for analyzing the customer's emotional state, analysis means for evaluating the customer's emotions in real time based on the collected data, feedback provision means for providing feedback to the salesperson based on the evaluation results, and presentation means for providing the customer's emotion analysis results as visual feedback to a glasses-type device worn by the salesperson. This enables accurate customer service in line with the customer's emotions and supports the improvement of the salesperson's customer service skills.

[0523] "Customer emotional state" refers to the psychological or emotional state that can be extracted from a customer's facial expressions, tone of voice, choice of words, and other factors.

[0524] "Data collection means" refers to devices and technologies used to collect the information necessary to understand the emotional state of customers.

[0525] "Analysis means" refers to computational or processing techniques used to evaluate customer emotions based on collected data.

[0526] "Feedback provision means" refers to devices or systems that provide information to sales staff based on analysis results to encourage appropriate responses and actions.

[0527] "Generation means" refers to devices or technologies that have the function of generating suggestions for improving future face-to-face sales activities based on data acquired by analysis means.

[0528] "Presentation means" refers to technology or equipment for visually displaying analysis results on a device worn by a salesperson.

[0529] A "glasses-type device" refers to an electronic device in the form of glasses that salespeople wear, allowing them to visually confirm analysis results and feedback.

[0530] The system for implementing this invention aims to analyze customer emotions and provide real-time feedback to sales staff.

[0531] The server first receives facial expression and voice data from customers collected using cameras and microphones. This data is acquired in the initial stages by devices such as smart glasses and smartphones. The server uses this data to analyze the customer's emotional state using emotion recognition technology powered by TensorFlow and Google Cloud's natural language processing APIs.

[0532] The analysis results are generated as real-time feedback and sent immediately to a glasses-type device worn by the salesperson, for example, using the Twilio API. The feedback provided includes information that allows the salesperson to instantly adjust their actions. For example, if the customer appears confused, instructions such as "Encourage questions in a calm tone" might appear on the glasses.

[0533] As a concrete example, consider a clothing store. When a customer shows a troubled expression while trying on clothes, the system analyzes that expression and provides feedback to the sales staff, such as, "Please ask how the size fits." This can improve the quality of customer service.

[0534] An example of a prompt for a generative AI model is, "When a customer appears tired, suggest ways to help them relax and enjoy their shopping experience."

[0535] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0536] Step 1:

[0537] The terminal collects customer facial and voice data in real time via its camera and microphone. The input is the customer's facial expressions and voice, and the output is digital data ready to be sent to the server. The terminal acquires the data with high precision using its sensors.

[0538] Step 2:

[0539] The server receives facial expression and voice data transmitted from the terminal and analyzes it using an emotion recognition model. The input is digital data from the terminal, and the output is the analysis result indicating the customer's emotional state. In this process, TensorFlow is used to extract data features and determine the customer's psychological state.

[0540] Step 3:

[0541] The server generates feedback based on the analysis results and sends it to the salesperson's glasses-type device. The input is the analysis results indicating the customer's emotional state, and the output is specific action suggestions for the salesperson. The feedback is presented visually using the Twilio API, providing the salesperson with appropriate instructions.

[0542] Step 4:

[0543] The user adjusts their communication with the customer based on feedback received through a glasses-type device. The input is feedback information displayed on the glasses, and the output is the salesperson's actions during the actual interaction. The user responds flexibly according to the instructions to improve customer satisfaction.

[0544] Step 5:

[0545] The server collects performance data related to user interactions and uses it to improve feedback for future interactions. The input is data on the results of the interaction, and the output is training data to improve the accuracy of future feedback. This learning process continuously improves the accuracy of the feedback.
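One lightweight way to let interaction results improve future feedback is to keep an effectiveness score for each pair of customer emotion and feedback message, updated after every interaction. The Python sketch below uses an exponential moving average of outcomes as a swapped-in technique for illustration; the source describes producing training data but does not specify the learning method. The class name `FeedbackEffectiveness`, the 0.2 learning rate, the 0.5 prior, and the outcome scale of 0 to 1 are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple


class FeedbackEffectiveness:
    """Track how well each (customer emotion, feedback message) pair has worked.

    Scores are exponential moving averages of interaction outcomes in [0, 1],
    so recent interactions count more than older ones.
    """

    def __init__(self, alpha: float = 0.2, prior: float = 0.5) -> None:
        self.alpha = alpha
        # Unseen pairs start at the prior score.
        self.scores: Dict[Tuple[str, str], float] = defaultdict(lambda: prior)

    def record(self, emotion: str, message: str, outcome: float) -> None:
        """Move the pair's score a fraction `alpha` toward the observed outcome."""
        key = (emotion, message)
        self.scores[key] += self.alpha * (outcome - self.scores[key])

    def best_message(self, emotion: str) -> str:
        """Return the highest-scoring message recorded for this emotion."""
        candidates = {msg: s for (emo, msg), s in self.scores.items() if emo == emotion}
        if not candidates:
            raise KeyError(f"no feedback recorded for {emotion!r}")
        return max(candidates, key=candidates.get)


tracker = FeedbackEffectiveness()
tracker.record("confused", "Encourage questions in a calm tone", 1.0)
tracker.record("confused", "Repeat the key point", 0.0)
print(tracker.best_message("confused"))
```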

[0546] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and the display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating the user's input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0547] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions in the prompt and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0548] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0549] [Fourth Embodiment]

[0550] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0551] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0552] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0553] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0554] The microphone 238 receives voice from the user 20, including instructions from the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0555] The camera 42 is a small digital camera equipped with an optical system including a lens, an aperture, and a shutter, and with an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor. The camera 42 captures images of the area around the user 20 (for example, an imaging range whose field of view is equivalent in width to that of a typical person with healthy vision).

[0556] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0557] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0558] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0559] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0560] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0561] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0562] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0563] This invention is a system for building an AI agent to optimize customer service, aiming to analyze customer emotions and reactions in real time and provide feedback to sales representatives. Specific embodiments for carrying out the invention are shown below.

[0564] First, the terminal collects data on the customer's facial expressions and voice. This is done using cameras and microphones installed in the conference room. The collected data is automatically sent to the server.

[0565] The server uses emotion recognition models and natural language processing models to analyze the received data. For example, it identifies emotional patterns from the customer's facial expressions and tone of voice, and uses this to estimate the customer's current psychological state.

[0566] Based on the analysis results, the server generates feedback and sends it to the terminal. The terminal displays this feedback to the sales representative in real time. For example, if it is highly likely that a customer has questions or concerns, a message such as "It's time to ask questions" will be displayed on the screen to inform the sales representative.
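The threshold-based trigger described in this paragraph could be sketched as follows. This is a minimal illustration under assumptions the disclosure does not state: the analysis result is taken to be a dictionary of state probabilities, and the label `question_or_concern`, the threshold of 0.7, and the function name `select_feedback` are all hypothetical.

```python
from __future__ import annotations

# Hypothetical mapping from an estimated customer state to a feedback message.
FEEDBACK_MESSAGES = {
    "question_or_concern": "It's time to ask questions",
}


def select_feedback(probabilities: dict[str, float], threshold: float = 0.7) -> str | None:
    """Return a message for the most likely mapped state if its probability meets the threshold."""
    candidates = [
        (p, label) for label, p in probabilities.items() if label in FEEDBACK_MESSAGES
    ]
    if not candidates:
        return None  # No estimated state has an associated message.
    p, label = max(candidates)
    return FEEDBACK_MESSAGES[label] if p >= threshold else None
```

For instance, `select_feedback({"question_or_concern": 0.82, "neutral": 0.18})` returns the question prompt, while a probability of 0.4 returns `None`, so nothing is shown. A fixed threshold keeps the trigger predictable; a real system might tune it per customer or per meeting.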

[0567] After the presentation ends, the server performs a detailed analysis of all the collected data. This identifies areas for improvement in the next presentation and provides users with specific suggestions through a generation mechanism. For example, feedback might include, "Strengthening the explanation of technical features can increase customer interest."

[0568] Furthermore, the server uses predictive tools based on past presentation data to proactively suggest effective approaches for specific customers. In this way, sales representatives can prepare their presentations more effectively.

[0569] Embodiments of this invention provide real-time feedback and data-driven insights to streamline sales activities and improve customer satisfaction.

[0570] The following describes the processing flow.

[0571] Step 1:

[0572] When the presentation begins, the terminal activates the conference room's camera and microphone and continuously collects the customer's facial expressions and audio data. The collected data is immediately transmitted to the server.

[0573] Step 2:

[0574] The server analyzes the received data in real time using emotion recognition models and natural language processing models. Specifically, muscle movements and changes in gaze are analyzed from the facial expression data, and voice tone and intonation are analyzed from the audio data, to estimate the customer's emotions and psychological state.
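One common way to combine facial and vocal cues is weighted late fusion: each modality's model produces its own emotion probability distribution, and the two distributions are averaged. The disclosure does not specify a fusion method, so the sketch below is an assumption; the emotion labels, the face weight of 0.6, and the function names are hypothetical.

```python
from __future__ import annotations

# Hypothetical emotion labels shared by both modality models.
EMOTIONS = ("interest", "confusion", "neutral")


def fuse(face: dict[str, float], voice: dict[str, float], face_weight: float = 0.6) -> dict[str, float]:
    """Weighted late fusion of facial and vocal emotion distributions, renormalized to sum to 1."""
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be in [0, 1]")
    voice_weight = 1.0 - face_weight
    combined = {
        e: face_weight * face.get(e, 0.0) + voice_weight * voice.get(e, 0.0)
        for e in EMOTIONS
    }
    total = sum(combined.values())
    if total == 0:
        # Neither modality gave any evidence: fall back to a uniform distribution.
        return {e: 1.0 / len(EMOTIONS) for e in EMOTIONS}
    return {e: v / total for e, v in combined.items()}


def dominant(distribution: dict[str, float]) -> str:
    """Return the label with the highest fused probability."""
    return max(distribution, key=distribution.get)
```

With a face distribution weighted toward confusion (0.7) and a voice distribution weighted toward interest (0.5), the fused confusion score is 0.6 × 0.7 + 0.4 × 0.3 = 0.54, so `dominant` reports confusion. Late fusion lets each modality model be trained and replaced independently.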

[0575] Step 3:

[0576] Based on the analysis results, the server generates specific feedback on actions the sales representative should take. For example, if the customer shows a questioning expression, the server generates feedback such as, "You should add an additional explanation on this slide."

[0577] Step 4:

[0578] The terminal displays the feedback received from the server on the sales representative's screen in real time. This allows the sales representative to adapt their communication to the customer's reactions as they occur.

[0579] Step 5:

[0580] After the presentation ends, the server re-analyzes all the collected data in detail and generates suggestions for improvement for the next presentation. For example, it might provide specific advice such as, "Strengthening the explanation on slide 5 will improve comprehension."
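A post-presentation finding that points to a specific slide (such as slide 5) could be approximated by averaging a per-sample confusion score for each slide and flagging slides above a threshold. This is a hypothetical sketch: it assumes the real-time analysis has already produced `(slide_number, confusion_score)` pairs with scores in [0, 1], and the threshold of 0.5 and the function name are illustrative.

```python
from __future__ import annotations
from collections import defaultdict
from statistics import mean


def slides_to_strengthen(samples: list[tuple[int, float]], threshold: float = 0.5) -> list[int]:
    """Return slides whose mean confusion score exceeds the threshold, most confusing first."""
    scores_by_slide: dict[int, list[float]] = defaultdict(list)
    for slide, score in samples:
        scores_by_slide[slide].append(score)
    averages = {slide: mean(scores) for slide, scores in scores_by_slide.items()}
    flagged = [slide for slide, avg in averages.items() if avg > threshold]
    return sorted(flagged, key=lambda slide: averages[slide], reverse=True)
```

Sorting by average score puts the slide most in need of a stronger explanation at the top of the report.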

[0581] Step 6:

[0582] Users review the reports provided by the server and incorporate the findings into their strategy for the next presentation. This improves the quality of sales activities.

[0583] (Example 1)

[0584] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0585] In customer interactions, it is crucial to understand the customer's emotional state in real time and respond appropriately without delay. However, conventional methods require significant time and effort to analyze a customer's emotional state accurately, which makes real-time feedback difficult. Furthermore, past information is difficult to utilize fully, which makes it challenging to take the optimal approach for a specific customer.

[0586] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0587] In this invention, the server includes means for acquiring information, means for analyzing emotions in real time based on the acquired information, means for providing interaction to the person in charge based on the judgment result, and means for visually displaying the information on-site. This makes it possible to analyze the emotional state of customers in real time and provide prompt and appropriate feedback. Furthermore, by utilizing prediction means based on past information, it becomes possible to present the optimal approach method for a specific customer in advance.

[0588] "Means of acquiring information" refers to devices and systems that collect unstructured data such as customers' facial expressions and voices.

[0589] "Analysis means" refers to processes and devices that use emotion recognition and natural language processing technologies to determine emotional states based on the collected information.

[0590] "Means of providing interaction" refers to a function that generates and provides feedback to the person in charge to encourage appropriate action based on the analyzed results.

[0591] "Means of visually displaying information on-site" refers to devices and programs that visually display information in real time via terminals and provide visual feedback.

[0592] "Prediction means" refers to algorithms and technologies that analyze past data and propose the optimal approach for a particular customer in advance.

[0593] This invention is a system for optimizing customer service, which analyzes the customer's emotional state in real time and provides appropriate feedback to the person in charge. Specific embodiments for carrying out this invention are described below.

[0594] The terminal collects data on customers' facial expressions and voices using high-precision devices installed in the conference room, such as high-resolution cameras and high-performance microphones. General-purpose cameras and microphones are examples of such devices. The terminal immediately transmits this unstructured data to the server.

[0595] The server receives the collected data and analyzes the customer's emotional state using emotion recognition models and natural language processing models. Existing technologies such as TensorFlow and OpenCV can be used for this processing. Using these technologies, the server recognizes the customer's facial expressions and tone of voice and then identifies their emotional patterns.

[0596] Based on the analyzed data, the server generates feedback to provide interaction with the person in charge. This feedback is customized according to the user's needs and the customer's psychological state. For example, if the customer seems likely to ask a question, a visual alert such as "It's time to ask a question" is generated.

[0597] The terminal displays the generated feedback to the person in charge in real time, prompting a quick response. This can improve customer satisfaction and increase the success rate of business deals.

[0598] Furthermore, users can utilize the predictive capabilities provided by the server. The server analyzes past presentation data to predict and proactively present the most effective approach for a particular customer.

[0599] For example, if the system receives a prompt instructing the AI model to "analyze customer emotions based on facial expression data and generate specific suggestions for improving sales strategies," the system can give the user specific advice on what to do in real time.
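A prompt like the one above might be assembled programmatically by appending the latest emotion estimates to the instruction. The sketch below assumes the analysis result is a dictionary of emotion scores; the function name `build_prompt` and the text layout are hypothetical, and sending the prompt to any particular generative AI service is outside its scope.

```python
from __future__ import annotations

PROMPT_INSTRUCTION = (
    "Analyze customer emotions based on facial expression data and "
    "generate specific suggestions for improving sales strategies."
)


def build_prompt(emotion_scores: dict[str, float], context: str) -> str:
    """Combine the instruction, meeting context, and emotion estimates into one prompt string."""
    lines = [PROMPT_INSTRUCTION, f"Context: {context}", "Estimated customer emotions:"]
    # List emotions from most to least likely so the dominant state appears first.
    for emotion, score in sorted(emotion_scores.items(), key=lambda kv: kv[1], reverse=True):
        lines.append(f"- {emotion}: {score:.2f}")
    return "\n".join(lines)
```

Keeping the instruction fixed and varying only the appended data makes the model's responses easier to compare across meetings.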

[0600] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0601] Step 1:

[0602] The terminal uses its installed camera and microphone to capture the customer's facial expressions and voice with high precision during the meeting. This input data is collected in real time as unstructured data. The terminal then transmits this data to the server.

[0603] Step 2:

[0604] The server receives the unstructured data sent from the terminal and analyzes it using emotion recognition models and natural language processing models. The input data is labeled based on the customer's facial features and voice tone. For example, for facial analysis, the server uses machine learning algorithms to identify expressions such as smiles and emotional states such as anxiety. The analysis outputs data indicating the customer's psychological state and reactions.

[0605] Step 3:

[0606] The server generates appropriate feedback messages based on the analysis results. Using a generative AI model, it determines interactions appropriate to the customer's emotional state and creates feedback using prompts and other inputs. These prompts include specific content such as, "The customer's interest is increasing; please add more details." The generated feedback message is then sent from the server to the terminal.

[0607] Step 4:

[0608] The terminal visualizes the feedback messages received from the server on the sales representative's display in real time. Sales representatives can then adjust their interactions with customers based on the feedback, which is displayed, for example, as a pop-up on their tablet screen. This allows for quick and appropriate responses to customers.

[0609] Step 5:

[0610] The server re-analyzes all data after the meeting to identify areas for improvement in the next presentation. It uses current and past presentation data as input, analyzing it with statistical models and machine learning techniques. For example, it identifies areas where questions are concentrated and topics of interest, outputting information that suggests strategies for the next presentation.
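One simple reading of identifying where questions are concentrated is to count question topics across the current and past presentations. This sketch assumes each question has already been tagged with a topic string by the analysis step; the topic names and the `question_hotspots` function are hypothetical, and the statistical and machine learning models mentioned in the disclosure would be more elaborate.

```python
from __future__ import annotations
from collections import Counter


def question_hotspots(current: list[str], past: list[list[str]], top_n: int = 3) -> list[tuple[str, int]]:
    """Return the most frequently questioned topics across current and past sessions."""
    counts = Counter(current)
    for session in past:
        counts.update(session)  # Each past session adds its own topic counts.
    return counts.most_common(top_n)
```

Here every session counts equally; weighting the current session more heavily would be a straightforward variation if recent behavior matters more.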

[0611] Step 6:

[0612] Users can receive next-step approach suggestions based on predictions generated by the server. A predictive model combining historical data and current analysis results provides information to proactively prepare effective messages and actions for specific customers. For example, feedback may be output pointing out areas that need further explanation, facilitating preparation for the next presentation.
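As a deliberately simple stand-in for the predictive model, the sketch below ranks approaches by their historical success rate for a given customer segment. It is an assumption, not the disclosed method: the record format `(segment, approach, success)`, the `min_trials` cutoff, and all names are hypothetical.

```python
from __future__ import annotations
from collections import defaultdict


def best_approach(history: list[tuple[str, str, bool]], segment: str,
                  min_trials: int = 2) -> str | None:
    """Return the approach with the highest historical success rate for a customer segment."""
    trials: dict[str, int] = defaultdict(int)
    wins: dict[str, int] = defaultdict(int)
    for seg, approach, success in history:
        if seg != segment:
            continue
        trials[approach] += 1
        wins[approach] += int(success)
    # Ignore approaches tried too rarely for their success rate to be meaningful.
    rates = {a: wins[a] / n for a, n in trials.items() if n >= min_trials}
    if not rates:
        return None  # Not enough history for this segment.
    return max(rates, key=rates.get)
```

Returning `None` when history is thin lets the caller fall back to a default presentation plan rather than acting on a single past outcome.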

[0613] (Application Example 1)

[0614] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0615] Traditional in-store customer service presents challenges, such as the difficulty sales staff have in understanding customer emotions and interests in real time, making it difficult to suggest services and products at the appropriate time. Furthermore, the lack of mechanisms to effectively utilize past interaction data to improve future interactions limits the improvement in customer satisfaction.

[0616] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0617] In this invention, the server includes information gathering means for acquiring customer facial expressions and voice information, analysis means for determining the customer's emotional state in real time based on the collected information, and presentation means for providing information to sales staff based on the determination results. This enables sales staff to immediately grasp the customer's emotions and take appropriate action. Furthermore, by utilizing generation means for creating suggestions to improve future customer service, efficient and effective service delivery is realized.

[0618] "Information gathering means" refers to a combination of hardware and software used to acquire customer facial expressions and voice information and to use this information for analysis.

[0619] "Analysis means" refers to software models and algorithms used to determine a customer's emotional state in real time based on the collected information.

[0620] A "presentation means" refers to a device or interface for presenting the judgment results derived from the analysis means to sales staff in real time.

[0621] "Generation means" refers to a system or program that creates suggestions for improving future customer service based on past and present customer information.

[0622] The system implementing this invention collects and analyzes customer facial expressions and voice information in real time, enabling sales staff to respond immediately. The central server of the system integrates multiple means for analyzing the customer's emotional state.

[0623] The server first acquires customer facial expressions and voice data using the information gathering means. This information is gathered using cameras and microphones installed in the physical store. The collected data is transmitted to the server and stored. As the analysis means, natural language processing models and emotion recognition models are implemented on the server to analyze the received data immediately and determine the customer's emotional state.

[0624] Furthermore, the server uses the presentation means to display the analysis results in real time on the sales staff's smartphones or smart glasses. This allows the staff to tailor their sales approach immediately to the customer's current situation.

[0625] Furthermore, the server uses the generation means to analyze past customer interaction data and propose strategies for future customer interactions, with the aim of improving service quality.

[0626] For example, if a customer visiting the store shows interest in a new product, the server can automatically display feedback such as, "The customer seems very interested; please explain the details," on the staff member's device. This allows the staff to respond appropriately and increases customer satisfaction.

[0627] An example of a prompt for a generative AI model might be: "If a customer shows interest in a particular product, analyze their emotional state and provide appropriate feedback."

[0628] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0629] Step 1:

[0630] The terminal uses the store's cameras and microphones to collect customer facial expressions and audio data in real time. This input includes images of the customer's face and audio clips. The terminal sends this data to a server.

[0631] Step 2:

[0632] The server processes the received facial expression and audio data using the analysis means. This analysis includes using an emotion recognition model to identify emotional patterns from the customer's facial expressions. Based on the input data, the server determines the customer's emotional state (e.g., interest or suspicion) and outputs an emotional state category.

[0633] Step 3:

[0634] The server uses a natural language processing model to analyze what the customer is saying. This process involves converting the audio data into text, followed by semantic analysis. Further emotions and intentions are inferred from the customer's tone of voice and word choices, and this information is output in text format.
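Once speech has been converted to text, a crude keyword match can illustrate how intentions might be inferred from word choices. This is a simplified, hypothetical sketch: the intent labels and keyword lists are invented for illustration, speech-to-text is assumed to have already run, and a production system would use a trained natural language processing model rather than substring matching.

```python
from __future__ import annotations

# Hypothetical intent labels and trigger phrases, for illustration only.
INTENT_KEYWORDS = {
    "price_concern": ("expensive", "price", "budget", "cost"),
    "purchase_interest": ("buy", "try it", "in stock", "order"),
    "hesitation": ("not sure", "maybe", "think about it"),
}


def infer_intents(transcript: str) -> list[str]:
    """Return every intent whose trigger phrases appear in the transcript, in table order."""
    text = transcript.lower()
    return [
        intent for intent, phrases in INTENT_KEYWORDS.items()
        if any(phrase in text for phrase in phrases)
    ]
```

A single utterance can yield several intents (for example, price concern together with hesitation), which the feedback step can then combine with the emotion category from the facial analysis.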

[0635] Step 4:

[0636] The server generates specific feedback for sales staff based on the analysis results. Using a generative AI model, it calculates the optimal response from past data and outputs messages such as, "The customer is showing interest, please begin providing a detailed explanation."

[0637] Step 5:

[0638] The server sends the generated feedback to the terminal via the presentation means. The terminal displays this feedback in real time on the smartphone or smart glasses worn by the sales staff. An example of a prompt message is: "If the customer shows interest in a particular product, analyze their emotional state and provide appropriate feedback."

[0639] Step 6:

[0640] Sales staff, who are also users of the device, receive feedback from the terminal and take appropriate action based on that feedback. This allows for personalized service to be provided to customers, improving their satisfaction.

[0641] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0642] This invention is a system that analyzes the emotional state of customers and users and provides optimal real-time feedback to sales representatives. In particular, by combining it with an emotion engine, it aims to recognize the user's own emotions and optimize two-way communication. The following describes specific embodiments for carrying out the invention.

[0643] The terminal first collects customer facial and voice data. This data is acquired using cameras and microphones installed in the conference room and then transmitted to a server. Simultaneously, the terminal uses an emotion engine to capture the user's facial expressions and voice tone to understand their emotional state and generate data for analysis.

[0644] The server analyzes received customer and user data using emotion recognition models and natural language processing models. It reads emotions and psychological states from customer data and evaluates the sales representative's own performance from user data. This allows for the generation of feedback based on the emotions of both parties.

[0645] Based on the analysis results, the server generates and sends feedback to the terminal. This feedback provides sales representatives with specific actions to facilitate communication with customers. For example, if a customer appears anxious, the terminal screen will display instructions such as, "Please provide additional information here." Conversely, if the user is judged to be stressed, advice such as, "Take a deep breath and relax," will also be displayed.
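The two-sided feedback in this paragraph, one message driven by the customer's state and one by the user's own state, can be sketched as two independent rules. The state label `anxious`, the stress score in [0, 1], the threshold of 0.7, and the function name are assumptions; the messages are taken from the example above.

```python
from __future__ import annotations


def two_party_feedback(customer_state: str, user_stress: float,
                       stress_threshold: float = 0.7) -> list[str]:
    """Build feedback from the customer's estimated state and the user's own stress level."""
    messages: list[str] = []
    if customer_state == "anxious":
        messages.append("Please provide additional information here.")
    if user_stress >= stress_threshold:
        messages.append("Take a deep breath and relax.")
    return messages
```

Because the rules are independent, the terminal may show either message, both, or neither at any moment.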

[0646] Once the presentation is complete, the server reviews all relevant data again and generates specific improvement suggestions for the next presentation. Users can use this information to improve their presentation skills.

[0647] This invention provides a system that maximizes the effectiveness of sales activities and improves customer and user satisfaction by analyzing the emotions of both customers and users.

[0648] The following describes the processing flow.

[0649] Step 1:

[0650] At the start of the presentation, the terminal activates the camera and microphone installed in the conference room to collect the customer's facial and audio data. Simultaneously, it uses the emotion engine to capture the user's facial expressions and voice tone.

[0651] Step 2:

[0652] The server analyzes customer and user data received in real time from terminals using emotion recognition models and natural language processing models, respectively. It evaluates the customer's emotions and psychological state, as well as the user's psychological state and level of tension.

[0653] Step 3:

[0654] Based on the analysis results, the server generates feedback optimized for the customer and the user respectively. If the customer shows interest, it generates instructions such as "Please explain in more detail on the next slide," and if the user shows signs of tension, it generates advice to help the user relax.

[0655] Step 4:

[0656] The terminal receives the feedback from the server and displays it in real time on the screen of the user (the sales representative). Based on this feedback, the user can dynamically adjust their approach to the customer.

[0657] Step 5:

[0658] After the presentation ends, the server uses all the saved data to generate specific improvement suggestions for the next presentation. For example, it might suggest, "Improving the introduction will make it easier to capture the audience's attention."

[0659] Step 6:

[0660] Users review reports provided by the server and use them to prepare for their next presentation. This process is expected to improve users' presentation skills and effectiveness.

[0661] (Example 2)

[0662] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0663] To ensure smooth communication during information provision and negotiation processes in business activities, it is crucial to accurately understand the emotional state of buyers and users and provide appropriate feedback. However, conventional technologies have limitations in their ability to analyze the emotions of users and buyers in real time and bidirectionally, and to generate effective feedback and optimization suggestions based on that analysis.

[0664] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0665] In this invention, the server includes information gathering means for analyzing the emotional state of users and buyers, analysis means for evaluating the emotions and psychological state of buyers in real time based on the collected information, and feedback providing means for presenting feedback to users based on the evaluation results and generated analysis data. This enables users to accurately grasp the emotional state of buyers and take appropriate action in real time.

[0666] A "user" is an entity that operates and uses the system to provide information to customers.

[0667] A "buyer" is someone who receives information and is likely to make purchasing decisions regarding companies and products.

[0668] "Emotional state" refers to the psychological state or emotions perceived through an individual's facial expressions and voice.

[0669] "Information gathering means" refers to devices and functions for acquiring data such as the facial expressions and voices of users and buyers.

[0670] "Analysis means" refers to processes and technologies used to evaluate the emotional state of buyers and users based on the collected data.

[0671] "Feedback providing means" refers to a function that presents users with appropriate actions and improvement suggestions based on the analysis results.

[0672] "Generation means" refers to a function that uses the analysis results to generate improvement suggestions useful for future information provision activities.

[0673] A "generated language processing model" is a data processing technique that has been pre-built for the purpose of analyzing natural language.

[0674] An "emotion recognition model" is an algorithm and technology used to identify an individual's emotions and psychological state from collected data.

[0675] This invention is a system for optimizing communication between users and customers in business activities, providing real-time feedback using emotion recognition technology and language processing technology.

[0676] The terminal uses cameras and microphones installed in the conference room to collect facial and voice data from customers. This data is converted into a digital format and transmitted to a server via a secure communication protocol. The terminal is equipped with an emotion engine that generates analyzable data, including the user's reactions and tone of voice.

[0677] The server applies an emotion recognition model and a generated language processing model to the received data to analyze the buyer's emotional and psychological state. Based on the analysis, it generates specific feedback on how the user should change their approach. This feedback is returned to the terminal in real time and displayed on the user's screen as recommended actions and advice.

[0678] For example, if a buyer expresses concern about the information provided, the server generates and sends feedback such as, "Emphasize the product's safety here." If the user is nervous, advice such as, "Calm down and speak more slowly," is provided.

[0679] Furthermore, after the meeting concludes, the server generates specific suggestions for improving the quality of the next presentation based on all the analyzed data. This allows users to effectively improve their own performance.

[0680] Examples of prompts include, "What additional information should I provide to alleviate the customer's anxiety?" and "Please tell me some specific ways to calm myself down when I'm feeling nervous."

[0681] This system allows users to facilitate communication with buyers and provide optimal information.

[0682] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0683] Step 1:

[0684] The terminal collects facial and voice data from buyers via cameras and microphones installed in the conference room. This input data is converted into numerical data and stored in a database. Preprocessing, such as noise reduction and feature point extraction for faces and voices, is performed to improve data accuracy. The output is analyzable digital data.
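The preprocessing in Step 1 is not specified further. A minimal example of noise reduction and normalization on a one-dimensional feature stream (for instance, a per-frame audio energy value) is a trailing moving average followed by standardization. Both the choice of filter and the window size of 3 are assumptions made for illustration.

```python
from __future__ import annotations
from statistics import mean, pstdev


def moving_average(signal: list[float], window: int = 3) -> list[float]:
    """Smooth a signal with a trailing moving average (shorter window at the start)."""
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def standardize(values: list[float]) -> list[float]:
    """Scale values to zero mean and unit variance; constant input maps to zeros."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

For example, `moving_average([3, 3, 6, 0])` damps the spike and dip into `[3.0, 3.0, 4.0, 3.0]`. Standardizing afterwards puts features from different microphones or cameras on a comparable scale before they reach the server's models.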

[0685] Step 2:

[0686] The terminal uses the emotion engine to generate data for analyzing the user's facial expressions and voice tone. This step involves reaction capture and voice tone analysis to quantify the user's emotional state. The input is information obtained from the terminal's camera and microphone, and the output is numerical data indicating the user's psychological state.
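Quantifying the user's tension could, under a simplifying assumption, be done by comparing the user's current voice pitch with a calm baseline recorded from the same user. The sketch below computes a z-score against that baseline and maps it to [0, 1]. Treating raised pitch as a tension indicator, the divisor of 3, and the function name are hypothetical choices, not part of the disclosure.

```python
from __future__ import annotations
from statistics import mean, pstdev


def tension_score(baseline_pitch_hz: list[float], current_pitch_hz: list[float]) -> float:
    """Map the rise of the current mean pitch over the user's own baseline to a score in [0, 1]."""
    sigma = pstdev(baseline_pitch_hz)
    if sigma == 0:
        raise ValueError("baseline pitch must vary to compute a z-score")
    z = (mean(current_pitch_hz) - mean(baseline_pitch_hz)) / sigma
    # Pitch at or below baseline counts as no tension; the score saturates at 3 standard deviations.
    return min(1.0, max(0.0, z / 3.0))
```

Comparing each user against their own baseline avoids penalizing people whose natural speaking pitch is simply higher.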

[0687] Step 3:

[0688] The server receives the buyer and user data transmitted from the terminal and analyzes the data using an emotion recognition model and a generated language processing model. The input is the numerical data produced in Step 1 and Step 2, and the processing involves identifying emotional states and evaluating psychological states. The output is the emotion analysis result.

[0689] Step 4:

[0690] The server generates feedback based on the emotional state of the buyer and user, using the analysis results. It utilizes a generative AI model to determine a communication strategy appropriate to the buyer's reaction. The input is the analysis results from step 3, and the output is the generated feedback instructions.

[0691] Step 5:

[0692] The terminal receives feedback sent from the server and displays it to the user in real time. Specifically, it displays feedback messages to help the user respond immediately. The input is the feedback from step 4, and the output is the action item displayed on the user's screen.

[0693] Step 6:

[0694] After the meeting ends, the server reanalyzes all relevant data and generates specific improvement suggestions for the next presentation. The input is the entire presentation data, and the processing includes comparative analysis with past data. The output is a set of improvement points that can be used next time.

[0695] (Application Example 2)

[0696] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0697] In modern face-to-face sales, accurately understanding customer emotions and providing appropriate service is essential, but it is difficult for salespeople to grasp customer emotions instantly and take appropriate action. Furthermore, salespeople lack mechanisms for receiving the real-time feedback needed to improve their customer service skills, so improving those skills individually is time-consuming.

[0698] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0699] In this invention, the server includes data collection means for analyzing the customer's emotional state, analysis means for evaluating the customer's emotions in real time based on the collected data, feedback provision means for providing feedback to the salesperson based on the evaluation results, and presentation means for providing the customer's emotion analysis results as visual feedback to a glasses-type device worn by the salesperson. This enables accurate customer service in line with the customer's emotions and supports the improvement of the salesperson's customer service skills.

[0700] "Customer emotional state" refers to the psychological or emotional state that can be extracted from a customer's facial expressions, tone of voice, choice of words, and other factors.

[0701] "Data collection means" refers to devices and technologies used to collect the information necessary to understand the emotional state of customers.

[0702] "Analysis means" refers to computational or processing techniques used to evaluate customer emotions based on collected data.

[0703] "Feedback provision means" refers to devices or systems that provide information to sales staff based on analysis results to encourage appropriate responses and actions.

[0704] "Generation means" refers to devices or technologies that have the function of generating suggestions for improving future face-to-face sales activities based on data acquired by analysis means.

[0705] "Presentation means" refers to technology or equipment for visually displaying analysis results on a device worn by a salesperson.

[0706] A "glasses-type device" refers to an electronic device in the form of glasses that salespeople wear, allowing them to visually confirm analysis results and feedback.

[0707] The system for implementing this invention aims to analyze customer emotions and provide real-time feedback to sales staff.

[0708] The server first receives customer facial expression and voice data collected with cameras and microphones. This data is initially acquired by devices such as smart glasses and smartphones. The server then analyzes the customer's emotional state from this data using emotion recognition technology built on TensorFlow and Google Cloud's natural language processing APIs.
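As a minimal sketch of how the facial and voice results might be combined, the Python below performs a weighted late fusion of per-modality emotion probabilities. The label set, the function names, and the default 0.6/0.4 weighting are hypothetical; the description states only that facial and voice data are analyzed, not how the two results are merged.

```python
from typing import Dict

# Hypothetical label set; the source does not enumerate the emotion classes.
EMOTIONS = ("calm", "confused", "troubled", "tired", "satisfied")


def fuse_modalities(face_probs: Dict[str, float],
                    voice_probs: Dict[str, float],
                    face_weight: float = 0.6) -> Dict[str, float]:
    """Weighted late fusion of face and voice emotion probabilities.

    face_weight is an assumed tuning parameter; voice receives 1 - face_weight.
    Missing labels count as 0, labels outside EMOTIONS are ignored, and the
    result is renormalized to sum to 1.
    """
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be in [0, 1]")
    voice_weight = 1.0 - face_weight
    fused = {e: face_weight * face_probs.get(e, 0.0)
             + voice_weight * voice_probs.get(e, 0.0)
             for e in EMOTIONS}
    total = sum(fused.values())
    if total == 0.0:
        # Neither modality produced evidence: fall back to a uniform distribution.
        return {e: 1.0 / len(EMOTIONS) for e in EMOTIONS}
    return {e: v / total for e, v in fused.items()}


def dominant_emotion(probs: Dict[str, float]) -> str:
    """Return the label with the highest fused probability."""
    return max(probs, key=lambda e: probs[e])
```

For example, a face score of 0.7 for "confused" and a voice score of 0.4 for "confused" fuse to 0.58 with the default weight, so "confused" becomes the dominant emotion.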

[0709] The server converts the analysis results into real-time feedback and immediately sends it to the glasses-type device worn by the salesperson, for example via the Twilio API. The feedback contains information that lets the salesperson adjust their behavior on the spot. For example, if the customer appears confused, an instruction such as "Encourage questions in a calm tone" might appear on the glasses.

[0710] As a concrete example, consider a clothing store. When a customer shows a troubled expression while trying on clothes, the system analyzes that expression and gives the sales staff feedback such as "Please ask about the fit of the size." This can improve the quality of customer service.
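The two instructions quoted above suggest a lookup from a detected emotion and situation to an instruction. The following is a hypothetical rule table: only the two instruction strings come from the description, while the keys "confused", "troubled", and "fitting_room" and the function name are illustrative. In practice such rules could be supplemented or replaced by output from the generative AI model.

```python
from typing import Dict, Optional, Tuple

# Hypothetical rule table keyed by (detected emotion, situation). Only the two
# instruction strings are taken from the description; the keys are illustrative.
FEEDBACK_RULES: Dict[Tuple[str, Optional[str]], str] = {
    ("confused", None): "Encourage questions in a calm tone",
    ("troubled", "fitting_room"): "Please ask about the fit of the size.",
}


def select_feedback(emotion: str, situation: Optional[str] = None) -> Optional[str]:
    """Return an instruction for the salesperson, preferring a situation-specific rule."""
    rule = FEEDBACK_RULES.get((emotion, situation))
    if rule is not None:
        return rule
    # Fall back to a rule that applies to this emotion in any situation.
    return FEEDBACK_RULES.get((emotion, None))
```

A situation-specific rule wins when one exists; otherwise the general rule for the emotion is used, and `None` means no instruction is shown.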

[0711] An example of a prompt given to the generative AI model is: "When a customer appears tired, suggest ways to help them relax and enjoy their shopping experience."
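A prompt like the one above could be assembled from the analysis result with a simple template. This sketch assumes a hypothetical `build_prompt` helper and template wording; the description gives only the single example prompt.

```python
def build_prompt(emotion: str, goal: str, setting: str = "a retail store") -> str:
    """Compose a prompt for a generative AI model from the analysed emotion.

    The template wording and the `setting` parameter are hypothetical.
    """
    return (
        f"You are assisting a salesperson in {setting}. "
        f"When a customer appears {emotion}, suggest ways to {goal}. "
        "Answer with one short instruction the salesperson can act on immediately."
    )
```

Calling `build_prompt("tired", "help them relax and enjoy their shopping experience")` embeds the example prompt from the text inside the surrounding instructions.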

[0712] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0713] Step 1:

[0714] The terminal collects the customer's facial and voice data in real time through its camera and microphone. The input is the customer's facial expressions and voice, and the output is digital data ready to be sent to the server. The terminal uses its sensors to acquire the data with high precision.
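Step 1 ends with the capture ready to be sent as digital data. One way to package a capture unit is shown below; the `CaptureFrame` class, its field names, and the base64-in-JSON encoding are assumptions for illustration, since the description does not specify a transmission format.

```python
import base64
import json
import time
from dataclasses import dataclass, field


@dataclass
class CaptureFrame:
    """One capture unit sent from the terminal to the server (hypothetical format)."""
    terminal_id: str
    image_jpeg: bytes           # one camera frame of the customer's face
    audio_pcm: bytes            # microphone samples captured alongside the frame
    sample_rate_hz: int = 16000
    captured_at: float = field(default_factory=time.time)

    def to_json(self) -> str:
        # Binary payloads are base64-encoded so the message is plain JSON text.
        return json.dumps({
            "terminal_id": self.terminal_id,
            "captured_at": self.captured_at,
            "sample_rate_hz": self.sample_rate_hz,
            "image_jpeg": base64.b64encode(self.image_jpeg).decode("ascii"),
            "audio_pcm": base64.b64encode(self.audio_pcm).decode("ascii"),
        })

    @classmethod
    def from_json(cls, text: str) -> "CaptureFrame":
        # Server side: restore the binary payloads from the JSON message.
        d = json.loads(text)
        return cls(
            terminal_id=d["terminal_id"],
            image_jpeg=base64.b64decode(d["image_jpeg"]),
            audio_pcm=base64.b64decode(d["audio_pcm"]),
            sample_rate_hz=d["sample_rate_hz"],
            captured_at=d["captured_at"],
        )
```

Timestamping each frame lets the server align facial and voice data and discard stale captures; a binary or streaming protocol would be more efficient for continuous video.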

[0715] Step 2:

[0716] The server receives the facial expression and voice data transmitted from the terminal and analyzes it using an emotion recognition model. The input is the digital data from the terminal, and the output is an analysis result indicating the customer's emotional state. In this process, TensorFlow is used to extract features from the data and determine the customer's psychological state.
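The final stage of such a classifier typically turns an extracted feature vector into per-emotion probabilities with a dense layer followed by a softmax. The pure-Python stand-in below shows that computation; in TensorFlow it would usually be a `tf.keras.layers.Dense(n, activation="softmax")` layer. The label set and weights are hypothetical, and the description does not disclose the model architecture.

```python
import math
from typing import Dict, List, Sequence

LABELS = ("calm", "confused", "troubled")  # hypothetical label set


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def classify(features: Sequence[float],
             weights: Sequence[Sequence[float]],
             biases: Sequence[float]) -> Dict[str, float]:
    """Dense layer + softmax: one weight row and one bias per label."""
    if len(weights) != len(LABELS) or len(biases) != len(LABELS):
        raise ValueError("need one weight row and one bias per label")
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return dict(zip(LABELS, softmax(logits)))
```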

[0717] Step 3:

[0718] The server generates feedback based on the analysis results and sends it to the salesperson's glasses-type device. The input is the analysis result indicating the customer's emotional state, and the output is specific action suggestions for the salesperson. The feedback is delivered using the Twilio API and displayed visually, giving the salesperson appropriate instructions.
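Because analysis runs continuously, the same instruction may be produced many times in a row. The hypothetical `FeedbackDispatcher` below suppresses repeats within a cooldown window before handing a message to a `send` callable. The cooldown policy is an assumption, and `send` stands in for whatever delivery channel is used; the description names the Twilio API, whose client calls are not reproduced here.

```python
from typing import Callable, Dict, Optional, Tuple


class FeedbackDispatcher:
    """Forward instructions to the glasses, suppressing rapid repeats (hypothetical design).

    An identical instruction to the same device is re-sent only after
    cooldown_s seconds; a different instruction is sent immediately.
    """

    def __init__(self, send: Callable[[str, str], None], cooldown_s: float = 10.0):
        self._send = send  # delivery function, e.g. a wrapper around a messaging API
        self._cooldown_s = cooldown_s
        self._last_sent: Dict[str, Tuple[str, float]] = {}  # device -> (instruction, time)

    def dispatch(self, device_id: str, instruction: Optional[str], now: float) -> bool:
        """Send the instruction if appropriate; return True if it was sent."""
        if instruction is None:
            return False
        last = self._last_sent.get(device_id)
        if last is not None and last[0] == instruction and now - last[1] < self._cooldown_s:
            return False
        self._send(device_id, instruction)
        self._last_sent[device_id] = (instruction, now)
        return True
```

Passing `now` explicitly keeps the class easy to test; a live system would pass `time.monotonic()`.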

[0719] Step 4:

[0720] The user (the salesperson) adjusts their communication with the customer based on the feedback received through the glasses-type device. The input is the feedback information displayed on the glasses, and the output is the salesperson's actions during the actual interaction. By responding flexibly to the instructions, the user improves customer satisfaction.

[0721] Step 5:

[0722] The server collects performance data related to user interactions and uses it to improve feedback for future interactions. The input is data on the results of the interaction, and the output is training data to improve the accuracy of future feedback. This learning process continuously improves the accuracy of the feedback.
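A very simple form of the Step 5 learning loop is to count, for each emotion, how often each instruction was followed by a good outcome and prefer the best-scoring instruction. The `FeedbackLearner` class and the binary success signal are assumptions; the description says training data is generated but not which learning method is used.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class FeedbackLearner:
    """Track per-emotion success rates of instructions (hypothetical stand-in for Step 5).

    The success signal (e.g. a purchase, or a positive emotion afterwards) is an assumption.
    """

    def __init__(self) -> None:
        # (emotion, instruction) -> [successes, trials]
        self._stats: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])

    def record(self, emotion: str, instruction: str, success: bool) -> None:
        s = self._stats[(emotion, instruction)]
        s[0] += int(success)
        s[1] += 1

    def success_rate(self, emotion: str, instruction: str) -> float:
        # .get does not insert into the defaultdict; untried pairs read as [0, 0].
        successes, trials = self._stats.get((emotion, instruction), [0, 0])
        # Laplace smoothing: an untried instruction starts at 0.5.
        return (successes + 1) / (trials + 2)

    def best(self, emotion: str, candidates: List[str]) -> str:
        """Pick the candidate with the highest smoothed success rate (candidates must be non-empty)."""
        return max(candidates, key=lambda c: self.success_rate(emotion, c))
```

Laplace smoothing keeps a single lucky or unlucky interaction from dominating; a bandit strategy that occasionally tries untested instructions would be a natural refinement.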

[0723] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0724] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions in the prompt and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0725] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0726] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0727] Figure 9 shows an emotion map 400 on which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer an emotion is to the center of the circles, the more primitive it is. Farther out from the center are emotions representing states and actions that arise from mental states. Here, "emotion" is a concept that includes both feelings and mental states. On the left side of the concentric circles are emotions that are generally generated by reactions occurring in the brain. On the right side are emotions that are generally induced by situational judgment. Above and below the circles are emotions that are generally both generated by reactions in the brain and induced by situational judgment. In addition, "pleasure" is located on the upper side of the concentric circles and "displeasure" on the lower side. Thus, in the emotion map 400, emotions are mapped according to the structure by which they arise, and emotions that are likely to occur simultaneously are mapped close together.
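The layout of the emotion map 400 can be represented with polar coordinates: a ring index for distance from the center and an angle for direction. In this sketch only the conventions come from the description (inner rings are more primitive, the left half is the reaction side, the right half is the situation side, pleasure is at the top, displeasure at the bottom); the `MapPoint` class and the concrete placements in `EXAMPLE_MAP` are hypothetical and not read from Figure 9.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MapPoint:
    ring: int          # 0 = center (most primitive); larger = farther out (expressed as action)
    angle_deg: float   # 0 = 3 o'clock, 90 = top (pleasure), 270 = bottom (displeasure)

    def xy(self) -> Tuple[float, float]:
        a = math.radians(self.angle_deg)
        return (self.ring * math.cos(a), self.ring * math.sin(a))


def side(p: MapPoint) -> str:
    """Left half = reaction (brain response); right half = situation (situational judgment)."""
    x, _ = p.xy()
    if abs(x) < 1e-9:
        return "both"  # directly above or below the center
    return "reaction" if x < 0 else "situation"


def distance(a: MapPoint, b: MapPoint) -> float:
    """Euclidean distance on the map; nearby emotions tend to co-occur."""
    (ax, ay), (bx, by) = a.xy(), b.xy()
    return math.hypot(ax - bx, ay - by)


# Hypothetical placements for illustration only.
EXAMPLE_MAP = {
    "pleasure": MapPoint(ring=3, angle_deg=90),
    "displeasure": MapPoint(ring=3, angle_deg=270),
    "reassured": MapPoint(ring=2, angle_deg=0),
    "calm": MapPoint(ring=2, angle_deg=10),
}
```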

[0728] These emotions are distributed around the 3 o'clock position of the emotion map 400 and usually fluctuate between security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, which gives a calm impression.

[0729] The inner part of the emotion map 400 represents inner thoughts, while the outer part represents actions. Therefore, the farther out an emotion lies on the emotion map 400, the more visible it becomes (that is, the more it is expressed in actions).

[0730] Human emotions are based on various balances, such as posture and blood sugar level. When these balances deviate from the ideal, the result is displeasure; when they approach the ideal, the result is pleasure. Similarly, robots, cars, motorcycles, and the like can be given emotions based on various balances, such as posture and battery level: deviation from the ideal produces displeasure, and approach to the ideal produces pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "reaction," where sensation is dominant. The right half contains emotions belonging to a region called "situation," where situational awareness is dominant.
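The balance-based idea above can be expressed as a function that returns maximum pleasure at the ideal state and falls toward displeasure as the normalized deviation grows. The exponential form, the per-variable scale factors, and the example variables (`battery`, `tilt`) are assumptions chosen for illustration.

```python
import math
from typing import Dict


def valence(state: Dict[str, float], ideal: Dict[str, float],
            scale: Dict[str, float]) -> float:
    """Map deviation from ideal balances to a pleasure(+)/displeasure(-) score in (-1, 1].

    Returns 1.0 exactly at the ideal and approaches -1 as the deviation grows.
    The functional form and scales are hypothetical.
    """
    # Normalized Euclidean deviation over all balance variables (e.g. posture, battery).
    dev = math.sqrt(sum(((state[k] - ideal[k]) / scale[k]) ** 2 for k in ideal))
    return 2.0 * math.exp(-dev) - 1.0
```

With `ideal = {"battery": 1.0, "tilt": 0.0}`, a robot at half charge scores higher (more pleasant) than one at 20% charge, matching the description's direction of effect.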

[0731] The emotion map defines two emotions that promote learning. One is the emotion around the midpoint between the negative emotions "repentance" and "reflection" on the situation side, that is, when the robot experiences negative feelings such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive emotion "desire" on the reaction side, that is, when the robot has positive feelings such as "I want more" or "I want to know more."

[0732] The emotion identification model 59 feeds user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained on multiple training data sets, each of which combines user input with emotion values representing each emotion shown in the emotion map 400. Furthermore, the neural network is trained so that emotions located close together have similar values, as in the emotion map 900 in Figure 10. Figure 10 shows an example in which multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
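One common way to make neighboring emotions take similar values is to add a smoothness penalty over adjacent pairs on the map to the training loss. The sketch below assumes that approach (a graph-Laplacian-style regularizer) and a hypothetical weight `lam`; the description does not say how its network enforces this property.

```python
from typing import Dict, Iterable, Tuple


def neighbor_smoothness_penalty(values: Dict[str, float],
                                neighbors: Iterable[Tuple[str, str]]) -> float:
    """Sum of squared differences between emotion values of adjacent emotions."""
    return sum((values[a] - values[b]) ** 2 for a, b in neighbors)


def regularized_loss(data_loss: float, values: Dict[str, float],
                     neighbors: Iterable[Tuple[str, str]], lam: float = 0.1) -> float:
    """Task loss plus lam times the smoothness penalty (lam is an assumed hyperparameter)."""
    return data_loss + lam * neighbor_smoothness_penalty(values, neighbors)
```

Minimizing this combined loss pulls adjacent emotions such as "reassured" and "calm" toward similar values while the data term keeps them faithful to the training labels.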

[0733] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0734] In the above embodiment, an example was given in which the specific processing is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and the specific processing may be distributed across multiple computers, including computer 22. For example, the data generation model 58 may be provided in a device external to the data processing device 12, and that external device may generate data according to the input data.

[0735] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing according to the specific processing program 56.

[0736] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0737] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0738] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0739] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0740] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0741] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0742] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0743] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0744] The following is further disclosed regarding the embodiments described above.

[0745] (Claim 1)

[0746] A means of collecting data to analyze the emotional state of customers,

[0747] An analysis means that evaluates customer emotions in real time based on the collected data,

[0748] A feedback provision means that provides feedback to sales representatives based on the evaluation results,

[0749] A generation means that generates an optimization proposal for the next presentation using the data obtained by the analysis means,

[0750] A system comprising these means.

[0751] (Claim 2)

[0752] The system according to claim 1, further comprising a predictive means for analyzing past presentation data and proposing in advance how to approach specific customers.

[0753] (Claim 3)

[0754] The system according to claim 1, comprising a natural language processing model and an emotion recognition model for analyzing the facial expressions and tone of voice of a customer.

[0755] "Example 1"

[0756] (Claim 1)

[0757] Means of obtaining information,

[0758] An analysis means that judges emotions in real time based on the acquired information,

[0759] A means of providing interaction to the person in charge based on the judgment result,

[0760] A means for generating an optimization proposal for the next activity using the information obtained by the aforementioned analysis means,

[0761] Means of visually displaying information on-site,

[0762] A system comprising these means.

[0763] (Claim 2)

[0764] The system according to claim 1, further comprising a predictive means for analyzing past information and providing in advance a method of action for a specific target.

[0765] (Claim 3)

[0766] The system according to claim 1, comprising a natural language processing model and an emotion recognition model for analyzing the facial expressions and tone of voice of a subject.

[0767] "Application Example 1"

[0768] (Claim 1)

[0769] An information gathering means for acquiring customer facial expressions and voice information,

[0770] An analysis means that determines the emotional state of customers in real time based on the collected information,

[0771] A means of providing information to sales staff based on the judgment result,

[0772] A generation means that uses the information determined by the analysis means to create proposals for improving future customer service,

[0773] A system comprising these means.

[0774] (Claim 2)

[0775] The system according to claim 1, further comprising a predictive means for analyzing past customer service data and proposing in advance the optimal response method for a specific customer.

[0776] (Claim 3)

[0777] The system according to claim 1, comprising a natural language processing model and an emotion recognition model for analyzing the facial expressions and voice characteristics of a customer.

[0778] "Example 2 of combining an emotion engine"

[0779] (Claim 1)

[0780] Information gathering means for analyzing the emotional state of users and buyers,

[0781] An analysis means for evaluating the emotions and psychological state of buyers in real time based on the collected information,

[0782] A feedback provision means that presents feedback to the user based on the evaluation results and generated analysis data,

[0783] A generation means that generates an optimization proposal for the next information provision activity using the data obtained by the analysis means,

[0784] A system comprising these means.

[0785] (Claim 2)

[0786] The system according to claim 1, further comprising a predictive means for analyzing information on past information provision activities and proposing in advance methods of approaching specific buyers.

[0787] (Claim 3)

[0788] The system according to claim 1, comprising a generative language processing model and an emotion recognition model for analyzing the facial expressions and tone of voice of a buyer.

[0789] "Application example 2 when combining with an emotional engine"

[0790] (Claim 1)

[0791] A means of collecting data to analyze the emotional state of customers,

[0792] An analysis means that evaluates customer emotions in real time based on the collected data,

[0793] A feedback provision means that provides feedback to sales staff based on the evaluation results,

[0794] A generation means that generates an optimized proposal for the next face-to-face sales activity using the data obtained by the analysis means,

[0795] A presentation means that provides the results of customer emotion analysis as visual feedback to a glasses-type device worn by a salesperson,

[0796] A system comprising these means.

[0797] (Claim 2)

[0798] The system according to claim 1, further comprising a predictive means for analyzing past face-to-face sales data and proposing in advance how to approach specific customers.

[0799] (Claim 3)

[0800] The system according to claim 1, comprising language processing technology and emotion recognition technology for analyzing the facial expressions and tone of voice of a customer.

[Explanation of Symbols]

[0801] 10, 210, 310, 410: Data processing system; 12: Data processing device; 14: Smart device; 214: Smart glasses; 314: Headset-type terminal; 414: Robot

Claims

1. A system comprising: an information gathering means for acquiring customer facial expressions and voice information; an analysis means that determines the emotional state of customers in real time based on the collected information; a means of providing information to sales staff based on the judgment result; and a generation means that uses the information determined by the analysis means to create proposals for improving future customer service.

2. The system according to claim 1, further comprising a predictive means for analyzing past customer service data and proposing in advance the optimal response method for a specific customer.

3. The system according to claim 1, comprising a natural language processing model and an emotion recognition model for analyzing the facial expressions and voice characteristics of a customer.