system

JP2026100602A · Pending · Publication Date: 2026-06-19 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-12-09
Publication Date
2026-06-19

Figure 2026100602000001_ABST

Abstract

To provide a system. [Solution] A system that includes: a means for receiving voice conversations with customers in real time; a means for analyzing the received audio data to determine emotions; a means for converting the audio data into text data; a means for detecting specific keywords based on the converted text data; a means for generating and sending notifications in response to the detected keywords; and a means for extracting and saving positive feedback.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, the method including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a character of the chatbot, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance as a response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] During the interaction with customers in the customer service industry, it is difficult to respond promptly when problems occur, which requires a lot of time and human resources. In addition, it is also a problem to efficiently collect and utilize successful cases for improving customer satisfaction and handling claims. Furthermore, in order to prevent compliance violations, it is necessary to appropriately monitor the content of the interaction, but there are limitations in the current manual monitoring. These problems increase the burden in the customer service business and may hinder the improvement of customer satisfaction and performance.

Means for Solving the Problems

[0005] This invention provides a means for receiving voice conversations with customers in real time and analyzing them using voice emotion recognition technology. Based on the analysis results, the voice data is converted into text data and specific keywords are detected. Depending on the detected keywords and changes in emotion, notifications are quickly generated under pre-set conditions and sent to the relevant parties. Furthermore, by automatically extracting and saving positive feedback, success stories are efficiently collected. Through these means, the quality of customer service is improved, and operational efficiency and compliance are strengthened.

[0006] "Voice dialogue" refers to verbal communication between customers and service providers.

[0007] "Real-time" refers to events and data collection occurring almost simultaneously with actual time.

[0008] "Audio data" refers to sound information collected by microphones or other devices, stored in digital format.

[0009] "Voice emotion recognition" refers to a technology that analyzes the characteristics of speech to determine the emotional state of the speaker.

[0010] "Text data" refers to a digital representation of sentences or words in the form of characters.

[0011] A "keyword" refers to an important word or phrase that has a specific meaning or topic.

[0012] A "notification" refers to information that informs a user or administrator that some action is required.

[0013] "Positive feedback" refers to the positive evaluations and expressions of gratitude that customers show towards a service.

[0014] "Compliance" means adhering to laws, regulations, and internal company rules in the course of business operations.

Brief Description of the Drawings

[0015]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map to which a plurality of emotions are mapped.
[Figure 10] An emotion map to which a plurality of emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2 when an emotion engine is combined.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2 when an emotion engine is combined.

MODE FOR CARRYING OUT THE INVENTION

[0016] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0017] First, the terms used in the following description will be explained.

[0018] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).

[0019] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which is used as a work memory by the processor.

[0020] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0021] In the following embodiments, a communication interface (I/F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0022] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means A alone, B alone, or a combination of A and B. The same interpretation applies in this specification when three or more items are linked by "and/or."

[0023] [First Embodiment]

[0024] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0025] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0026] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0027] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0028] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0029] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0030] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0031] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0032] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0033] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0034] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0035] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0036] As an embodiment of the present invention, a customer service support system using voice interaction with customers will be described. In this system, when voice interaction is initiated by a terminal installed in the store, voice data is immediately sent to a server. The server analyzes the received voice data in parallel using a voice emotion recognition module and a transcription module.
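The parallel analysis described in this paragraph can be illustrated with a minimal Python sketch. It is not the implementation disclosed here: the functions recognize_emotion and transcribe are hypothetical stand-ins for the voice emotion recognition module and the transcription module, and they return placeholder values rather than performing real analysis. The sketch only shows one way the same audio chunk might be handed to both modules at once.

```python
from concurrent.futures import ThreadPoolExecutor


def recognize_emotion(audio: bytes) -> str:
    """Hypothetical stand-in for the voice emotion recognition module."""
    # Placeholder: a real module would analyze tone, speed, and intonation.
    return "neutral"


def transcribe(audio: bytes) -> str:
    """Hypothetical stand-in for the transcription module."""
    # Placeholder: pretend the audio bytes already hold UTF-8 text.
    return audio.decode("utf-8", errors="replace")


def analyze_chunk(audio: bytes) -> dict:
    """Submit the same audio chunk to both modules and collect both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        emotion_future = pool.submit(recognize_emotion, audio)
        text_future = pool.submit(transcribe, audio)
        return {
            "emotion": emotion_future.result(),
            "text": text_future.result(),
        }


result = analyze_chunk(b"my coffee is cold")
```

Threads are used here only for brevity; separate processes or services would serve equally well, and the text does not state which approach the system takes.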

[0037] The server analyzes voice data to identify the customer's emotional state and sends a real-time notification to the administrator if negative emotions such as dissatisfaction or anger are detected, or if keywords related to a specific problem are found. This notification allows the user to take immediate action.

[0038] Meanwhile, the server extracts expressions of gratitude and joy from the conversation and stores the portions of the voice dialogue that contain positive feedback and success stories in a database. This allows users to draw on these examples to improve their services and responses.

[0039] Furthermore, this system also includes a compliance monitoring function. When inappropriate comments or expressions are detected in user-customer interactions, it automatically records the audio data before and after the incident and notifies the relevant departments as needed. This helps users ensure legal compliance and prevent damage to their company's reputation.

[0040] Thus, the present invention, as a voice analysis system that supports customer service operations, achieves both improved customer experience and increased efficiency in business processes. By using this system, users involved in customer service can quickly identify emotional fluctuations and compliance violations, and take appropriate action in a timely manner.

[0041] The following describes the processing flow.

[0042] Step 1:

[0043] The terminal starts recording audio as soon as a conversation with a customer begins. The recorded audio data is streamed to the server in real time.

[0044] Step 2:

[0045] The server passes the received audio data to the speech emotion recognition module. The module analyzes the audio data and identifies the customer's emotional state based on their voice tone, speed, and intonation.

[0046] Step 3:

[0047] The server simultaneously passes the audio data to a transcription module, which converts the dialogue into text data. This text data is then used for keyword detection and logging.

[0048] Step 4:

[0049] The server searches for specific keywords in the converted text, and if found, records and saves a portion of the relevant call. Depending on the detection criteria, it sends a notification to the relevant administrator if necessary.

[0050] Step 5:

[0051] Users receive notifications from the server and take prompt action based on those notifications if necessary. The system monitors the situation in real time and adapts services accordingly.

[0052] Step 6:

[0053] The server automatically extracts portions of voice interactions that contain positive feedback and saves them in the database as success stories. This information can be referenced later to help improve the service.
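Step 4 of the flow above (searching the converted text for specific keywords and saving the relevant portion of the call) can be sketched as follows. This is an illustration under stated assumptions: the Segment type, the keyword list, and the use of time-stamped transcript segments are all hypothetical and do not appear in the specification. Matching is plain case-insensitive substring search, so a keyword such as "cold" would also match inside a longer word.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed span of the call (hypothetical structure)."""
    start_s: float
    end_s: float
    text: str


# Hypothetical pre-set keyword list.
KEYWORDS = ("refund", "complaint", "cold")


def find_keyword_hits(segments, keywords=KEYWORDS):
    """Return (segment, matched_keywords) pairs for segments containing a keyword."""
    hits = []
    for seg in segments:
        lowered = seg.text.lower()
        matched = [kw for kw in keywords if kw in lowered]
        if matched:
            hits.append((seg, matched))
    return hits


segments = [
    Segment(0.0, 2.0, "Hello"),
    Segment(2.0, 5.0, "My coffee is COLD, I want a refund"),
]
hits = find_keyword_hits(segments)
```

The start and end times of each hit indicate which portion of the recording would be saved.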

[0054] (Example 1)

[0055] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0056] In voice interactions with customers, it is essential to accurately grasp their emotions and needs in real time and respond quickly. Furthermore, recording and utilizing positive customer feedback is expected to improve services. On the other hand, monitoring inappropriate content and compliance violations to protect the company's reputation is also a crucial issue.

[0057] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0058] In this invention, the server includes means for acquiring voice conversations with customers in real time, means for analyzing the received voice information to recognize emotions, means for converting the acquired voice information into symbolic information, means for recognizing specific words or phrases based on the converted symbolic information, means for detecting inappropriate content and recording the voice information before and after it, and means for immediately providing notifications to the administrator based on the analysis results. This makes it possible to grasp customer emotions in real time and respond quickly, record positive feedback, monitor inappropriate remarks and protect corporate reputation.

[0059] "Voice interaction with customers" refers to spoken communication with customers.

[0060] "Means of acquiring data in real time" refers to technical means for instantly collecting and processing audio data.

[0061] "Audio information" refers to data recorded as changes in sound.

[0062] A "means of recognizing emotions" refers to an algorithm or process for identifying emotional states from speech.

[0063] "Symbolic information" refers to data obtained by converting speech into symbols or characters.

[0064] "Specific terms" refer to important keywords or phrases that have been set in advance.

[0065] "Inappropriate content" refers to expressions or statements that may violate compliance.

[0066] "Analysis results" refer to information obtained through the processing of audio data.

[0067] "Means of providing notifications to administrators" refers to a function that uses analysis results to inform administrators of important information.

[0068] A "means of recording" refers to a function that saves specific data so that it can be referenced later.

[0069] The customer service support system proposed in this invention analyzes voice interactions with customers in real time and obtains various feedback. A terminal installed in the store collects customer voices and sends the voice data to a server, at which point the system begins processing. The server analyzes the received voice data using speech recognition technologies such as "NVIDIA Jarvis" or "Google® Cloud Speech-to-Text" to identify the customer's emotions. In addition, the server converts the voice data into text and filters it for specific keywords.

[0070] If negative customer sentiment or specific issues are detected, the server promptly sends a notification to the administrator. This notification allows users to address the problem quickly. The server also records positive customer feedback and success stories, storing them in a database for service improvement. This provides users with valuable data to improve the quality of their customer service.

[0071] Furthermore, the server has a compliance monitoring function that records the audio data before and after any inappropriate expressions or content are detected and reports it to the management department. This allows users to help maintain legal compliance.

[0072] For example, if a customer complains that their coffee is cold, the server immediately analyzes the audio and sends a notification to the administrator. Conversely, if a customer gives positive feedback such as "the service was excellent," it is recorded in the database and used to improve customer service.
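The two cases in this example can be worked through with a short Python sketch. The keyword lists, the emotion labels, and the function route_utterance are hypothetical, and plain lists stand in for the notification channel and the database; the specification does not define the actual rules.

```python
# Hypothetical keyword lists; the specification only says they are pre-set.
COMPLAINT_KEYWORDS = ("cold", "slow", "wrong order")
PRAISE_KEYWORDS = ("excellent", "thank you", "great")


def route_utterance(text, emotion, notifications, feedback_db):
    """Notify the administrator on a complaint, or store positive feedback."""
    lowered = text.lower()
    if emotion == "negative" or any(kw in lowered for kw in COMPLAINT_KEYWORDS):
        notifications.append(f"Attention needed: customer said '{text}'")
        return "notified"
    if any(kw in lowered for kw in PRAISE_KEYWORDS):
        feedback_db.append(text)
        return "stored"
    return "ignored"


notifications, feedback_db = [], []
first = route_utterance("My coffee is cold", "negative", notifications, feedback_db)
second = route_utterance("The service was excellent", "positive", notifications, feedback_db)
```

Here the complaint check runs first, so an utterance containing both a complaint and praise would produce a notification only; a real system might handle both.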

[0073] An example of a prompt is: "Describe the process of analyzing customer voices and using emotions and keywords to activate a real-time notification system."

[0074] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0075] Step 1:

[0076] The terminal collects customer voice input via a microphone. When customer speech is detected, it begins recording voice data and transmits it to the server in real time. The input here is the customer's raw voice, and the output is digital voice data transmitted to the server.

[0077] Step 2:

[0078] The server inputs the received audio data into a speech emotion recognition module. This module analyzes the audio waveform and evaluates the emotion using a generative AI model. Specifically, it extracts features such as tone, pitch, and speed from the speech and infers the emotional state based on these. The input here is digital audio data, and the output is identified emotion information.

[0079] Step 3:

[0080] The server simultaneously inputs the audio data into a transcription module, where it converts the audio into text data. This process uses speech recognition technology to analyze and transcribe the audio content. The input is audio data, and the output is the converted text data.

[0081] Step 4:

[0082] The server extracts specific keywords from the converted text data. This process scans the text and identifies relevant phrases based on a pre-configured keyword list. The input is text data, and the output is a list of detected keywords.

[0083] Step 5:

[0084] The server generates and sends necessary notifications to the administrator based on sentiment and keyword information. For example, if keywords related to negative sentiment or complaints are detected, this triggers an alert to be sent to the administrator. The inputs here are sentiment and keyword information, and the output is the generated notification message.

[0085] Step 6:

[0086] The server identifies text that expresses positive customer feedback and delight, and stores it in a database. This includes the automatic extraction of expressions of gratitude and praise. The input is text data, and the output is data representing recorded positive feedback.

[0087] Step 7:

[0088] The server automatically records the surrounding data if it detects inappropriate content and notifies the relevant department. This process is intended to maintain corporate compliance and operates according to pre-defined criteria. The input is audio data, and the output is the recorded audio data and notification message.
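Step 7 requires that audio from before the inappropriate content is still available once it is detected. One common way to achieve this, shown here as an assumption rather than as the disclosed method, is to keep the most recent chunks in a fixed-size buffer. The class IncidentRecorder and its parameters are hypothetical.

```python
from collections import deque


class IncidentRecorder:
    """Keep recent chunks so that audio before and after an incident can be saved."""

    def __init__(self, pre_chunks=3, post_chunks=2):
        self._pre = deque(maxlen=pre_chunks)  # rolling pre-incident buffer
        self._post_chunks = post_chunks
        self._remaining = 0
        self._clip = None   # clip currently being recorded, if any
        self.saved = []     # finished clips (stand-in for persistent storage)

    def feed(self, chunk, flagged):
        """Process one audio chunk; flagged marks detected inappropriate content."""
        if flagged and self._clip is None:
            # Start a clip from the chunks that preceded the incident.
            self._clip = list(self._pre)
        if self._clip is not None:
            self._clip.append(chunk)
            if flagged:
                self._remaining = self._post_chunks  # (re)start the post window
            else:
                self._remaining -= 1
            if self._remaining <= 0:
                self.saved.append(self._clip)
                self._clip = None
        self._pre.append(chunk)


rec = IncidentRecorder(pre_chunks=2, post_chunks=1)
for chunk, flagged in [("a", False), ("b", False), ("c", False),
                       ("d", True), ("e", False), ("f", False)]:
    rec.feed(chunk, flagged)
```

In this run the saved clip holds the two chunks before the flagged chunk, the flagged chunk itself, and one chunk after it.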

[0089] (Application Example 1)

[0090] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0091] There is a need to improve the efficiency of customer service in physical stores, accurately understand customer emotions, and enable prompt responses. Furthermore, service improvement and compliance are also important, but there is no technology available to comprehensively support these aspects.

[0092] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0093] In this invention, the server includes a device for receiving voice interactions with customers in real time, a device for analyzing the received voice information to determine emotions, and a device for converting the voice information into text information. This makes it possible to quickly grasp the emotional state of customers and respond appropriately in real time. Furthermore, by generating and sending notifications according to the detected emotions and codes, it becomes possible to quickly share information with administrators. In addition, positive feedback can be saved and used to improve services, and inappropriate expressions can be automatically detected and recorded to maintain compliance.

[0094] "Voice interaction with customers" refers to the exchange of information between customers and store employees via voice within a store.

[0095] A "real-time receiving device" is an electronic device that instantly acquires and processes audio data.

[0096] "Audio information" refers to data in the form of electrically recorded or transmitted audio.

[0097] A "device that analyzes and determines emotions" is a processing device that identifies a customer's emotional state from voice information.

[0098] A "device that converts to text information" is a device that uses speech recognition technology to convert audio data into text data.

[0099] "Specific symbols" refer to important keywords or expressions that appear in the customer's statements.

[0100] A "device that generates and sends notifications" is a system that creates alerts and information based on detected data and sends them to relevant parties.

[0101] "Positive feedback" refers to expressions that indicate favorable feedback or praise from customers.

[0102] "Inappropriate language" refers to statements or expressions that do not comply with the standards set by the company and are deemed inappropriate.

[0103] A "device for recording related conversation content" is a device that saves audio data before and after a specific event when certain conditions are met.

[0104] This invention is a system that efficiently analyzes voice interactions with customers in physical stores and grasps their emotional state in real time. The system is configured as follows:

[0105] The server primarily receives and analyzes voice conversations in real time. The server is equipped with speech recognition software (e.g., Speech Recognition API) and emotion recognition modules, enabling it to instantly convert voice information into text and determine customer emotions. The analyzed voice information is stored in a database, and notifications are sent to the administrator as needed.

[0106] Furthermore, this system supports compliance by recognizing speech containing specific codes or inappropriate expressions and recording the relevant conversation content. Positive feedback is extracted separately and used to improve the service.

[0107] Furthermore, the application installed on each store terminal communicates with the server to notify store staff terminals of necessary information. This application utilizes a generative AI model through prompt messages. For example, if a customer expresses dissatisfaction with a new product in a store, the app immediately notifies the staff, enabling them to respond quickly.

[0108] An example of an input prompt for the generative AI model might be, "A customer seems dissatisfied with a new product. As a store employee, what kind of response can increase customer satisfaction?" This prompt helps store employees explore appropriate response methods.
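A prompt of this shape could be assembled from the analysis results with a simple template, as in the sketch below. The function build_staff_prompt and its two parameters are hypothetical; the specification gives only the example sentence, not how it is produced.

```python
def build_staff_prompt(customer_state: str, topic: str) -> str:
    """Fill the example prompt with a detected customer state and topic."""
    return (
        f"A customer seems {customer_state} with {topic}. "
        "As a store employee, what kind of response can increase "
        "customer satisfaction?"
    )


prompt = build_staff_prompt("dissatisfied", "a new product")
```

With these arguments the result matches the example prompt quoted above.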

[0109] In this way, the server, terminal, and user work together to create a system that enables efficient and flexible customer service.

[0110] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0111] Step 1:

[0112] The server receives audio data in real time. The audio information spoken by the customer is taken as input via the microphone, and the raw data is immediately sent to the server to prepare for subsequent processing.

[0113] Step 2:

[0114] The server converts the received audio information into text using a speech recognition module. In this step, the speech recognition software converts the audio data into text data, which is output for use in the next step.

[0115] Step 3:

[0116] The server uses a voice analysis module to determine the emotional state from the text data. Here, data calculations are performed using emotion recognition technology, and the emotional state (e.g., anger, joy, sadness) is output as the analysis result. This information serves as an important indicator for customer service.

[0117] Step 4:

[0118] The server detects specific codes in the text data and performs processing based on those codes. For example, it extracts keywords related to complaints and generates data to produce real-time notifications when they are detected. As a result of this data processing on the input, notification instructions are output.

[0119] Step 5:

[0120] The server sends the generated notification to the store's terminal. This notification is displayed on the store staff's screen and serves as an output indicating what action is required in a given situation. Based on this notification, the user can then take appropriate action regarding the customer.

[0121] Step 6:

[0122] The terminal receives the notification, and the user takes appropriate action based on that information. This enables a rapid response that is in line with the customer's emotional state. Subsequently, the outcome of the interaction with the customer may be recorded as a prompt message for a generative AI model. This message is used as input data to further improve the service based on the accumulated data.
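Steps 4 and 5 of this flow (generating a notification instruction and sending it to the store terminal) can be sketched as below. The NotificationInstruction fields, the complaint keyword list, the suggested action text, and the choice of JSON as the wire format are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class NotificationInstruction:
    """Hypothetical payload sent from the server to a store terminal."""
    store_id: str
    emotion: str
    keywords: list
    suggested_action: str


def make_instruction(store_id, emotion, text,
                     complaint_keywords=("broken", "refund", "late")):
    """Return an instruction when complaint keywords are found, else None."""
    lowered = text.lower()
    found = [kw for kw in complaint_keywords if kw in lowered]
    if not found:
        return None
    return NotificationInstruction(
        store_id=store_id,
        emotion=emotion,
        keywords=found,
        suggested_action="Check on the customer",  # hypothetical wording
    )


def encode_for_terminal(instruction) -> str:
    """Serialize the instruction as JSON for display on the staff screen."""
    return json.dumps(asdict(instruction))


instr = make_instruction("store-1", "anger", "This is broken and I want a refund")
payload = encode_for_terminal(instr)
```

The terminal would decode the payload and show the suggested action together with the detected emotion and keywords.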

[0123] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0124] As an embodiment of the present invention, a customer service support system that uses voice interaction with customers and is combined with an emotion engine will be described. This system first receives voice input from a terminal and transmits conversation data with the customer to a server in real time. The server passes the voice data to the emotion engine, which analyzes the emotional states of the customer and the user, respectively.

[0125] The server determines the customer's emotions based on their voice tone, speed, and intonation, and also recognizes the user's emotions from voice and operation data. Based on this analysis, if the user's emotions are positive, the conversation is automatically registered in the database as a success story to be used as a reference for service improvement. If the user's emotions are negative, a notification is sent to the administrator using a specified method, so that support and corrective measures can be provided quickly.

[0126] Furthermore, based on the customer sentiment analysis results, the server detects pre-configured keywords in the text data and performs specific actions. This includes, for example, sending alerts to relevant administrators to respond immediately to urgent requests or complaints from customers.

[0127] By using this system, users can accurately grasp customer needs in real time and respond based on their own emotional state. As a result, customer satisfaction improves and the overall efficiency of customer service operations is increased. For example, if a customer actively provides positive feedback in a store, that conversation is automatically stored and used as a success story for employee training. In this way, the present invention provides advanced customer service support functions that also take into account the user's emotions.

[0128] The following describes the processing flow.

[0129] Step 1:

[0130] The terminal records audio as soon as a conversation with a customer begins and sends the data to the server in real time.

[0131] Step 2:

[0132] The server passes the received audio data to the emotion engine. The emotion engine analyzes the customer's emotions from the audio and identifies them based on elements such as tone, speed, and intonation.

[0133] Step 3:

[0134] The server converts audio data into text data using a transcription module. It then uses the converted text to search for specific keywords and provides the results to other modules.

[0135] Step 4:

[0136] The user's emotions are also analyzed by the emotion engine, and their emotional state is identified based on their voice tone and the content of their conversation.

[0137] Step 5:

[0138] Based on the sentiment analysis results, the server sends a notification to relevant parties via the notification management system if the customer's or user's sentiment exceeds a certain threshold.

[0139] Step 6:

[0140] Users receive notifications from the server and take prompt action as needed. They consider and implement the most appropriate response based on the situation.

[0141] Step 7:

[0142] The server saves conversations containing positive feedback as success stories in its database. This information will be used later to improve service quality.
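
The threshold check in Step 5 can be illustrated with a minimal sketch. The score range, the threshold value of 0.7, and the names `EmotionScore` and `speakers_to_notify` are assumptions made for illustration; the description only states that a notification is sent when the sentiment exceeds a certain threshold.

```python
from dataclasses import dataclass

# Hypothetical threshold; the description only says "a certain threshold".
NEGATIVE_THRESHOLD = 0.7


@dataclass
class EmotionScore:
    speaker: str     # "customer" or "user"
    negative: float  # assumed negative-emotion score in [0.0, 1.0]


def speakers_to_notify(scores, threshold=NEGATIVE_THRESHOLD):
    """Return the speakers whose negative score strictly exceeds the threshold."""
    return [s.speaker for s in scores if s.negative > threshold]
```

In this sketch a score equal to the threshold does not trigger a notification; whether the boundary is inclusive is a design choice the description leaves open.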

[0143] (Example 2)

[0144] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0145] Conventional customer service support systems struggled to accurately analyze customer emotions from voice interactions and provide appropriate support in real time. Furthermore, they lacked the means to effectively utilize positive feedback and respond quickly to negative situations, leaving challenges in improving customer satisfaction and operational efficiency.

[0146] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0147] In this invention, the server includes a device for receiving voice dialogues, a device for analyzing the received voice information to determine emotions, a device for converting voice information into text information, a device for automatically registering successful conversation examples in a case study collection based on the emotion analysis results, and a device for sending notifications to the administrator when negative emotions are detected. This makes it possible to accurately grasp emotions from conversations with customers, automatically take appropriate actions, and enable quick and effective customer service.

[0148] "Voice interaction" refers to communication using voice between a customer and a system.

[0149] "Voice information" refers to data obtained from voice dialogue, including the basic data that the server uses for analysis.

[0150] A "device for determining emotions" refers to a device that analyzes audio information to determine the emotional state of the speaker.

[0151] "Textual information" refers to information in text format generated by digitally processing audio information.

[0152] "Words" refer to words or phrases that have important meaning within written information and serve as triggers for sentiment analysis and actions.

[0153] A "notification generation and transmission device" is a device that creates and transmits notifications to relevant parties based on detected words or phrases.

[0154] "Positive feedback" refers to positive comments from customers and is used to improve services.

[0155] A "device for registering success stories in a case study collection" is a device for organizing and saving positive conversation content and accumulating it as reference material for the future.

[0156] "Negative emotions" refers to a state in which the speaker is showing dissatisfaction or negative emotions, as determined by emotion analysis.

[0157] A "device that sends notifications to administrators" is a device that automatically transmits information to administrators when negative emotions are detected, so that appropriate action can be taken.

[0158] This invention analyzes information obtained from voice interactions with customers in real time to provide advanced support for customer service. The system as a whole consists of a terminal, a server, and a user.

[0159] The terminal includes a device for capturing voice interactions with customers and acquiring high-quality audio information. This device uses a microphone with noise-canceling technology to maintain clarity of conversation. The acquired audio information is then transmitted to the server via encrypted communication protocols such as TLS/SSL.

[0160] The server plays a central role in processing the received audio information. The audio information is first passed to the sentiment analysis engine, where it is analyzed using a generative AI model on the server. This analysis evaluates the emotions of both the customer and the user. Once the emotions are determined, the server extracts positive feedback based on the result and automatically registers it in the database as a success story.

[0161] Furthermore, the server detects specific words and phrases from the converted text information. When specific words or phrases are detected, the relevant actions are taken. For example, if the word "urgent" is identified, an alert can be sent to the responsible administrator to prompt immediate action. The server can also send a notification to the administrator if negative emotions are detected, enabling prompt support.
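
As a rough illustration of the keyword-triggered actions described above, the following sketch maps configured keywords to action names. Only the keyword "urgent" comes from the description; the second keyword, the action names, and the function `detect_actions` are hypothetical.

```python
import re

# Hypothetical keyword-to-action table; only "urgent" appears in the description.
KEYWORD_ACTIONS = {
    "urgent": "alert_responsible_administrator",
    "complaint": "alert_responsible_administrator",
}


def detect_actions(text, table=KEYWORD_ACTIONS):
    """Return (keyword, action) pairs for configured keywords found in the text."""
    # Whole-word, case-insensitive matching on the converted text.
    words = set(re.findall(r"[a-z']+", text.lower()))
    return [(kw, action) for kw, action in table.items() if kw in words]
```

Whole-word matching is used here so that, for example, "urgently" does not match "urgent"; a real deployment might instead prefer stemming or phrase matching.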

[0162] This system allows users to accurately understand customer needs in real time and provide appropriate support. As a result, it is expected to improve the overall efficiency of customer service operations and enhance customer satisfaction.

[0163] For example, if a store terminal records a positive comment from a customer such as "I am very satisfied," that data is organized on the server as a success story and used as material for employee training. On the other hand, if a complaint or grievance occurs, the server can send a notification to the administrator so that corrective action can be taken promptly.

[0164] An example of a prompt message would be, "What emotions can be inferred from this customer's statement?" This allows the system to perform a rapid and sophisticated analysis and provide an appropriate response.
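
One way such a prompt message might be combined with a customer statement is sketched below. The model is represented by an arbitrary callable that takes a prompt string and returns a reply string; the label set, the answer-format instruction, and the names `build_prompt` and `infer_emotion` are assumptions and do not correspond to any particular generative AI API.

```python
from typing import Callable

PROMPT = "What emotions can be inferred from this customer's statement?"
LABELS = ("positive", "negative", "neutral")  # assumed label set


def build_prompt(statement: str) -> str:
    """Combine the example prompt with the customer's statement."""
    options = ", ".join(LABELS)
    return f"{PROMPT}\nAnswer with one of: {options}.\nStatement: {statement}"


def infer_emotion(statement: str, model: Callable[[str], str]) -> str:
    """Ask the model (any prompt -> reply callable) and map its reply to a label."""
    reply = model(build_prompt(statement)).strip().lower()
    for label in LABELS:
        if label in reply:
            return label
    return "unknown"  # the reply did not contain any expected label
```

For example, `infer_emotion("I am very satisfied", model)` returns `"positive"` when the supplied callable replies with text containing that word.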

[0165] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0166] Step 1:

[0167] The terminal acquires voice interaction with the customer. As input, the customer's spoken voice data is captured by the terminal's microphone. Specifically, the terminal uses noise cancellation technology to ensure clear voice data. As output, the voice data is encrypted and sent to the server for further processing.

[0168] Step 2:

[0169] The server receives audio data transmitted from the terminal and analyzes it using a generative AI model. The input is encrypted audio data from the terminal. Specifically, the AI model analyzes the voice tone, speed, and intonation to determine the emotional state. The output is the analyzed emotional data.

[0170] Step 3:

[0171] The server automatically registers positive evaluations in the database based on sentiment data. The input consists of data that has been determined to be positive through sentiment analysis. Specifically, the server uses data management software to save the conversation as a successful example. The output is the updated database entry.

[0172] Step 4:

[0173] Based on the sentiment analysis results, the server sends a notification to the administrator if it detects negative emotions. The input is the emotion data that has been determined to be negative. Specifically, the server uses an email or messaging system to send an alert to the administrator in real time. The output is the notification sent to the administrator.

[0174] Step 5:

[0175] The server performs text conversion of audio information and detects specific keywords. The input is text data generated from audio data. In the specific data processing, a text analysis algorithm is used to identify the configured keywords. The output consists of the detected keywords and associated instructions.

[0176] Step 6:

[0177] The server performs a specific action based on the detected keyword. The input is the keyword detected in step 5. Specifically, if the keyword is "urgent," the server immediately sends an alert to the relevant department. The output is a record that the action was taken and any necessary follow-up.
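
The server-side steps above (Steps 2 to 6) can be sketched end to end as follows. The emotion analysis and text conversion are injected as plain callables standing in for the generative AI model and the transcription function; the class name `Server`, the in-memory lists, and the emotion labels are illustrative assumptions, not the configuration of the embodiment.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Server:
    analyze_emotion: Callable[[bytes], str]  # assumed to return "positive", "negative" or "neutral"
    transcribe: Callable[[bytes], str]       # audio -> text
    keywords: Dict[str, str]                 # keyword -> action name (hypothetical)
    success_stories: List[str] = field(default_factory=list)
    notifications: List[str] = field(default_factory=list)
    action_log: List[Tuple[str, str]] = field(default_factory=list)

    def handle(self, audio: bytes) -> None:
        emotion = self.analyze_emotion(audio)      # Step 2: emotion analysis
        text = self.transcribe(audio)              # Step 5: text conversion
        if emotion == "positive":                  # Step 3: register success story
            self.success_stories.append(text)
        elif emotion == "negative":                # Step 4: notify administrator
            self.notifications.append(f"negative emotion: {text}")
        lowered = text.lower()
        for keyword, action in self.keywords.items():  # Steps 5 and 6
            if keyword in lowered:
                self.action_log.append((keyword, action))
```

The lists stand in for the database, the notification channel, and the action record; in practice each would be replaced by the corresponding storage or messaging component.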

[0178] (Application Example 2)

[0179] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0180] In customer service, it is essential to accurately understand customer emotions in real time and respond quickly and appropriately. Furthermore, there is a challenge in efficiently collecting positive customer feedback and utilizing it as training material within the organization.

[0181] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0182] In this invention, the server includes means for receiving voice interactions with customers in real time, means for analyzing the received voice data to determine emotions, and means for utilizing the extracted feedback as educational material. This enables appropriate responses tailored to customer emotions and the creation of effective training materials.

[0183] "Voice interaction with customers" refers to voice communication that takes place between a customer and a system or operator.

[0184] "Real-time reception" refers to a function that can instantly acquire and process audio data at the moment a conversation takes place.

[0185] "Methods for analyzing voice data to determine emotions" refer to technologies that analyze elements such as tone, speed, and intonation of voice to identify the speaker's emotional state.

[0186] "Methods for converting audio data into text data" refers to the process of converting audio information into text format using speech recognition technology.

[0187] "Methods for detecting specific keywords" refer to algorithms for identifying important words or phrases from converted character data.

[0188] A "means for generating and sending notifications" is a system that creates alerts or messages when specific events or conditions are met and sends them to the appropriate recipients.

[0189] "Means for extracting and saving positive feedback" refers to the process of identifying positive evaluations and opinions from customers and saving them in a database or file.

[0190] "Methods for utilizing as educational materials" refers to methods for using saved feedback as content for employee training and education.

[0191] To realize this invention, a system is needed that receives voice conversations with customers in real time and analyzes their content using an emotion analysis engine. When the server receives voice data, it processes the data using a dedicated API or software (e.g., a speech recognition engine or natural language processing tool) for emotion analysis. After the emotion analysis is complete, the voice data is converted into text data, and specific keywords that have been set in advance are detected. This allows the system to understand the customer's emotional state, and if positive feedback is obtained, it is stored in a database and used as training material within the organization.

[0192] The server also has the functionality to send notifications to administrators based on specific keywords or emotional states. This requires a network environment that enables real-time data communication and alert generation. Specific hardware includes smartphones and microphones as voice input terminals, and high-performance computers for processing the collected data. The software includes the aforementioned voice recognition engine, as well as applications that control the database management system and notification functions.

[0193] As a concrete example, consider a scenario involving customer service in a retail store. While a store employee is talking to a customer, the system detects the customer saying, "This is the best product I've ever used." The server detects this positive feedback, saves it to a database, and simultaneously processes it as training material for employees. Furthermore, managers can immediately learn about this success. An example of such a prompt would be, "We want to automatically record customer feedback on what they felt positively during in-store interactions and use it for employee training. Please provide specific examples of analysis."

[0194] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0195] Step 1:

[0196] The terminal captures conversations with customers as audio data in real time. The audio data is collected using a microphone and sent directly to the server. The input is the customer's raw voice, and the output is digital audio data.

[0197] Step 2:

[0198] The server sends the received audio data to the speech recognition engine, where it is converted into text data. The input is digital audio data, and the output is the corresponding text data. The audio waveform data is analyzed and converted into words and phrases through speech recognition.

[0199] Step 3:

[0200] The server passes the converted text data to the sentiment analysis engine to determine the customer's emotional state. The input is text data, and the output is the sentiment analysis result, i.e., the customer's emotional state (e.g., positive, negative). Natural language processing techniques are used to analyze emotional words and phrases in the text.

[0201] Step 4:

[0202] The server stores positive feedback in a database based on the sentiment analysis results. The input is the sentiment analysis results, and the output is the stored success story data. It automatically creates entries in the database and prepares the feedback for use as training material.

[0203] Step 5:

[0204] The server generates and sends a notification to the administrator if the detected emotion is negative and specific keywords are present. The input is the emotion analysis results and keyword detection results, and the output is the alert notification sent to the administrator. This notification is sent via email or as an alert message within the application using a notification system.

[0205] Step 6:

[0206] Users can access saved positive feedback to use as educational material. The input is a record of successful cases in the database, and the output is the available educational material. This material can be searched and displayed through the user interface.
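
Steps 3 to 5 of this flow can be illustrated with a small text-based sketch. A simple word-list classifier stands in for the sentiment analysis engine; the word lists, the alert keywords, and the function names are hypothetical and chosen only to make the example self-contained.

```python
import re

# Hypothetical word lists standing in for the sentiment analysis engine.
POSITIVE_WORDS = {"best", "great", "excellent", "satisfied", "thank"}
NEGATIVE_WORDS = {"bad", "broken", "terrible", "disappointed", "angry"}
ALERT_KEYWORDS = {"refund", "complaint"}  # assumed pre-set keywords


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def classify(text):
    """Step 3: classify text as positive, negative or neutral by word counts."""
    words = tokenize(text)
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "positive"
    return "negative" if score < 0 else "neutral"


def process(text, database, outbox):
    sentiment = classify(text)
    if sentiment == "positive":
        # Step 4: store positive feedback for later use as training material.
        database.append({"text": text, "use": "training_material"})
    elif sentiment == "negative" and ALERT_KEYWORDS & set(tokenize(text)):
        # Step 5: notify only when the emotion is negative AND a keyword is present.
        outbox.append(f"ALERT: {text}")
    return sentiment
```

Note that, following Step 5, a negative statement without any configured keyword does not generate a notification in this sketch.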

[0207] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0208] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions and inference data, such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0209] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0210] [Second Embodiment]

[0211] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0212] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0213] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0214] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0215] The microphone 238 receives instructions from the user 20 by picking up the voice uttered by the user 20. The microphone 238 captures the voice uttered by the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0216] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0217] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0218] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0219] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0220] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0221] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0222] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0223] As an embodiment of the present invention, a customer service support system using voice interaction with customers will be described. In this system, when voice interaction is initiated by a terminal installed in the store, voice data is immediately sent to a server. The server analyzes the received voice data in parallel using a voice emotion recognition module and a transcription module.
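
The parallel analysis described above might be arranged as in the following sketch, which submits the same audio data to two functions at once using Python's standard `concurrent.futures` module. The two callables stand in for the voice emotion recognition module and the transcription module; their names and signatures are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_in_parallel(audio, recognize_emotion, transcribe):
    """Run emotion recognition and transcription on the same audio concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        emotion_future = pool.submit(recognize_emotion, audio)
        text_future = pool.submit(transcribe, audio)
        # result() blocks until each task finishes and re-raises its exception, if any.
        return emotion_future.result(), text_future.result()
```

Threads are a reasonable choice when both modules mostly wait on external services; CPU-bound models would more likely run in separate processes or on dedicated hardware.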

[0224] The server analyzes voice data to identify the customer's emotional state and sends a real-time notification to the administrator if negative emotions such as dissatisfaction or anger are detected, or if keywords related to a specific problem are found. This notification allows the user to take immediate action.

[0225] Meanwhile, the server simultaneously extracts expressions of gratitude and joy from the conversation and stores the portions of the voice dialogue that include positive feedback in a database as success stories. This allows users to use these examples to improve their services and responses.

[0226] Furthermore, this system also includes a compliance monitoring function. When inappropriate comments or expressions are detected in user-customer interactions, it automatically records the audio data before and after the incident and notifies the relevant departments as needed. This helps users ensure legal compliance and prevent damage to their company's reputation.

[0227] Thus, the present invention, as a voice analysis system that supports customer service operations, achieves both improved customer experience and increased efficiency in business processes. By using this system, users involved in customer service can quickly identify emotional fluctuations and compliance violations, and take appropriate action in a timely manner.

[0228] The following describes the processing flow.

[0229] Step 1:

[0230] The terminal starts recording audio immediately when a conversation with a customer begins. The recorded audio data is streamed to the server in real time.

[0231] Step 2:

[0232] The server passes the received audio data to the speech emotion recognition module. The module analyzes the audio data and identifies the customer's emotional state based on their voice tone, speed, and intonation.

[0233] Step 3:

[0234] The server simultaneously passes the audio data to a transcription module, which converts the dialogue into text data. This text data is then used for keyword detection and logging.

[0235] Step 4:

[0236] The server searches for specific keywords in the converted text, and if found, records and saves a portion of the relevant call. Depending on the detection criteria, it sends a notification to the relevant administrator if necessary.

[0237] Step 5:

[0238] Users receive notifications from the server and take prompt action based on those notifications if necessary. The system monitors the situation in real time and adapts services accordingly.

[0239] Step 6:

[0240] The server automatically extracts portions of voice interactions that contain positive feedback and saves them in the database as success stories. This information can be referenced later to help improve the service.
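
Step 4 above mentions saving a portion of the call when a keyword is found. A minimal sketch of how that portion could be selected is shown below, assuming the transcription module returns time-stamped segments. The `Segment` structure, the margin of five seconds, and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # segment start time in seconds (assumed transcription output)
    end: float    # segment end time in seconds
    text: str


def excerpt_for_keyword(segments, keyword, margin=5.0):
    """Return the (start, end) time range to save around the first segment
    containing the keyword, or None if the keyword does not occur."""
    for seg in segments:
        if keyword.lower() in seg.text.lower():
            return max(0.0, seg.start - margin), seg.end + margin
    return None
```

The returned time range would then be used to cut the corresponding part of the recorded audio for storage.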

[0241] (Example 1)

[0242] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0243] In voice interactions with customers, it is essential to accurately grasp their emotions and needs in real time and respond quickly. Furthermore, recording and utilizing positive customer feedback is expected to improve services. On the other hand, monitoring inappropriate content and compliance violations to protect the company's reputation is also a crucial issue.

[0244] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0245] In this invention, the server includes means for acquiring voice conversations with customers in real time, means for analyzing the received voice information to recognize emotions, means for converting the acquired voice information into symbolic information, means for recognizing specific words or phrases based on the converted symbolic information, means for detecting inappropriate content and recording the voice information before and after it, and means for immediately providing notifications to the administrator based on the analysis results. This makes it possible to grasp customer emotions in real time and respond quickly, to record positive feedback, and to monitor inappropriate remarks and protect corporate reputation.

[0246] "Voice interaction with customers" refers to communication through the voices of customers.

[0247] "Means of acquiring data in real time" refers to technical means for instantly collecting and processing audio data.

[0248] "Audio information" refers to data recorded as changes in sound.

[0249] A "means of recognizing emotions" refers to an algorithm or process for identifying emotional states from speech.

[0250] "Symbolic information" refers to data obtained by converting speech into symbols or characters.

[0251] "Specific terms" refer to important keywords or phrases that have been set in advance.

[0252] "Inappropriate content" refers to expressions or statements that may violate compliance.

[0253] "Analysis results" refer to information obtained through the processing of audio data.

[0254] "Means of providing notifications to administrators" refers to a function that uses analysis results to inform administrators of important information.

[0255] A "means of recording" refers to a function that saves specific data so that it can be referenced later.

[0256] The customer service support system proposed in this invention analyzes voice interactions with customers in real time and obtains various feedback. A terminal installed in the store collects customer voices and sends the voice data to a server, at which point the system begins processing. The server analyzes the received voice data using speech recognition technologies such as "NVIDIA Jarvis" or "Google Cloud Speech-to-Text" to identify the customer's emotions. In addition, the server converts the voice data into text and filters it for specific keywords.

[0257] If negative customer sentiment or specific issues are detected, the server promptly sends a notification to the administrator. This notification allows users to address the problem quickly. The server also records positive customer feedback and success stories, storing them in a database for service improvement. This provides users with valuable data to improve the quality of their customer service.

[0258] Furthermore, the server has a compliance monitoring function that records the audio data before and after any inappropriate expressions or content are detected and reports it to the management department. This allows users to help maintain legal compliance.

[0259] For example, if a customer complains that their coffee is cold, the server immediately analyzes the audio and sends a notification to the administrator. Conversely, if a customer gives positive feedback such as "the service was excellent," it is recorded in the database and used to improve customer service.

[0260] An example of a prompt is: "Describe the process of analyzing customer voices and using emotions and keywords to activate a real-time notification system."

[0261] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0262] Step 1:

[0263] The terminal collects customer voice input via a microphone. When customer speech is detected, it begins recording voice data and transmits it to the server in real time. The input here is the customer's raw voice, and the output is digital voice data transmitted to the server.

[0264] Step 2:

[0265] The server inputs the received audio data into a speech emotion recognition module. This module analyzes the audio waveform and evaluates the emotion using a generative AI model. Specifically, it extracts features such as tone, pitch, and speed from the speech and infers the emotional state based on these. The input here is digital audio data, and the output is identified emotion information.

[0266] Step 3:

[0267] The server simultaneously inputs the audio data into a transcription module, where it converts the audio into text data. This process uses speech recognition technology to analyze and transcribe the audio content. The input is audio data, and the output is the converted text data.

[0268] Step 4:

[0269] The server extracts specific keywords from the converted text data. This process scans the text and identifies relevant phrases based on a pre-configured keyword list. The input is text data, and the output is a list of detected keywords.

[0270] Step 5:

[0271] The server generates and sends necessary notifications to the administrator based on sentiment and keyword information. For example, if keywords related to negative sentiment or complaints are detected, this triggers an alert to be sent to the administrator. The inputs here are sentiment and keyword information, and the output is the generated notification message.

[0272] Step 6:

[0273] The server identifies text that expresses positive customer feedback and delight, and stores it in a database. This includes the automatic extraction of expressions of gratitude and praise. The input is text data, and the output is data representing recorded positive feedback.

[0274] Step 7:

[0275] The server automatically records the surrounding data if it detects inappropriate content and notifies the relevant department. This process is intended to maintain corporate compliance and operates according to pre-defined criteria. The input is audio data, and the output is the recorded audio data and notification message.
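
Recording the audio before and after inappropriate content (Step 7) implies that recent audio is kept available at the moment of detection. One possible arrangement, sketched below under that assumption, is a rolling buffer of recent audio chunks. The class name `IncidentRecorder` and the chunk counts are illustrative; the description does not specify how much audio is retained.

```python
from collections import deque


class IncidentRecorder:
    """Keeps recent audio chunks so audio before and after an incident can be saved."""

    def __init__(self, pre_chunks=3, post_chunks=2):
        self.pre = deque(maxlen=pre_chunks)  # rolling buffer of the most recent chunks
        self.post_chunks = post_chunks
        self.pending = None                  # chunks of the incident being collected
        self.remaining = 0                   # post-incident chunks still to collect
        self.saved = []                      # completed incident recordings

    def push(self, chunk, inappropriate=False):
        if self.pending is None and inappropriate:
            # Start a recording: buffered audio before the incident plus the flagged chunk.
            self.pending = list(self.pre) + [chunk]
            self.remaining = self.post_chunks
        elif self.pending is not None:
            # Collect audio after the incident.
            self.pending.append(chunk)
            self.remaining -= 1
        if self.pending is not None and self.remaining <= 0:
            self.saved.append(self.pending)
            self.pending = None
        self.pre.append(chunk)
```

In this sketch a second flagged chunk arriving while a recording is still being collected is simply included in that recording rather than starting a new one.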

[0276] (Application Example 1)

[0277] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0278] There is a need to improve the efficiency of customer service in physical stores, accurately understand customer emotions, and enable prompt responses. Furthermore, service improvement and compliance are also important, but there is no technology available to comprehensively support these aspects.

[0279] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0280] In this invention, the server includes a device for receiving in real time a voice dialogue with a customer, a device for analyzing the received voice information to determine emotions, and a device for converting the voice information into character information. As a result, it becomes possible to quickly grasp the emotional state of the customer and take appropriate actions in real time. Also, by generating and transmitting notifications according to the detected emotions and symbols, it becomes possible to quickly share information with the administrator. Furthermore, positive opinions can be saved and utilized for service improvement, while inappropriate expressions can be automatically detected and recorded to maintain compliance.

[0281] "The voice dialogue with the customer" refers to the exchange of information via voice between the customer and the store clerk within the store.

[0282] "The device for receiving in real time" is an electronic device for immediately acquiring and processing voice data.

[0283] "Voice information" is data in a form that electrically records or transmits voice.

[0284] "The device for analyzing to determine emotions" is a processing device for identifying the emotional state of the customer from the voice information.

[0285] "The device for converting into character information" is a device that uses voice recognition technology to convert voice data into character data.

[0286] "Specific symbols" refer to important keywords and expressions that appear in the customer's speech.

[0287] "The device for generating and transmitting notifications" is a system that creates alerts and information based on the detected information and transmits them to relevant parties.

[0288] "Positive opinions" are expressions that indicate favorable feedback or praise from the customer.

[0289] "Inappropriate expressions" are statements and expressions that do not conform to the standards set by the company and are considered inappropriate.

[0290] A "device for recording related conversation content" is a device that saves audio data before and after a specific event when certain conditions are met.
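
As a minimal sketch of how audio before and after an event might be retained, the following assumes that audio arrives as a sequence of fixed-size chunks and that some other component signals the event. The class name `EventRecorder` and the parameters `pre_chunks` and `post_chunks` are hypothetical and do not appear in the disclosure.

```python
from collections import deque


class EventRecorder:
    """Keeps the last `pre_chunks` audio chunks and, once triggered,
    collects the next `post_chunks` chunks into a saved clip.
    (Hypothetical helper; chunk sizes and counts are assumptions.)"""

    def __init__(self, pre_chunks: int = 3, post_chunks: int = 2):
        self.pre = deque(maxlen=pre_chunks)   # rolling history before an event
        self.post_chunks = post_chunks
        self.pending = None                   # clip currently being completed
        self.remaining = 0
        self.saved = []                       # completed clips

    def push(self, chunk: bytes, triggered: bool = False) -> None:
        if self.pending is not None:
            # Still collecting audio that follows an earlier event.
            self.pending.append(chunk)
            self.remaining -= 1
            if self.remaining == 0:
                self.saved.append(self.pending)
                self.pending = None
        elif triggered:
            # Start a clip from the buffered history plus the triggering chunk.
            self.pending = list(self.pre) + [chunk]
            self.remaining = self.post_chunks
            if self.remaining == 0:
                self.saved.append(self.pending)
                self.pending = None
        self.pre.append(chunk)
```

In this sketch, a trigger that arrives while an earlier clip is still being completed is not treated as a new event; its chunk is simply included in the clip already in progress.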

[0291] This invention is a system that efficiently analyzes voice interactions with customers in physical stores and grasps their emotional state in real time. The system is configured as follows:

[0292] The server primarily receives and analyzes voice conversations in real time. The server is equipped with speech recognition software (e.g., Speech Recognition API) and emotion recognition modules, enabling it to instantly convert voice information into text and determine customer emotions. The analyzed voice information is stored in a database, and notifications are sent to the administrator as needed.

[0293] Furthermore, this system supports compliance by recognizing speech containing specific keywords or inappropriate expressions and recording the relevant conversation content. Positive feedback is extracted separately and used to improve the service.

[0294] Furthermore, the application installed on each store terminal communicates with the server to notify store staff terminals of necessary information. This application utilizes a generative AI model through prompt messages. For example, if a customer expresses dissatisfaction with a new product in a store, the app immediately notifies the staff, enabling them to respond quickly.

[0295] An example of an input prompt for the generative AI model might be, "A customer seems dissatisfied with a new product. As a store employee, what kind of response can increase customer satisfaction?" This prompt helps store employees explore appropriate response methods.
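
A prompt of this kind could be assembled from the analysis results, for example as follows. This is only an illustrative sketch: the function `build_prompt` and its parameters are hypothetical, and the call to the generative AI model itself is omitted.

```python
def build_prompt(emotion: str, topic: str) -> str:
    """Compose a prompt for the generative AI model from analysis results.
    (Hypothetical helper; the template follows the example prompt in the text.)"""
    return (
        f"A customer seems {emotion} with {topic}. "
        "As a store employee, what kind of response can increase customer satisfaction?"
    )


prompt = build_prompt("dissatisfied", "a new product")
```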

[0296] In this way, the server, terminal, and user work together to create a system that enables efficient and flexible customer service.

[0297] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0298] Step 1:

[0299] The server receives audio data in real time. The audio information spoken by the customer is taken as input via the microphone, and the raw data is immediately sent to the server to prepare for subsequent processing.

[0300] Step 2:

[0301] The server converts the received audio information into text using a speech recognition module. In this step, data processing is performed, converting the audio data into text data using speech recognition software, which is then used as the output for the next step.

[0302] Step 3:

[0303] The server uses a voice analysis module to determine the emotional state from the text data. Here, data calculations are performed using emotion recognition technology, and the emotional state (e.g., anger, joy, sadness) is output as the analysis result. This information serves as an important indicator for customer service.

[0304] Step 4:

[0305] The server detects specific codes in the text data and performs processing based on those codes. For example, it extracts keywords related to complaints and generates data to produce real-time notifications when they are detected. As a result of this data processing on the input, notification instructions are output.

[0306] Step 5:

[0307] The server sends the generated notification to the store terminal. This notification is displayed on the screen of the store staff and functions as an output to indicate what kind of situation requires response. Based on this notification, the user can carry out actual customer service.

[0308] Step 6:

[0309] The terminal receives the notification, and the user responds to the situation based on that information. This enables quick responses in line with the customer's emotional state. After that, the result of the customer service may be recorded as a prompt sentence for the generative AI model. This sentence is used as input data for further service improvement based on the accumulated data.
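
Steps 4 and 5 above can be sketched roughly as follows, assuming that speech recognition and emotion determination have already produced a text string and an emotion label. The keyword list, the `Notification` type, and the function names are hypothetical, and simple substring matching stands in for whatever detection method the system actually uses.

```python
from dataclasses import dataclass
from typing import Optional

COMPLAINT_KEYWORDS = ("refund", "broken", "complaint")  # hypothetical keyword list


@dataclass
class Notification:
    emotion: str
    keywords: list
    message: str


def detect_keywords(text: str, keywords=COMPLAINT_KEYWORDS) -> list:
    """Step 4: return the configured keywords that occur in the text."""
    lowered = text.lower()
    return [kw for kw in keywords if kw in lowered]


def make_notification(text: str, emotion: str) -> Optional[Notification]:
    """Steps 4-5: build a notification only when a keyword is detected."""
    hits = detect_keywords(text)
    if not hits:
        return None
    message = f"Customer appears {emotion}; keywords: {', '.join(hits)}"
    return Notification(emotion, hits, message)
```

Returning `None` when nothing is detected keeps the sending side simple: the server only transmits to the store terminal when a notification object exists.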

[0310] Furthermore, an emotion engine for estimating the user's emotion may be combined. That is, the specific processing unit 290 may estimate the user's emotion using the emotion identification model 59 and perform specific processing using the user's emotion.

[0311] An embodiment of the present invention will be described for the combination of emotion engines in a customer service support system using voice conversations with customers. This system first receives voice input from the terminal and transmits the conversation data with the customer to the server in real time. The server passes the voice data to the emotion engine and analyzes the emotional states of the customer and the user respectively.

[0312] The server judges the customer's emotion based on the customer's voice tone, speed, and intonation, and also recognizes the user's emotion from the voice and operation data. Based on the analysis result, if the user's emotion is positive, the conversation is automatically registered in the database as a successful case for reference in service improvement. Also, when the user's emotion is negative, a notification is sent to the administrator by a specified method to establish a system that can quickly provide support and improvement measures.

[0313] Furthermore, based on the customer sentiment analysis results, the server detects pre-configured keywords in the text data and performs specific actions. This includes, for example, sending alerts to relevant administrators to respond immediately to urgent requests or complaints from customers.

[0314] By using this system, users can accurately grasp customer needs in real time and respond based on their own emotional state. As a result, customer satisfaction improves and the overall efficiency of customer service operations is increased. For example, if a customer actively provides positive feedback in a store, that conversation is automatically stored and used as a success story for employee training. In this way, the present invention provides advanced customer service support functions that also take into account the user's emotions.

[0315] The following describes the processing flow.

[0316] Step 1:

[0317] The terminal records audio as soon as a conversation with a customer begins and sends the data to the server in real time.

[0318] Step 2:

[0319] The server passes the received audio data to the emotion engine. The emotion engine analyzes the customer's emotions from the audio and identifies them based on elements such as tone, speed, and intonation.

[0320] Step 3:

[0321] The server converts audio data into text data using a transcription module. It then uses the converted text to search for specific keywords and provides the results to other modules.

[0322] Step 4:

[0323] The user's emotions are also analyzed by the emotion engine, and their emotional state is identified based on their voice tone and the content of their conversation.

[0324] Step 5:

[0325] Based on the sentiment analysis results, the server sends a notification to relevant parties via the notification management system if the customer's or user's sentiment exceeds a certain threshold.

[0326] Step 6:

[0327] Users receive notifications from the server and take prompt action as needed. They consider and implement the most appropriate response based on the situation.

[0328] Step 7:

[0329] The server saves conversations containing positive feedback as success stories in its database. This information will be used later to improve service quality.
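
Steps 5 and 7 of this flow might be sketched as below. The disclosure does not specify how emotions are scored, so the sketch assumes a hypothetical score in the range -1.0 (strongly negative) to 1.0 (strongly positive); the threshold values and the function name `route` are likewise assumptions.

```python
NEGATIVE_THRESHOLD = -0.5  # hypothetical: scores at or below this trigger a notification
POSITIVE_THRESHOLD = 0.5   # hypothetical: scores at or above this are saved as success stories


def route(conversation_id: str, customer_score: float, user_score: float,
          notifications: list, success_stories: list) -> None:
    """Steps 5 and 7: notify on strong negative emotion, save strong positive feedback."""
    # Step 5: either party's emotion crossing the negative threshold triggers a notification.
    if min(customer_score, user_score) <= NEGATIVE_THRESHOLD:
        notifications.append(conversation_id)
    # Step 7: positive customer feedback is kept as a success story.
    if customer_score >= POSITIVE_THRESHOLD:
        success_stories.append(conversation_id)
```

The two lists stand in for the notification management system and the database, respectively.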

[0330] (Example 2)

[0331] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0332] Conventional customer service support systems struggled to accurately analyze customer emotions from voice interactions and provide appropriate support in real time. Furthermore, they lacked the means to effectively utilize positive feedback and respond quickly to negative situations, leaving challenges in improving customer satisfaction and operational efficiency.

[0333] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0334] In this invention, the server includes a device for receiving voice dialogues, a device for analyzing the received voice information to determine emotions, a device for converting voice information into text information, a device for automatically registering successful conversation examples in a case study collection based on the emotion analysis results, and a device for sending notifications to the administrator when negative emotions are detected. This makes it possible to accurately grasp emotions from conversations with customers, automatically take appropriate actions, and enable quick and effective customer service.

[0335] "Voice interaction" refers to communication using voice between a customer and a system.

[0336] "Voice information" refers to data obtained from voice dialogue, including the basic data that the server uses for analysis.

[0337] A "device for determining emotions" refers to a device that analyzes audio information to determine the emotional state of the speaker.

[0338] "Textual information" refers to information in text format generated by digitally processing audio information.

[0339] "Words" refer to words or phrases that have important meaning within written information and serve as triggers for sentiment analysis and actions.

[0340] A "notification generation and transmission device" is a device that creates and transmits notifications to relevant parties based on detected words or phrases.

[0341] "Positive feedback" refers to positive comments from customers and is used to improve services.

[0342] A "device for registering success stories in a case study collection" is a device for organizing and saving positive conversation content and accumulating it as reference material for the future.

[0343] "Negative emotions" refers to a state in which the speaker is showing dissatisfaction or negative emotions, as determined by emotion analysis.

[0344] A "device that sends notifications to administrators" is a device that automatically transmits information to administrators when negative emotions are detected, so that appropriate action can be taken.

[0345] This invention analyzes information obtained from voice interactions with customers in real time to provide advanced support for customer service. The system as a whole consists of a terminal, a server, and a user.

[0346] The terminal includes a device for capturing voice interactions with customers and acquiring high-quality audio information. This device uses a microphone with noise-canceling technology to maintain clarity of conversation. The acquired audio information is then transmitted to the server via encrypted communication protocols such as TLS / SSL.
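
On the terminal side, an encrypted channel of this kind could be prepared with Python's standard `ssl` module, for instance. This is a sketch under assumptions: the disclosure names only TLS / SSL in general, and the choice of TLS 1.2 as the minimum version and the function name `make_client_context` are illustrative.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Create a client-side TLS context with certificate verification enabled."""
    context = ssl.create_default_context()            # verifies certificates and host names
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # assumption: refuse older protocol versions
    return context
```

The resulting context would then be used to wrap the socket that carries the audio data, for example with `context.wrap_socket(sock, server_hostname=host)`, where `sock` and `host` are supplied by the application.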

[0347] The server plays a central role in processing the received audio information. The audio information is first passed to the sentiment analysis engine, where it is analyzed using a generative AI model on the server. This analysis evaluates the customer's and user's emotions. Once the emotions are determined, the server extracts positive feedback based on that and automatically registers it in the database as a success story.

[0348] Furthermore, the server detects specific words and phrases from the converted text information. When specific words or phrases are detected, the relevant actions are taken. For example, if the word "urgent" is identified, an alert can be sent to the responsible administrator to prompt immediate action. The server can also send a notification to the administrator if negative emotions are detected, enabling prompt support.

[0349] This system allows users to accurately understand customer needs in real time and provide appropriate support. As a result, it is expected to improve the overall efficiency of customer service operations and enhance customer satisfaction.

[0350] For example, if a store terminal records a positive comment from a customer such as "I am very satisfied," that data is organized on the server as a success story and used as material for employee training. On the other hand, if a complaint or grievance occurs, the server can send a notification to the administrator so that corrective action can be taken promptly.

[0351] An example of a prompt message would be, "What emotions can be inferred from this customer's statement?" This allows the system to perform a rapid and sophisticated analysis and provide an appropriate response.

[0352] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0353] Step 1:

[0354] The terminal acquires voice interaction with the customer. As input, the customer's spoken voice data is captured by the terminal's microphone. Specifically, the terminal uses noise cancellation technology to ensure clear voice data. As output, the voice data is encrypted and sent to the server for further processing.

[0355] Step 2:

[0356] The server receives audio data transmitted from the terminal and analyzes it using a generative AI model. The input is encrypted audio data from the terminal. Specifically, the AI model analyzes the voice tone, speed, and intonation to determine the emotional state. The output is the analyzed emotional data.

[0357] Step 3:

[0358] The server automatically registers positive evaluations in the database based on sentiment data. The input consists of data that has been determined to be positive through sentiment analysis. Specifically, the server uses data management software to save the conversation as a successful example. The output is the updated database entry.

[0359] Step 4:

[0360] Based on the sentiment analysis results, the server sends a notification to the administrator if it detects negative emotions. The input is the emotion data that has been determined to be negative. Specifically, the server uses an email or messaging system to send an alert to the administrator in real time. The output is the notification sent to the administrator.

[0361] Step 5:

[0362] The server performs text conversion of audio information and detects specific keywords. The input is text data generated from audio data. In the specific data processing, a text analysis algorithm is used to identify the configured keywords. The output consists of the detected keywords and associated instructions.

[0363] Step 6:

[0364] The server performs a specific action based on the detected keyword. The input is the keyword detected in step 5. Specifically, if the keyword is "urgent," the server immediately sends an alert to the relevant department. The output is a record that the action was taken and any necessary follow-up.
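
Steps 5 and 6 can be illustrated with a small dispatch table that maps each configured keyword to an action. The table contents, the function names, and the in-memory `log` list are hypothetical; a real system would send the alert through its notification mechanism rather than append to a list.

```python
from typing import Callable, Dict, List

log: List[str] = []  # stands in for the record that an action was taken


def alert_department(text: str) -> None:
    """Hypothetical action: record an alert for the relevant department."""
    log.append(f"ALERT: {text}")


# Hypothetical mapping from configured keyword to the action it triggers.
ACTIONS: Dict[str, Callable[[str], None]] = {"urgent": alert_department}


def handle_text(text: str) -> List[str]:
    """Steps 5-6: detect configured keywords and run the action for each."""
    lowered = text.lower()
    detected = [kw for kw in ACTIONS if kw in lowered]
    for kw in detected:
        ACTIONS[kw](text)
    return detected
```

Keeping the keyword-to-action mapping in one table means that new keywords can be configured without changing the detection logic.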

[0365] (Application Example 2)

[0366] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0367] In customer service, it is essential to accurately understand customer emotions in real time and respond quickly and appropriately. Furthermore, there is a challenge in efficiently collecting positive customer feedback and utilizing it as training material within the organization.

[0368] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0369] In this invention, the server includes means for receiving voice interactions with customers in real time, means for analyzing the received voice data to determine emotions, and means for utilizing the extracted feedback as educational material. This enables appropriate responses tailored to customer emotions and the creation of effective training materials.

[0370] "Voice interaction with customers" refers to voice communication that takes place between a customer and a system or operator.

[0371] "Real-time reception" refers to a function that can instantly acquire and process audio data at the moment a conversation takes place.

[0372] "Methods for analyzing voice data to determine emotions" refer to technologies that analyze elements such as tone, speed, and intonation of voice to identify the speaker's emotional state.

[0373] "Methods for converting audio data into text data" refers to the process of converting audio information into text format using speech recognition technology.

[0374] "Methods for detecting specific keywords" refer to algorithms for identifying important words or phrases from converted character data.

[0375] A "means for generating and sending notifications" is a system that creates alerts or messages when specific events or conditions are met and sends them to the appropriate recipients.

[0376] "Means for extracting and saving positive feedback" refers to the process of identifying positive evaluations and opinions from customers and saving them in a database or file.

[0377] "Methods for utilizing as educational materials" refers to methods for using saved feedback as content for employee training and education.

[0378] To realize this invention, a system is needed that receives voice conversations with customers in real time and analyzes their content using an emotion analysis engine. When the server receives voice data, it processes the data using a dedicated API or software (e.g., a speech recognition engine or natural language processing tool) for emotion analysis. After the emotion analysis is complete, the voice data is converted into text data, and specific keywords that have been set in advance are detected. This allows the system to understand the customer's emotional state, and if positive feedback is obtained, it is stored in a database and used as training material within the organization.

[0379] The server also has the functionality to send notifications to administrators based on specific keywords or emotional states. This requires a network environment that enables real-time data communication and alert generation. Specific hardware includes smartphones and microphones as voice input terminals, and high-performance computers for processing the collected data. The software includes the aforementioned voice recognition engine, as well as applications that control the database management system and notification functions.

[0380] As a concrete example, consider a scenario involving customer service in a retail store. While a store employee is talking to a customer, the system detects the customer saying, "This is the best product I've ever used." The server detects this positive feedback, saves it to a database, and simultaneously processes it as training material for employees. Furthermore, managers can immediately learn about this success. An example of such a prompt would be, "We want to automatically record customer feedback on what they felt positively during in-store interactions and use it for employee training. Please provide specific examples of analysis."

[0381] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0382] Step 1:

[0383] The terminal captures conversations with customers as audio data in real time. The audio data is collected using a microphone and sent directly to the server. The input is the customer's raw voice, and the output is digital audio data.

[0384] Step 2:

[0385] The server sends the received audio data to the speech recognition engine, where it is converted into text data. The input is digital audio data, and the output is the corresponding text data. The audio waveform data is analyzed and converted into words and phrases through speech recognition.

[0386] Step 3:

[0387] The server passes the converted text data to the sentiment analysis engine to determine the customer's emotional state. The input is text data, and the output is the sentiment analysis result, i.e., the customer's emotional state (e.g., positive, negative). Natural language processing techniques are used to analyze emotional words and phrases in the text.

[0388] Step 4:

[0389] The server stores positive feedback in a database based on the sentiment analysis results. The input is the sentiment analysis results, and the output is the stored success story data. It automatically creates entries in the database and prepares the feedback for use as training material.

[0390] Step 5:

[0391] The server generates and sends a notification to the administrator if the detected emotion is negative and specific keywords are present. The input is the emotion analysis results and keyword detection results, and the output is the alert notification sent to the administrator. This notification is sent via email or as an alert message within the application using a notification system.

[0392] Step 6:

[0393] Users can access saved positive feedback to use as educational material. The input is a record of successful cases in the database, and the output is the available educational material. This material can be searched and displayed through the user interface.
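
Steps 4 and 6 might look like the following when an SQLite database is used for storage. This is a minimal sketch under assumptions: the disclosure does not name a database product, and the table name `success_stories`, the sentiment labels, and the function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute("CREATE TABLE success_stories (id INTEGER PRIMARY KEY, text TEXT)")


def save_feedback(text: str, sentiment: str) -> bool:
    """Step 4: store the utterance only when sentiment analysis judged it positive."""
    if sentiment != "positive":
        return False
    conn.execute("INSERT INTO success_stories (text) VALUES (?)", (text,))
    conn.commit()
    return True


def search_material(term: str) -> list:
    """Step 6: look up saved feedback containing the search term."""
    rows = conn.execute(
        "SELECT text FROM success_stories WHERE text LIKE ? ORDER BY id",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]
```

Parameterized queries are used so that customer speech, which is arbitrary text, is never interpolated directly into the SQL statement.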

[0394] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0395] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0396] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0397] [Third Embodiment]

[0398] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0399] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0400] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0401] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0402] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0403] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0404] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0405] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0406] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0407] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0408] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0409] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0410] As an embodiment of the present invention, a customer service support system using voice interaction with customers will be described. In this system, when voice interaction is initiated by a terminal installed in the store, voice data is immediately sent to a server. The server analyzes the received voice data in parallel using a voice emotion recognition module and a transcription module.
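
The parallel analysis described here could be arranged, for example, with a thread pool. In this sketch the two modules are placeholders that return fixed values; the function names and return values are hypothetical and stand in for the actual voice emotion recognition and transcription modules.

```python
from concurrent.futures import ThreadPoolExecutor


def recognize_emotion(audio: bytes) -> str:
    """Placeholder for the voice emotion recognition module."""
    return "neutral"


def transcribe(audio: bytes) -> str:
    """Placeholder for the transcription module."""
    return ""


def analyze(audio: bytes) -> dict:
    """Run both modules concurrently on the same audio data."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        emotion = pool.submit(recognize_emotion, audio)
        text = pool.submit(transcribe, audio)
        # Wait for both results before returning them together.
        return {"emotion": emotion.result(), "text": text.result()}
```

Threads suit modules that mostly wait on external services; if both modules were CPU-bound Python code, a process pool might be the more suitable choice.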

[0411] The server analyzes voice data to identify the customer's emotional state and sends a real-time notification to the administrator if negative emotions such as dissatisfaction or anger are detected, or if keywords related to a specific problem are found. This notification allows the user to take immediate action.

[0412] Meanwhile, the server simultaneously extracts expressions of gratitude and joy from the conversation and stores the portions of the voice dialogue that include positive feedback and success stories in a database. This allows users to use these examples to improve their services and responses.

[0413] Furthermore, this system also includes a compliance monitoring function. When inappropriate comments or expressions are detected in user-customer interactions, it automatically records the audio data before and after the incident and notifies the relevant departments as needed. This helps users ensure legal compliance and prevent damage to their company's reputation.

[0414] Thus, the present invention, as a voice analysis system that supports customer service operations, achieves both improved customer experience and increased efficiency in business processes. By using this system, users involved in customer service can quickly identify emotional fluctuations and compliance violations, and take appropriate action in a timely manner.

[0415] The following describes the processing flow.

[0416] Step 1:

[0417] The terminal starts recording audio immediately when a conversation with a customer begins. The recorded audio data is streamed to the server in real time.

[0418] Step 2:

[0419] The server passes the received audio data to the speech emotion recognition module. The module analyzes the audio data and identifies the customer's emotional state based on their voice tone, speed, and intonation.

[0420] Step 3:

[0421] The server simultaneously passes the audio data to a transcription module, which converts the dialogue into text data. This text data is then used for keyword detection and logging.

[0422] Step 4:

[0423] The server searches for specific keywords in the converted text, and if any are found, records and saves the relevant portion of the conversation. Depending on the detection criteria, it also sends a notification to the relevant administrator.

[0424] Step 5:

[0425] Users receive notifications from the server and take prompt action based on those notifications if necessary. The system monitors the situation in real time and adapts services accordingly.

[0426] Step 6:

[0427] The server automatically extracts portions of voice interactions that contain positive feedback and saves them in the database as success stories. This information can be referenced later to help improve the service.
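
The saving of a relevant portion of the conversation in Step 4 can be sketched on the transcript side as follows, assuming the transcription module yields time-stamped segments. The `Segment` layout, the function name `excerpt_around`, and the `context` parameter are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, str]  # (start time in seconds, transcribed text); assumed layout


def excerpt_around(segments: List[Segment], keyword: str, context: int = 1) -> List[Segment]:
    """Return segments containing `keyword` plus `context` neighbours on each side."""
    keep = set()
    for i, (_, text) in enumerate(segments):
        if keyword.lower() in text.lower():
            first = max(0, i - context)
            last = min(len(segments), i + context + 1)
            keep.update(range(first, last))
    # Sorting the indices preserves the original order of the conversation.
    return [segments[i] for i in sorted(keep)]
```

The start times of the returned segments could then be used to locate the matching span in the recorded audio.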

[0428] (Example 1)

[0429] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0430] In voice interactions with customers, it is essential to accurately grasp their emotions and needs in real time and respond quickly. Furthermore, recording and utilizing positive customer feedback is expected to improve services. On the other hand, monitoring inappropriate content and compliance violations to protect the company's reputation is also a crucial issue.

[0431] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0432] In this invention, the server includes means for acquiring voice conversations with customers in real time, means for analyzing the received voice information to recognize emotions, means for converting the acquired voice information into symbolic information, means for recognizing specific words or phrases based on the converted symbolic information, means for detecting inappropriate content and recording the voice information before and after it, and means for immediately providing notifications to the administrator based on the analysis results. This makes it possible to grasp customer emotions in real time and respond quickly, to record positive feedback, and to monitor inappropriate remarks so as to protect corporate reputation.

[0433] "Voice interaction with customers" refers to spoken communication with customers.

[0434] "Means of acquiring data in real time" refers to technical means for instantly collecting and processing audio data.

[0435] "Audio information" refers to data recorded as changes in sound.

[0436] A "means of recognizing emotions" refers to an algorithm or process for identifying emotional states from speech.

[0437] "Symbolic information" refers to data obtained by converting speech into symbols or characters.

[0438] "Specific terms" refer to important keywords or phrases that have been set in advance.

[0439] "Inappropriate content" refers to expressions or statements that may violate compliance.

[0440] "Analysis results" refer to information obtained through the processing of audio data.

[0441] "Means of providing notifications to administrators" refers to a function that uses analysis results to inform administrators of important information.

[0442] A "means of recording" refers to a function that saves specific data so that it can be referenced later.

[0443] The customer service support system proposed in this invention analyzes voice interactions with customers in real time and obtains various feedback. A terminal installed in the store collects customer voices and sends the voice data to a server, at which point the system begins processing. The server analyzes the received voice data using speech recognition technologies such as "NVIDIA Jarvis" or "Google Cloud Speech-to-Text" to identify the customer's emotions. In addition, the server converts the voice data into text and filters it for specific keywords.

[0444] If negative customer sentiment or specific issues are detected, the server promptly sends a notification to the administrator. This notification allows users to address the problem quickly. The server also records positive customer feedback and success stories, storing them in a database for service improvement. This provides users with valuable data to improve the quality of their customer service.

[0445] Furthermore, the server has a compliance monitoring function that records the audio data before and after any inappropriate expressions or content are detected and reports it to the management department. This allows users to help maintain legal compliance.
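As a non-limiting illustration, recording the audio before and after a detection can be sketched with a rolling buffer. The class name `IncidentRecorder`, the chunk counts, and the `flagged` input (assumed to come from an upstream detector of inappropriate content) are hypothetical and do not appear in the disclosure.

```python
from collections import deque


class IncidentRecorder:
    """Keeps a rolling window of audio chunks and captures context around a flagged chunk."""

    def __init__(self, pre_chunks=3, post_chunks=2):
        self.pre = deque(maxlen=pre_chunks)  # most recent chunks, kept as "before" context
        self.post_chunks = post_chunks       # number of chunks to keep after a detection
        self.active = None                   # clip currently being extended, if any
        self.remaining = 0                   # post-detection chunks still to capture
        self.clips = []                      # finished clips, ready to be reported

    def push(self, chunk, flagged=False):
        if flagged:
            if self.active is None:
                # Start a clip with the buffered chunks that preceded the detection.
                self.active = list(self.pre)
            self.active.append(chunk)
            self.remaining = self.post_chunks  # a repeated detection extends the clip
        elif self.active is not None:
            self.active.append(chunk)
            self.remaining -= 1
        if self.active is not None and self.remaining <= 0:
            self.clips.append(self.active)
            self.active = None
        self.pre.append(chunk)
```

Each finished entry in `clips` holds the flagged chunk together with its surrounding context and could then be attached to the report sent to the management department.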

[0446] For example, if a customer complains that their coffee is cold, the server immediately analyzes the audio and sends a notification to the administrator. Conversely, if a customer gives positive feedback such as "the service was excellent," it is recorded in the database and used to improve customer service.

[0447] An example of a prompt is: "Describe the process of analyzing customer voices and using emotions and keywords to activate a real-time notification system."

[0448] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0449] Step 1:

[0450] The terminal collects customer voice input via a microphone. When customer speech is detected, it begins recording voice data and transmits it to the server in real time. The input here is the customer's raw voice, and the output is digital voice data transmitted to the server.

[0451] Step 2:

[0452] The server inputs the received audio data into a speech emotion recognition module. This module analyzes the audio waveform and evaluates the emotion using a generative AI model. Specifically, it extracts features such as tone, pitch, and speed from the speech and infers the emotional state based on these. The input here is digital audio data, and the output is identified emotion information.

[0453] Step 3:

[0454] The server simultaneously inputs the audio data into a transcription module, where it converts the audio into text data. This process uses speech recognition technology to analyze and transcribe the audio content. The input is audio data, and the output is the converted text data.

[0455] Step 4:

[0456] The server extracts specific keywords from the converted text data. This process scans the text and identifies relevant phrases based on a pre-configured keyword list. The input is text data, and the output is a list of detected keywords.

[0457] Step 5:

[0458] The server generates and sends necessary notifications to the administrator based on sentiment and keyword information. For example, if keywords related to negative sentiment or complaints are detected, this triggers an alert to be sent to the administrator. The inputs here are sentiment and keyword information, and the output is the generated notification message.

[0459] Step 6:

[0460] The server identifies text that expresses positive customer feedback and delight, and stores it in a database. This includes the automatic extraction of expressions of gratitude and praise. The input is text data, and the output is data representing recorded positive feedback.

[0461] Step 7:

[0462] The server automatically records the surrounding data if it detects inappropriate content and notifies the relevant department. This process is intended to maintain corporate compliance and operates according to pre-defined criteria. The input is audio data, and the output is the recorded audio data and notification message.
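As a rough sketch of Steps 4 to 6 above, the following assumes that the emotion label and the transcribed text have already been produced by upstream modules. The keyword lists, the label values, and the function names are hypothetical, and whole-word matching is only one possible detection rule.

```python
import re

# Hypothetical pre-configured keyword lists.
COMPLAINT_KEYWORDS = ("cold", "late", "refund")
PRAISE_KEYWORDS = ("excellent", "thank you", "great")


def detect_keywords(text, keyword_list):
    """Return the configured keywords that occur in the text as whole words or phrases."""
    lowered = text.lower()
    return [kw for kw in keyword_list
            if re.search(r"\b" + re.escape(kw) + r"\b", lowered)]


def process_utterance(text, emotion, feedback_db):
    """Return a notification message or None, and store positive feedback in feedback_db."""
    complaints = detect_keywords(text, COMPLAINT_KEYWORDS)
    praise = detect_keywords(text, PRAISE_KEYWORDS)

    notification = None
    if emotion == "negative" or complaints:
        # Step 5: build the alert from the emotion and keyword information.
        found = ",".join(complaints) or "-"
        notification = f"ALERT emotion={emotion} keywords={found}: {text}"

    if praise and emotion != "negative":
        # Step 6: keep expressions of gratitude or praise as positive feedback.
        feedback_db.append(text)
    return notification
```

Here a plain list stands in for the database. A complaint such as "My coffee is cold" yields an alert message, while "The service was excellent" is appended to the stored feedback.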

[0463] (Application Example 1)

[0464] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0465] There is a need to improve the efficiency of customer service in physical stores, accurately understand customer emotions, and enable prompt responses. Furthermore, service improvement and compliance are also important, but there is no technology available to comprehensively support these aspects.

[0466] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0467] In this invention, the server includes a device for receiving voice interactions with customers in real time, a device for analyzing the received voice information to determine emotions, and a device for converting the voice information into text information. This makes it possible to quickly grasp the emotional state of customers and respond appropriately in real time. Furthermore, by generating and sending notifications according to the detected emotions and specific symbols, information can be shared quickly with administrators. In addition, positive feedback can be saved and used to improve services, and inappropriate expressions can be automatically detected and recorded to maintain compliance.

[0468] "Voice interaction with customers" refers to the exchange of information between customers and store employees via voice within a store.

[0469] A "real-time receiving device" is an electronic device that instantly acquires and processes audio data.

[0470] "Audio information" refers to data in the form of electrically recorded or transmitted audio.

[0471] A "device that analyzes and determines emotions" is a processing device that identifies a customer's emotional state from voice information.

[0472] A "device that converts to text information" is a device that uses speech recognition technology to convert audio data into text data.

[0473] "Specific symbols" refer to important keywords or expressions that appear in the customer's statements.

[0474] A "device that generates and sends notifications" is a system that creates alerts and information based on detected data and sends them to relevant parties.

[0475] "Positive feedback" refers to expressions that indicate favorable feedback or praise from customers.

[0476] "Inappropriate language" refers to statements or expressions that do not comply with the standards set by the company and are deemed inappropriate.

[0477] A "device for recording related conversation content" is a device that saves audio data before and after a specific event when certain conditions are met.

[0478] This invention is a system that efficiently analyzes voice interactions with customers in physical stores and grasps their emotional state in real time. The system is configured as follows:

[0479] The server primarily receives and analyzes voice conversations in real time. The server is equipped with speech recognition software (e.g., Speech Recognition API) and emotion recognition modules, enabling it to instantly convert voice information into text and determine customer emotions. The analyzed voice information is stored in a database, and notifications are sent to the administrator as needed.

[0480] Furthermore, this system supports compliance by recognizing speech containing specific symbols or inappropriate expressions and recording the relevant conversation content. Positive feedback is extracted separately and used to improve the service.

[0481] Furthermore, the application installed on each store terminal communicates with the server to notify store staff terminals of necessary information. This application utilizes a generative AI model through prompt messages. For example, if a customer expresses dissatisfaction with a new product in a store, the app immediately notifies the staff, enabling them to respond quickly.

[0482] An example of an input prompt for the generative AI model might be, "A customer seems dissatisfied with a new product. As a store employee, what kind of response can increase customer satisfaction?" This prompt helps store employees explore appropriate response methods.
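One possible way to assemble such a prompt is to fill a fixed template with the detected emotion and topic. The function name and its parameters are hypothetical, and the call to the generative AI model itself is outside the scope of this sketch.

```python
def build_staff_prompt(emotion: str, topic: str) -> str:
    """Fill a fixed prompt template with the detected emotion and conversation topic."""
    emotion, topic = emotion.strip(), topic.strip()
    if not emotion or not topic:
        raise ValueError("emotion and topic must be non-empty")
    return (
        f"A customer seems {emotion} with {topic}. "
        "As a store employee, what kind of response can increase customer satisfaction?"
    )
```

Calling `build_staff_prompt("dissatisfied", "a new product")` reproduces the example prompt given above.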

[0483] In this way, the server, terminal, and user work together to create a system that enables efficient and flexible customer service.

[0484] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0485] Step 1:

[0486] The server receives audio data in real time. The audio information spoken by the customer is taken as input via the microphone, and the raw data is immediately sent to the server to prepare for subsequent processing.

[0487] Step 2:

[0488] The server converts the received audio information into text using a speech recognition module. In this step, data processing is performed, converting the audio data into text data using speech recognition software, which is then used as the output for the next step.

[0489] Step 3:

[0490] The server uses a voice analysis module to determine the emotional state from the text data. Here, data calculations are performed using emotion recognition technology, and the emotional state (e.g., anger, joy, sadness) is output as the analysis result. This information serves as an important indicator for customer service.

[0491] Step 4:

[0492] The server detects specific symbols in the text data and performs processing based on them. For example, it extracts keywords related to complaints and, when they are detected, generates the data needed to produce a real-time notification. The output of this step is a notification instruction.

[0493] Step 5:

[0494] The server sends the generated notification to the store's terminal. This notification is displayed on the store staff's screen and serves as an output indicating what action is required in a given situation. Based on this notification, the user can then take appropriate action regarding the customer.

[0495] Step 6:

[0496] The terminal receives the notification, and the user takes appropriate action based on that information. This enables a rapid response that is in line with the customer's emotional state. Subsequently, the outcome of the interaction with the customer may be recorded as a prompt message for a generative AI model. This message is used as input data to further improve the service based on the accumulated data.
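The notification instruction passed from the server to the store terminal in Steps 4 and 5 could, for example, be represented as a small JSON payload. The field names, the keyword-to-action table, and the emotion label "anger" used as a fallback trigger are assumptions made only for illustration.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical mapping from a detected keyword to the action shown to store staff.
RECOMMENDED_ACTIONS = {
    "refund": "Explain the refund procedure and offer to call a manager.",
    "broken": "Apologize and offer a replacement item.",
}


@dataclass
class NotificationInstruction:
    emotion: str
    keywords: list
    action: str


def make_instruction(emotion, keywords):
    """Return an instruction for the first keyword with a known action, else None."""
    for kw in keywords:
        if kw in RECOMMENDED_ACTIONS:
            return NotificationInstruction(emotion, list(keywords), RECOMMENDED_ACTIONS[kw])
    if emotion == "anger":
        # No keyword matched, but the detected emotion alone warrants attention.
        return NotificationInstruction(emotion, list(keywords), "Check in with the customer.")
    return None


def encode_for_terminal(instruction):
    """Serialize the instruction as JSON text for transmission to the terminal."""
    return json.dumps(asdict(instruction), ensure_ascii=False)
```

The terminal application would decode the payload and display the `action` text on the store staff's screen.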

[0497] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0498] This embodiment describes a customer service support system using voice interaction with customers that is combined with an emotion engine. The system first receives voice input at a terminal and transmits the conversation data with the customer to a server in real time. The server passes the voice data to the emotion engine, which analyzes the emotional states of the customer and the user, respectively.

[0499] The server determines the customer's emotions based on their voice tone, speed, and intonation, and also recognizes the user's emotions from voice and operation data. Based on this analysis, if the user's emotions are positive, the conversation is automatically registered in the database as a success story to be used as a reference for service improvement. If the user's emotions are negative, a notification is sent to the administrator using a specified method, so that support and corrective measures can be provided quickly.

[0500] Furthermore, based on the customer sentiment analysis results, the server detects pre-configured keywords in the text data and performs specific actions. This includes, for example, sending alerts to relevant administrators to respond immediately to urgent requests or complaints from customers.

[0501] By using this system, users can accurately grasp customer needs in real time and respond based on their own emotional state. As a result, customer satisfaction improves and the overall efficiency of customer service operations is increased. For example, if a customer actively provides positive feedback in a store, that conversation is automatically stored and used as a success story for employee training. In this way, the present invention provides advanced customer service support functions that also take into account the user's emotions.

[0502] The following describes the processing flow.

[0503] Step 1:

[0504] The terminal records audio as soon as a conversation with a customer begins and sends the data to the server in real time.

[0505] Step 2:

[0506] The server passes the received audio data to the emotion engine. The emotion engine analyzes the customer's emotions from the audio and identifies them based on elements such as tone, speed, and intonation.

[0507] Step 3:

[0508] The server converts audio data into text data using a transcription module. It then uses the converted text to search for specific keywords and provides the results to other modules.

[0509] Step 4:

[0510] The user's emotions are also analyzed by the emotion engine, and their emotional state is identified based on their voice tone and the content of their conversation.

[0511] Step 5:

[0512] Based on the sentiment analysis results, the server sends a notification to relevant parties via the notification management system if the customer's or user's sentiment exceeds a certain threshold.

[0513] Step 6:

[0514] Users receive notifications from the server and take prompt action as needed. They consider and implement the most appropriate response based on the situation.

[0515] Step 7:

[0516] The server saves conversations containing positive feedback as success stories in its database. This information will be used later to improve service quality.
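The threshold decision in Step 5 and the storage decision in Step 7 can be sketched as follows. The sketch assumes that the emotion engine reports a negativity score between 0 and 1 for the customer and for the user. The threshold value, the score range, and the action labels are hypothetical.

```python
NEGATIVE_THRESHOLD = 0.7  # hypothetical value; the disclosure only refers to "a certain threshold"


def route_conversation(customer_negativity, user_negativity,
                       has_positive_feedback, threshold=NEGATIVE_THRESHOLD):
    """Return the list of actions for one conversation, given negativity scores in [0, 1]."""
    for score in (customer_negativity, user_negativity):
        if not 0.0 <= score <= 1.0:
            raise ValueError("negativity scores must lie in [0, 1]")

    actions = []
    if customer_negativity > threshold:
        actions.append("notify:customer_negative")
    if user_negativity > threshold:
        actions.append("notify:user_negative")
    # Store a success story only when no negative condition was raised.
    if has_positive_feedback and not actions:
        actions.append("store:success_story")
    return actions
```

Treating the two scores independently lets the notification indicate whether it is the customer or the user who appears to need support.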

[0517] (Example 2)

[0518] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0519] Conventional customer service support systems struggled to accurately analyze customer emotions from voice interactions and provide appropriate support in real time. Furthermore, they lacked the means to effectively utilize positive feedback and respond quickly to negative situations, leaving challenges in improving customer satisfaction and operational efficiency.

[0520] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0521] In this invention, the server includes a device for receiving voice dialogues, a device for analyzing the received voice information to determine emotions, a device for converting voice information into text information, a device for automatically registering successful conversation examples in a case study collection based on the emotion analysis results, and a device for sending notifications to the administrator when negative emotions are detected. This makes it possible to accurately grasp emotions from conversations with customers, automatically take appropriate actions, and enable quick and effective customer service.

[0522] "Voice interaction" refers to communication using voice between a customer and a system.

[0523] "Voice information" refers to data obtained from voice dialogue, including the basic data that the server uses for analysis.

[0524] A "device for determining emotions" refers to a device that analyzes audio information to determine the emotional state of the speaker.

[0525] "Textual information" refers to information in text format generated by digitally processing audio information.

[0526] "Words" refer to words or phrases that have important meaning within written information and serve as triggers for sentiment analysis and actions.

[0527] A "notification generation and transmission device" is a device that creates and transmits notifications to relevant parties based on detected words or phrases.

[0528] "Positive feedback" refers to positive comments from customers and is used to improve services.

[0529] A "device for registering success stories in a case study collection" is a device for organizing and saving positive conversation content and accumulating it as reference material for the future.

[0530] "Negative emotions" refers to a state in which the speaker is showing dissatisfaction or negative emotions, as determined by emotion analysis.

[0531] A "device that sends notifications to administrators" is a device that automatically transmits information to administrators when negative emotions are detected, so that appropriate action can be taken.

[0532] This invention analyzes information obtained from voice interactions with customers in real time to provide advanced support for customer service. The system as a whole consists of a terminal, a server, and a user.

[0533] The terminal includes a device for capturing voice interactions with customers and acquiring high-quality audio information. This device uses a microphone with noise-canceling technology to maintain clarity of conversation. The acquired audio information is then transmitted to the server via encrypted communication protocols such as TLS/SSL.
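A minimal sketch of the encrypted transmission, using Python's standard `ssl` module, is given below. The host and port arguments, the length-prefixed framing of audio chunks, and the function names are assumptions. The sketch permits only TLS 1.2 or later, since the older SSL protocol versions are deprecated.

```python
import socket
import ssl
import struct


def make_tls_context():
    """Create a client-side context that verifies the server certificate and host name."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse SSL and early TLS versions
    return ctx


def frame_chunk(chunk: bytes) -> bytes:
    """Prefix a chunk with its length as a 4-byte big-endian integer (hypothetical framing)."""
    return struct.pack("!I", len(chunk)) + chunk


def send_audio(host, port, chunks, timeout=5.0):
    """Send audio chunks to the server over a TLS-protected TCP connection."""
    ctx = make_tls_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            for chunk in chunks:
                tls.sendall(frame_chunk(chunk))
```

The length prefix lets the receiving side split the byte stream back into the original chunks. Any other framing or a higher-level protocol could be used instead.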

[0534] The server plays a central role in processing the received audio information. The audio information is first passed to the sentiment analysis engine, where it is analyzed using a generative AI model on the server. This analysis evaluates the customer's and user's emotions. Once the emotions are determined, the server extracts positive feedback based on that and automatically registers it in the database as a success story.

[0535] Furthermore, the server detects specific words and phrases from the converted text information. When specific words or phrases are detected, the relevant actions are taken. For example, if the word "urgent" is identified, an alert can be sent to the responsible administrator to prompt immediate action. The server can also send a notification to the administrator if negative emotions are detected, enabling prompt support.

[0536] This system allows users to accurately understand customer needs in real time and provide appropriate support. As a result, it is expected to improve the overall efficiency of customer service operations and enhance customer satisfaction.

[0537] For example, if a store terminal records a positive comment from a customer such as "I am very satisfied," that data is organized on the server as a success story and used as material for employee training. On the other hand, if a complaint or grievance occurs, the server can send a notification to the administrator so that corrective action can be taken promptly.

[0538] An example of a prompt message would be, "What emotions can be inferred from this customer's statement?" This allows the system to perform a rapid and sophisticated analysis and provide an appropriate response.

[0539] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0540] Step 1:

[0541] The terminal acquires voice interaction with the customer. As input, the customer's spoken voice data is captured by the terminal's microphone. Specifically, the terminal uses noise cancellation technology to ensure clear voice data. As output, the voice data is encrypted and sent to the server for further processing.

[0542] Step 2:

[0543] The server receives audio data transmitted from the terminal and analyzes it using a generative AI model. The input is encrypted audio data from the terminal. Specifically, the AI model analyzes the voice tone, speed, and intonation to determine the emotional state. The output is the analyzed emotional data.

[0544] Step 3:

[0545] The server automatically registers positive evaluations in the database based on sentiment data. The input consists of data that has been determined to be positive through sentiment analysis. Specifically, the server uses data management software to save the conversation as a successful example. The output is the updated database entry.

[0546] Step 4:

[0547] Based on the sentiment analysis results, the server sends a notification to the administrator if it detects negative emotions. The input is the emotion data that has been determined to be negative. Specifically, the server uses an email or messaging system to send an alert to the administrator in real time. The output is the notification sent to the administrator.

[0548] Step 5:

[0549] The server performs text conversion of audio information and detects specific keywords. The input is text data generated from audio data. In the specific data processing, a text analysis algorithm is used to identify the configured keywords. The output consists of the detected keywords and associated instructions.

[0550] Step 6:

[0551] The server performs a specific action based on the detected keyword. The input is the keyword detected in step 5. Specifically, if the keyword is "urgent," the server immediately sends an alert to the relevant department. The output is a record that the action was taken and any necessary follow-up.
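Steps 5 and 6 above can be illustrated by a table that maps each configured keyword to an action and returns a record of the actions taken. The handler `alert_department`, the table `ACTIONS`, and the record fields are hypothetical. The actual alert transport is replaced here by a returned string.

```python
import re
from datetime import datetime, timezone


def alert_department(keyword, text):
    """Stand-in for sending an alert; a real system would call a messaging service here."""
    return f"alert sent to department for '{keyword}'"


# Hypothetical table of configured keywords and their associated actions.
ACTIONS = {"urgent": alert_department}


def run_keyword_actions(text, actions=ACTIONS, now=None):
    """Run the action for every configured keyword found in the text and return records."""
    if now is None:
        now = datetime.now(timezone.utc)
    words = set(re.findall(r"[a-z']+", text.lower()))
    records = []
    for keyword, handler in actions.items():
        if keyword in words:
            records.append({
                "keyword": keyword,
                "result": handler(keyword, text),
                "at": now.isoformat(),  # time at which the action was taken
            })
    return records
```

Further keywords can be supported by adding entries to the table, without changing the detection loop.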

[0552] (Application Example 2)

[0553] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0554] In customer service, it is essential to accurately understand customer emotions in real time and respond quickly and appropriately. Furthermore, there is a challenge in efficiently collecting positive customer feedback and utilizing it as training material within the organization.

[0555] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0556] In this invention, the server includes means for receiving voice interactions with customers in real time, means for analyzing the received voice data to determine emotions, and means for utilizing the extracted feedback as educational material. This enables appropriate responses tailored to customer emotions and the creation of effective training materials.

[0557] "Voice interaction with customers" refers to voice communication that takes place between a customer and a system or operator.

[0558] "Real-time reception" refers to a function that can instantly acquire and process audio data at the moment a conversation takes place.

[0559] "Methods for analyzing voice data to determine emotions" refer to technologies that analyze elements such as tone, speed, and intonation of voice to identify the speaker's emotional state.

[0560] "Methods for converting audio data into text data" refers to the process of converting audio information into text format using speech recognition technology.

[0561] "Methods for detecting specific keywords" refer to algorithms for identifying important words or phrases from converted character data.

[0562] A "means for generating and sending notifications" is a system that creates alerts or messages when specific events or conditions are met and sends them to the appropriate recipients.

[0563] "Means for extracting and saving positive feedback" refers to the process of identifying positive evaluations and opinions from customers and saving them in a database or file.

[0564] "Methods for utilizing as educational materials" refers to methods for using saved feedback as content for employee training and education.

[0565] To realize this invention, a system is needed that receives voice conversations with customers in real time and analyzes their content using an emotion analysis engine. When the server receives voice data, it processes the data using a dedicated API or software (e.g., a speech recognition engine or natural language processing tool) for emotion analysis. After the emotion analysis is complete, the voice data is converted into text data, and specific keywords that have been set in advance are detected. This allows the system to understand the customer's emotional state, and if positive feedback is obtained, it is stored in a database and used as training material within the organization.

[0566] The server also has the functionality to send notifications to administrators based on specific keywords or emotional states. This requires a network environment that enables real-time data communication and alert generation. Specific hardware includes smartphones and microphones as voice input terminals, and high-performance computers for processing the collected data. The software includes the aforementioned voice recognition engine, as well as applications that control the database management system and notification functions.

[0567] As a concrete example, consider a scenario involving customer service in a retail store. While a store employee is talking to a customer, the system detects the customer saying, "This is the best product I've ever used." The server detects this positive feedback, saves it to a database, and simultaneously processes it as training material for employees. Furthermore, managers can immediately learn about this success. An example of such a prompt would be, "We want to automatically record positive customer feedback from in-store interactions and use it for employee training. Please provide specific examples of analysis."

[0568] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0569] Step 1:

[0570] The terminal captures conversations with customers as audio data in real time. The audio data is collected using a microphone and sent directly to the server. The input is the customer's raw voice, and the output is digital audio data.

[0571] Step 2:

[0572] The server sends the received audio data to the speech recognition engine, where it is converted into text data. The input is digital audio data, and the output is the corresponding text data. The audio waveform data is analyzed and converted into words and phrases through speech recognition.

[0573] Step 3:

[0574] The server passes the converted text data to the sentiment analysis engine to determine the customer's emotional state. The input is text data, and the output is the sentiment analysis result, i.e., the customer's emotional state (e.g., positive, negative). Natural language processing techniques are used to analyze emotional words and phrases in the text.

[0575] Step 4:

[0576] The server stores positive feedback in a database based on the sentiment analysis results. The input is the sentiment analysis results, and the output is the stored success story data. It automatically creates entries in the database and prepares the feedback for use as training material.

[0577] Step 5:

[0578] The server generates and sends a notification to the administrator if the detected emotion is negative and specific keywords are present. The input is the emotion analysis results and keyword detection results, and the output is the alert notification sent to the administrator. This notification is sent via email or as an alert message within the application using a notification system.

[0579] Step 6:

[0580] Users can access saved positive feedback to use as educational material. The input is a record of successful cases in the database, and the output is the available educational material. This material can be searched and displayed through the user interface.
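Steps 4 and 6 above can be sketched with an in-memory SQLite database that stores positive feedback and supports a simple text search. The table name, the column names, and the "positive" label are assumptions. A production system would presumably use a persistent database.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the database and create the success-story table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS success_stories ("
        "id INTEGER PRIMARY KEY, text TEXT NOT NULL, emotion TEXT NOT NULL)"
    )
    return conn


def save_feedback(conn, text, emotion):
    """Store the text only when the analysis result is positive. Return True if stored."""
    if emotion != "positive":
        return False
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO success_stories (text, emotion) VALUES (?, ?)", (text, emotion)
        )
    return True


def search_materials(conn, term):
    """Return stored feedback containing the search term, oldest first."""
    # Escape LIKE wildcards so that the term is matched literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT text FROM success_stories WHERE text LIKE ? ESCAPE '\\' ORDER BY id",
        (f"%{escaped}%",),
    )
    return [row[0] for row in rows]
```

A user interface for educational materials could call `search_materials` with the text entered by the user and display the returned entries.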

[0581] The specific processing unit 290 transmits the result of the specific processing to the headset-type terminal 314. In the headset-type terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0582] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0583] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset-type terminal 314.

[0584] [Fourth Embodiment]

[0585] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0586] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0587] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0588] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0589] The microphone 238 accepts instructions and other input from the user 20 by receiving the user's voice. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0590] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0591] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0592] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0593] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0594] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0595] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0596] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0597] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0598] As an embodiment of the present invention, a customer service support system using voice interaction with customers will be described. In this system, when voice interaction is initiated by a terminal installed in the store, voice data is immediately sent to a server. The server analyzes the received voice data in parallel using a voice emotion recognition module and a transcription module.

[0599] The server analyzes voice data to identify the customer's emotional state and sends a real-time notification to the administrator if negative emotions such as dissatisfaction or anger are detected, or if keywords related to a specific problem are found. This notification allows the user to take immediate action.

[0600] Meanwhile, the server simultaneously extracts expressions of gratitude and joy from the conversation and stores the portions of the voice dialogue that contain positive feedback or success stories in a database. This allows users to use these examples to improve their services and responses.

[0601] Furthermore, this system also includes a compliance monitoring function. When inappropriate comments or expressions are detected in user-customer interactions, it automatically records the audio data before and after the incident and notifies the relevant departments as needed. This helps users ensure legal compliance and prevent damage to their company's reputation.
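
One way to keep the audio before and after a flagged remark is a fixed-size rolling buffer. The sketch below is only an illustration under assumptions not stated in this disclosure: audio is assumed to arrive as discrete chunks, the `IncidentRecorder` class and its `pre_chunks` and `post_chunks` parameters are hypothetical names, and the decision that a chunk is inappropriate is assumed to be made elsewhere and passed in as `flagged`.

```python
from collections import deque


class IncidentRecorder:
    """Keeps recent audio chunks and saves a clip around each flagged chunk (hypothetical helper)."""

    def __init__(self, pre_chunks: int = 3, post_chunks: int = 2):
        # Rolling history: only the most recent `pre_chunks` chunks are retained.
        self._history = deque(maxlen=pre_chunks)
        self._post_chunks = post_chunks
        self._clip = None       # clip currently being assembled, if any
        self._remaining = 0     # chunks still to append to the current clip
        self.saved_clips = []   # stand-in for persistent storage

    def feed(self, chunk, flagged: bool = False) -> None:
        if self._clip is None and flagged:
            # Start a clip with the audio that preceded the incident.
            self._clip = list(self._history)
            # The flagged chunk itself plus the chunks that follow it.
            self._remaining = self._post_chunks + 1
        if self._clip is not None:
            self._clip.append(chunk)
            self._remaining -= 1
            if self._remaining == 0:
                self.saved_clips.append(self._clip)
                self._clip = None
        self._history.append(chunk)
```

In this sketch, a second flag raised while a clip is still being assembled does not start a new clip; the chunk is simply included in the clip already in progress.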

[0602] Thus, the present invention, as a voice analysis system that supports customer service operations, achieves both improved customer experience and increased efficiency in business processes. By using this system, users involved in customer service can quickly identify emotional fluctuations and compliance violations, and take appropriate action in a timely manner.

[0603] The following describes the processing flow.

[0604] Step 1:

[0605] The terminal starts recording audio as soon as a conversation with a customer begins. The recorded audio data is streamed to the server in real time.

[0606] Step 2:

[0607] The server passes the received audio data to the speech emotion recognition module. The module analyzes the audio data and identifies the customer's emotional state based on their voice tone, speed, and intonation.

[0608] Step 3:

[0609] The server simultaneously passes the audio data to a transcription module, which converts the dialogue into text data. This text data is then used for keyword detection and logging.

[0610] Step 4:

[0611] The server searches for specific keywords in the converted text, and if found, records and saves a portion of the relevant call. Depending on the detection criteria, it sends a notification to the relevant administrator if necessary.

[0612] Step 5:

[0613] Users receive notifications from the server and take prompt action based on those notifications if necessary. The system monitors the situation in real time and adapts services accordingly.

[0614] Step 6:

[0615] The server automatically extracts portions of voice interactions that contain positive feedback and saves them in the database as success stories. This information can be referenced later to help improve the service.
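
The parallel analysis in Steps 2 and 3 can be sketched as follows. This is a minimal illustration, not the implementation of this disclosure: `recognize_emotion` and `transcribe` are hypothetical stand-ins for the speech emotion recognition module and the transcription module, and each audio chunk is assumed to be a dictionary that already carries a toy `pitch_variance` feature and a `transcript` string.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class AnalysisResult:
    emotion: str
    text: str


def recognize_emotion(chunk: dict) -> str:
    """Hypothetical stand-in for the speech emotion recognition module."""
    # Placeholder rule; a real module would analyze tone, speed, and intonation.
    return "negative" if chunk["pitch_variance"] > 0.5 else "neutral"


def transcribe(chunk: dict) -> str:
    """Hypothetical stand-in for the transcription module."""
    return chunk["transcript"]


def analyze_chunk(chunk: dict) -> AnalysisResult:
    """Run emotion recognition and transcription on the same chunk concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        emotion_future = pool.submit(recognize_emotion, chunk)
        text_future = pool.submit(transcribe, chunk)
        return AnalysisResult(emotion=emotion_future.result(), text=text_future.result())
```

The two modules are independent of each other, so neither has to wait for the other before the results are combined for keyword detection and notification.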

[0616] (Example 1)

[0617] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0618] In voice interactions with customers, it is essential to accurately grasp their emotions and needs in real time and respond quickly. Furthermore, recording and utilizing positive customer feedback is expected to improve services. On the other hand, monitoring inappropriate content and compliance violations to protect the company's reputation is also a crucial issue.

[0619] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0620] In this invention, the server includes means for acquiring voice conversations with customers in real time, means for analyzing the received voice information to recognize emotions, means for converting the acquired voice information into symbolic information, means for recognizing specific words or phrases based on the converted symbolic information, means for detecting inappropriate content and recording the voice information before and after it, and means for immediately providing notifications to the administrator based on the analysis results. This makes it possible to grasp customer emotions in real time and respond quickly, record positive feedback, monitor inappropriate remarks and protect corporate reputation.

[0621] "Voice interaction with customers" refers to communication through the voices of customers.

[0622] "Means of acquiring data in real time" refers to technical means for instantly collecting and processing audio data.

[0623] "Audio information" refers to data recorded as changes in sound.

[0624] A "means of recognizing emotions" refers to an algorithm or process for identifying emotional states from speech.

[0625] "Symbolic information" refers to data obtained by converting speech into symbols or characters.

[0626] "Specific terms" refer to important keywords or phrases that have been set in advance.

[0627] "Inappropriate content" refers to expressions or statements that may violate compliance.

[0628] "Analysis results" refer to information obtained through the processing of audio data.

[0629] "Means of providing notifications to administrators" refers to a function that uses analysis results to inform administrators of important information.

[0630] A "means of recording" refers to a function that saves specific data so that it can be referenced later.

[0631] The customer service support system proposed in this invention analyzes voice interactions with customers in real time and obtains various feedback. A terminal installed in the store collects customer voices and sends the voice data to a server, at which point the system begins processing. The server analyzes the received voice data using speech recognition technologies such as "NVIDIA Jarvis" or "Google Cloud Speech-to-Text" to identify the customer's emotions. In addition, the server converts the voice data into text and filters it for specific keywords.

[0632] If negative customer sentiment or specific issues are detected, the server promptly sends a notification to the administrator. This notification allows users to address the problem quickly. The server also records positive customer feedback and success stories, storing them in a database for service improvement. This provides users with valuable data to improve the quality of their customer service.

[0633] Furthermore, the server has a compliance monitoring function that records the audio data before and after any inappropriate expressions or content are detected and reports it to the management department. This allows users to help maintain legal compliance.

[0634] For example, if a customer complains that their coffee is cold, the server immediately analyzes the audio and sends a notification to the administrator. Conversely, if a customer gives positive feedback such as "the service was excellent," it is recorded in the database and used to improve customer service.

[0635] An example of a prompt is: "Describe the process of analyzing customer voices and using emotions and keywords to activate a real-time notification system."

[0636] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0637] Step 1:

[0638] The terminal collects customer voice input via a microphone. When customer speech is detected, it begins recording voice data and transmits it to the server in real time. The input here is the customer's raw voice, and the output is digital voice data transmitted to the server.

[0639] Step 2:

[0640] The server inputs the received audio data into a speech emotion recognition module. This module analyzes the audio waveform and evaluates the emotion using a generative AI model. Specifically, it extracts features such as tone, pitch, and speed from the speech and infers the emotional state based on these. The input here is digital audio data, and the output is identified emotion information.

[0641] Step 3:

[0642] The server simultaneously inputs the audio data into a transcription module, where it converts the audio into text data. This process uses speech recognition technology to analyze and transcribe the audio content. The input is audio data, and the output is the converted text data.

[0643] Step 4:

[0644] The server extracts specific keywords from the converted text data. This process scans the text and identifies relevant phrases based on a pre-configured keyword list. The input is text data, and the output is a list of detected keywords.

[0645] Step 5:

[0646] The server generates and sends necessary notifications to the administrator based on sentiment and keyword information. For example, if keywords related to negative sentiment or complaints are detected, this triggers an alert to be sent to the administrator. The inputs here are sentiment and keyword information, and the output is the generated notification message.

[0647] Step 6:

[0648] The server identifies text that expresses positive customer feedback and delight, and stores it in a database. This includes the automatic extraction of expressions of gratitude and praise. The input is text data, and the output is data representing recorded positive feedback.

[0649] Step 7:

[0650] The server automatically records the surrounding data if it detects inappropriate content and notifies the relevant department. This process is intended to maintain corporate compliance and operates according to pre-defined criteria. The input is audio data, and the output is the recorded audio data and notification message.
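
Steps 4 to 6 above can be sketched as follows. This is a minimal illustration under stated assumptions: the keyword lists, the function names, and the use of a plain Python list as the "database" are all hypothetical, and whole-word matching with a regular expression is only one possible way to scan the text.

```python
import re
from typing import List, Optional

# Hypothetical keyword lists; a real deployment would configure these in advance.
COMPLAINT_KEYWORDS = ["cold", "refund", "wrong order"]
POSITIVE_PHRASES = ["thank you", "excellent", "wonderful"]


def find_phrases(text: str, phrases: List[str]) -> List[str]:
    """Return the configured phrases that occur in the text as whole words."""
    found = []
    for phrase in phrases:
        if re.search(rf"\b{re.escape(phrase)}\b", text, flags=re.IGNORECASE):
            found.append(phrase)
    return found


def build_notification(emotion: str, keywords: List[str]) -> Optional[str]:
    """Create an alert message when the emotion is negative or a keyword was found."""
    if emotion != "negative" and not keywords:
        return None
    detail = ", ".join(keywords) if keywords else "none"
    return f"Attention needed: emotion={emotion}, keywords={detail}"


def record_positive_feedback(text: str, database: List[str]) -> bool:
    """Append the utterance to the in-memory 'database' if it contains praise."""
    if find_phrases(text, POSITIVE_PHRASES):
        database.append(text)
        return True
    return False
```

For the coffee example given earlier, `find_phrases("My coffee is cold.", COMPLAINT_KEYWORDS)` finds the keyword "cold", and `build_notification` then produces an alert even if the detected emotion is not negative.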

[0651] (Application Example 1)

[0652] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0653] There is a need to improve the efficiency of customer service in physical stores, accurately understand customer emotions, and enable prompt responses. Furthermore, service improvement and compliance are also important, but there is no technology available to comprehensively support these aspects.

[0654] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0655] In this invention, the server includes a device for receiving voice interactions with customers in real time, a device for analyzing the received voice information to determine emotions, and a device for converting the voice information into text information. This makes it possible to quickly grasp the emotional state of customers and respond appropriately in real time. Furthermore, by generating and sending notifications according to the detected emotions and specific symbols, it becomes possible to quickly share information with administrators. In addition, positive feedback can be saved and used to improve services, and inappropriate expressions can be automatically detected and recorded to maintain compliance.

[0656] "Voice interaction with customers" refers to the exchange of information between customers and store employees via voice within a store.

[0657] A "real-time receiving device" is an electronic device that instantly acquires and processes audio data.

[0658] "Audio information" refers to data in the form of electrically recorded or transmitted audio.

[0659] A "device that analyzes and determines emotions" is a processing device that identifies a customer's emotional state from voice information.

[0660] A "device that converts to text information" is a device that uses speech recognition technology to convert audio data into text data.

[0661] "Specific symbols" refer to important keywords or expressions that appear in the customer's statements.

[0662] A "device that generates and sends notifications" is a system that creates alerts and information based on detected data and sends them to relevant parties.

[0663] "Positive feedback" refers to expressions that indicate favorable feedback or praise from customers.

[0664] "Inappropriate language" refers to statements or expressions that do not comply with the standards set by the company and are deemed inappropriate.

[0665] A "device for recording related conversation content" is a device that saves audio data before and after a specific event when certain conditions are met.

[0666] This invention is a system that efficiently analyzes voice interactions with customers in physical stores and grasps their emotional state in real time. The system is configured as follows:

[0667] The server primarily receives and analyzes voice conversations in real time. The server is equipped with speech recognition software (e.g., Speech Recognition API) and emotion recognition modules, enabling it to instantly convert voice information into text and determine customer emotions. The analyzed voice information is stored in a database, and notifications are sent to the administrator as needed.

[0668] Furthermore, this system supports compliance by recognizing speech containing specific symbols or inappropriate expressions and recording the relevant conversation content. Positive feedback is extracted separately and used to improve the service.

[0669] Furthermore, the application installed on each store terminal communicates with the server to notify store staff terminals of necessary information. This application utilizes a generative AI model through prompt messages. For example, if a customer expresses dissatisfaction with a new product in a store, the app immediately notifies the staff, enabling them to respond quickly.

[0670] An example of an input prompt for the generative AI model might be, "A customer seems dissatisfied with a new product. As a store employee, what kind of response can increase customer satisfaction?" This prompt helps store employees explore appropriate response methods.
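
A prompt of this form could be assembled from the detected emotion and the topic of the conversation. The following is a minimal sketch; the function name `build_staff_prompt` and its parameters are hypothetical, and the call to the generative AI model itself is omitted.

```python
def build_staff_prompt(emotion: str, topic: str) -> str:
    """Fill the example prompt with the detected emotion and conversation topic."""
    return (
        f"A customer seems {emotion} with {topic}. "
        "As a store employee, what kind of response can increase customer satisfaction?"
    )


# Reproduces the example prompt quoted above.
prompt = build_staff_prompt("dissatisfied", "a new product")
```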

[0671] In this way, the server, terminal, and user work together to create a system that enables efficient and flexible customer service.

[0672] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0673] Step 1:

[0674] The server receives audio data in real time. The audio information spoken by the customer is taken as input via the microphone, and the raw data is immediately sent to the server to prepare for subsequent processing.

[0675] Step 2:

[0676] The server converts the received audio information into text using a speech recognition module. In this step, speech recognition software converts the audio data into text data, which is output for use in the next step.

[0677] Step 3:

[0678] The server uses a voice analysis module to determine the emotional state from the text data. Here, data calculations are performed using emotion recognition technology, and the emotional state (e.g., anger, joy, sadness) is output as the analysis result. This information serves as an important indicator for customer service.

[0679] Step 4:

[0680] The server detects specific symbols in the text data and performs processing based on them. For example, it extracts keywords related to complaints and, when they are detected, generates data for producing a real-time notification. The output of this processing is a notification instruction.

[0681] Step 5:

[0682] The server sends the generated notification to the store's terminal. This notification is displayed on the store staff's screen and serves as an output indicating what action is required in a given situation. Based on this notification, the user can then take appropriate action regarding the customer.

[0683] Step 6:

[0684] The terminal receives the notification, and the user takes appropriate action based on that information. This enables a rapid response that is in line with the customer's emotional state. Subsequently, the outcome of the interaction with the customer may be recorded as a prompt message for a generative AI model. This message is used as input data to further improve the service based on the accumulated data.
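
Steps 3 to 5 of this flow, in which the emotional state is determined from the text and a notification is sent to the store terminal, can be sketched as follows. This disclosure does not specify how emotion is derived from text, so the small word lexicon below is a deliberately simple substitute chosen for illustration; the lexicon contents, function names, and JSON field names are all hypothetical.

```python
import json
from typing import Optional

# Toy lexicon, purely illustrative.
EMOTION_LEXICON = {
    "anger": {"angry", "unacceptable", "terrible"},
    "joy": {"great", "love", "thanks"},
    "sadness": {"disappointed", "sad"},
}


def emotion_from_text(text: str) -> str:
    """Return the first emotion whose cue words appear in the text, else 'neutral'."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    for emotion, cues in EMOTION_LEXICON.items():
        if words & cues:
            return emotion
    return "neutral"


def make_terminal_notification(text: str) -> Optional[str]:
    """Build a JSON notification for the store terminal when the emotion is negative."""
    emotion = emotion_from_text(text)
    if emotion not in ("anger", "sadness"):
        return None
    return json.dumps({"emotion": emotion, "utterance": text, "action": "assist customer"})
```

A structured payload such as JSON lets the application on the store terminal decide how to display the required action to the staff member.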

[0685] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0686] As an embodiment of the present invention, a customer service support system that combines voice interaction with customers and an emotion engine will be described. In this system, the terminal first receives voice input and transmits the conversation data with the customer to the server in real time. The server passes the voice data to the emotion engine, which analyzes the emotional states of the customer and the user, respectively.

[0687] The server determines the customer's emotions based on their voice tone, speed, and intonation, and also recognizes the user's emotions from voice and operation data. Based on this analysis, if the user's emotions are positive, the conversation is automatically registered in the database as a success story to be used as a reference for service improvement. If the user's emotions are negative, a notification is sent to the administrator using a specified method, so that support and corrective measures can be provided quickly.

[0688] Furthermore, based on the customer sentiment analysis results, the server detects pre-configured keywords in the text data and performs specific actions. This includes, for example, sending alerts to relevant administrators to respond immediately to urgent requests or complaints from customers.

[0689] By using this system, users can accurately grasp customer needs in real time and respond based on their own emotional state. As a result, customer satisfaction improves and the overall efficiency of customer service operations is increased. For example, if a customer actively provides positive feedback in a store, that conversation is automatically stored and used as a success story for employee training. In this way, the present invention provides advanced customer service support functions that also take into account the user's emotions.

[0690] The following describes the processing flow.

[0691] Step 1:

[0692] The terminal records audio as soon as a conversation with a customer begins and sends the data to the server in real time.

[0693] Step 2:

[0694] The server passes the received audio data to the emotion engine. The emotion engine analyzes the customer's emotions from the audio and identifies them based on elements such as tone, speed, and intonation.

[0695] Step 3:

[0696] The server converts audio data into text data using a transcription module. It then uses the converted text to search for specific keywords and provides the results to other modules.

[0697] Step 4:

[0698] The user's emotions are also analyzed by the emotion engine, and their emotional state is identified based on their voice tone and the content of their conversation.

[0699] Step 5:

[0700] Based on the sentiment analysis results, the server sends a notification to relevant parties via the notification management system if the customer's or user's sentiment exceeds a certain threshold.

[0701] Step 6:

[0702] Users receive notifications from the server and take prompt action as needed. They consider and implement the most appropriate response based on the situation.

[0703] Step 7:

[0704] The server saves conversations containing positive feedback as success stories in its database. This information will be used later to improve service quality.
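
The threshold-based routing in Steps 5 and 7 can be sketched as follows. The sketch rests on assumptions not found in this disclosure: the emotion engine is assumed to output a single valence score between -1.0 and 1.0 for each speaker, the threshold of 0.6 is arbitrary, and the names `EmotionReading` and `route_conversation` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EmotionReading:
    speaker: str    # "customer" or "user" (the staff member)
    valence: float  # assumed score in [-1.0, 1.0]; below zero means negative emotion


def route_conversation(readings: List[EmotionReading], threshold: float = 0.6) -> dict:
    """Decide whom to notify about and whether to save the conversation as a success story."""
    # Notify when the strength of a negative emotion exceeds the threshold.
    notify_about = [r.speaker for r in readings if -r.valence > threshold]
    # Save when any speaker shows a sufficiently strong positive emotion.
    save = any(r.valence > threshold for r in readings)
    return {"notify_about": notify_about, "save_as_success_story": save}
```

For example, a strongly negative customer reading of -0.8 alongside a mild user reading of 0.2 results in a notification about the customer and no success story.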

[0705] (Example 2)

[0706] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0707] Conventional customer service support systems struggled to accurately analyze customer emotions from voice interactions and provide appropriate support in real time. Furthermore, they lacked the means to effectively utilize positive feedback and respond quickly to negative situations, leaving challenges in improving customer satisfaction and operational efficiency.

[0708] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0709] In this invention, the server includes a device for receiving voice dialogues, a device for analyzing the received voice information to determine emotions, a device for converting voice information into text information, a device for automatically registering successful conversation examples in a case study collection based on the emotion analysis results, and a device for sending notifications to the administrator when negative emotions are detected. This makes it possible to accurately grasp emotions from conversations with customers, automatically take appropriate actions, and enable quick and effective customer service.

[0710] "Voice interaction" refers to communication using voice between a customer and a system.

[0711] "Voice information" refers to data obtained from voice dialogue, including the basic data that the server uses for analysis.

[0712] A "device for determining emotions" refers to a device that analyzes audio information to determine the emotional state of the speaker.

[0713] "Textual information" refers to information in text format generated by digitally processing audio information.

[0714] "Words" refer to words or phrases that have important meaning within written information and serve as triggers for sentiment analysis and actions.

[0715] A "notification generation and transmission device" is a device that creates and transmits notifications to relevant parties based on detected words or phrases.

[0716] "Positive feedback" refers to positive comments from customers and is used to improve services.

[0717] A "device for registering success stories in a case study collection" is a device for organizing and saving positive conversation content and accumulating it as reference material for the future.

[0718] "Negative emotions" refers to a state in which the speaker is showing dissatisfaction or negative emotions, as determined by emotion analysis.

[0719] A "device that sends notifications to administrators" is a device that automatically transmits information to administrators when negative emotions are detected, so that appropriate action can be taken.

[0720] This invention analyzes information obtained from voice interactions with customers in real time to provide advanced support for customer service. The system as a whole consists of a terminal, a server, and a user.

[0721] The terminal includes a device for capturing voice interactions with customers and acquiring high-quality audio information. This device uses a microphone with noise-canceling technology to maintain clarity of conversation. The acquired audio information is then transmitted to the server via encrypted communication protocols such as TLS / SSL.

[0722] The server plays a central role in processing the received audio information. The audio information is first passed to the sentiment analysis engine, where it is analyzed using a generative AI model on the server. This analysis evaluates the customer's and user's emotions. Once the emotions are determined, the server extracts positive feedback based on that and automatically registers it in the database as a success story.

[0723] Furthermore, the server detects specific words and phrases from the converted text information. When specific words or phrases are detected, the relevant actions are taken. For example, if the word "urgent" is identified, an alert can be sent to the responsible administrator to prompt immediate action. The server can also send a notification to the administrator if negative emotions are detected, enabling prompt support.

[0724] This system allows users to accurately understand customer needs in real time and provide appropriate support. As a result, it is expected to improve the overall efficiency of customer service operations and enhance customer satisfaction.

[0725] For example, if a store terminal records a positive comment from a customer such as "I am very satisfied," that data is organized on the server as a success story and used as material for employee training. On the other hand, if a complaint or grievance occurs, the server can send a notification to the administrator so that corrective action can be taken promptly.

[0726] An example of a prompt message would be, "What emotions can be inferred from this customer's statement?" This allows the system to perform a rapid and sophisticated analysis and provide an appropriate response.

[0727] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0728] Step 1:

[0729] The terminal acquires voice interaction with the customer. As input, the customer's spoken voice data is captured by the terminal's microphone. Specifically, the terminal uses noise cancellation technology to ensure clear voice data. As output, the voice data is encrypted and sent to the server for further processing.

[0730] Step 2:

[0731] The server receives audio data transmitted from the terminal and analyzes it using a generative AI model. The input is encrypted audio data from the terminal. Specifically, the AI model analyzes the voice tone, speed, and intonation to determine the emotional state. The output is the analyzed emotional data.

[0732] Step 3:

[0733] The server automatically registers positive evaluations in the database based on sentiment data. The input consists of data that has been determined to be positive through sentiment analysis. Specifically, the server uses data management software to save the conversation as a successful example. The output is the updated database entry.

[0734] Step 4:

[0735] Based on the sentiment analysis results, the server sends a notification to the administrator if it detects negative emotions. The input is the emotion data that has been determined to be negative. Specifically, the server uses an email or messaging system to send an alert to the administrator in real time. The output is the notification sent to the administrator.

[0736] Step 5:

[0737] The server performs text conversion of audio information and detects specific keywords. The input is text data generated from audio data. In the specific data processing, a text analysis algorithm is used to identify the configured keywords. The output consists of the detected keywords and associated instructions.

[0738] Step 6:

[0739] The server performs a specific action based on the detected keyword. The input is the keyword detected in step 5. Specifically, if the keyword is "urgent," the server immediately sends an alert to the relevant department. The output is a record that the action was taken and any necessary follow-up.
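
The keyword-driven actions in Steps 5 and 6 can be sketched with a lookup table that maps each configured keyword to an action. Only the "urgent" example comes from the description above; the "refund" entry, the handler functions, and the returned record strings are hypothetical and serve only to illustrate the idea.

```python
from typing import Callable, Dict, List


def alert_department(keyword: str) -> str:
    """Stand-in for sending an alert; returns a record of the action taken."""
    return f"alert sent to relevant department (keyword: {keyword})"


def open_follow_up(keyword: str) -> str:
    """Stand-in for creating a follow-up task (hypothetical action)."""
    return f"follow-up task created (keyword: {keyword})"


# Configured keywords and their actions.
KEYWORD_ACTIONS: Dict[str, Callable[[str], str]] = {
    "urgent": alert_department,
    "refund": open_follow_up,
}


def run_keyword_actions(detected: List[str]) -> List[str]:
    """Run the configured action for each detected keyword and collect the records."""
    records = []
    for keyword in detected:
        normalized = keyword.lower()
        action = KEYWORD_ACTIONS.get(normalized)
        if action is not None:
            records.append(action(normalized))
    return records
```

Keeping the mapping in a table means new keywords and actions can be added without changing the dispatch logic.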

[0740] (Application Example 2)

[0741] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0742] In customer service, it is essential to accurately understand customer emotions in real time and respond quickly and appropriately. Furthermore, there is a challenge in efficiently collecting positive customer feedback and utilizing it as training material within the organization.

[0743] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0744] In this invention, the server includes means for receiving voice interactions with customers in real time, means for analyzing the received voice data to determine emotions, and means for utilizing the extracted feedback as educational material. This enables appropriate responses tailored to customer emotions and the creation of effective training materials.

[0745] "Voice interaction with customers" refers to voice communication that takes place between a customer and a system or operator.

[0746] "Real-time reception" refers to a function that can instantly acquire and process audio data at the moment a conversation takes place.

[0747] "Means for analyzing voice data to determine emotions" refers to technology that analyzes elements of the voice such as tone, speed, and intonation to identify the speaker's emotional state.

[0748] "Means for converting audio data into text data" refers to the process of converting audio information into text format using speech recognition technology.

[0749] "Means for detecting specific keywords" refers to an algorithm for identifying important words or phrases in the converted text data.

[0750] "Means for generating and sending notifications" refers to a mechanism that creates an alert or message when a specific event occurs or a specific condition is met and sends it to the appropriate recipients.

[0751] "Means for extracting and saving positive feedback" refers to the process of identifying positive evaluations and opinions from customers and saving them in a database or file.

[0752] "Means for utilizing as educational material" refers to a method of using the saved feedback as content for employee training and education.

[0753] To realize this invention, a system is needed that receives voice conversations with customers in real time and analyzes their content using an emotion analysis engine. When the server receives voice data, it processes the data using a dedicated API or software (e.g., a speech recognition engine or natural language processing tool) for emotion analysis. After the emotion analysis is complete, the voice data is converted into text data, and specific keywords that have been set in advance are detected. This allows the system to understand the customer's emotional state, and if positive feedback is obtained, it is stored in a database and used as training material within the organization.

[0754] The server also has the functionality to send notifications to administrators based on specific keywords or emotional states. This requires a network environment that enables real-time data communication and alert generation. Specific hardware includes smartphones and microphones as voice input terminals, and high-performance computers for processing the collected data. The software includes the aforementioned voice recognition engine, as well as applications that control the database management system and notification functions.

[0755] As a concrete example, consider a scenario involving customer service in a retail store. While a store employee is talking to a customer, the system detects the customer saying, "This is the best product I've ever used." The server detects this positive feedback, saves it to a database, and simultaneously processes it as training material for employees. Furthermore, managers can immediately learn about this success. An example of a prompt for this scenario would be, "We want to automatically record what customers felt positively about during in-store interactions and use it for employee training. Please provide specific examples of analysis."

[0756] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0757] Step 1:

[0758] The terminal captures conversations with customers as audio data in real time. The audio data is collected using a microphone and sent directly to the server. The input is the customer's raw voice, and the output is digital audio data.

[0759] Step 2:

[0760] The server sends the received audio data to the speech recognition engine, where it is converted into text data. The input is digital audio data, and the output is the corresponding text data. The audio waveform data is analyzed and converted into words and phrases through speech recognition.

[0761] Step 3:

[0762] The server passes the converted text data to the sentiment analysis engine to determine the customer's emotional state. The input is text data, and the output is the sentiment analysis result, i.e., the customer's emotional state (e.g., positive, negative). Natural language processing techniques are used to analyze emotional words and phrases in the text.
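The sentiment determination of step 3 can be illustrated with a small lexicon-based classifier. This is only a sketch: the word lists and the function name `classify_sentiment` are hypothetical, and counting lexicon hits is a stand-in for whatever natural language processing technique the sentiment analysis engine actually uses.

```python
import re

# Hypothetical word lists; a real engine would likely use a trained model.
POSITIVE_WORDS = {"best", "great", "love", "excellent", "thank"}
NEGATIVE_WORDS = {"worst", "terrible", "broken", "angry", "refund"}


def classify_sentiment(text):
    """Label text as positive, negative, or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE_WORDS for w in words) - sum(
        w in NEGATIVE_WORDS for w in words
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For instance, `classify_sentiment("This is the best product I've ever used.")` yields `"positive"` because the only lexicon hit is "best".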

[0763] Step 4:

[0764] The server stores positive feedback in a database based on the sentiment analysis results. The input is the sentiment analysis results, and the output is the stored success story data. It automatically creates entries in the database and prepares the feedback for use as training material.
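The storage of step 4 could look like the following sketch, which uses Python's standard `sqlite3` module. The table name `positive_feedback`, its columns, and the function names are assumptions made for illustration; the source does not specify a database product or schema.

```python
import sqlite3


def open_feedback_db(path=":memory:"):
    """Open the database and create the (hypothetical) feedback table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS positive_feedback ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "utterance TEXT NOT NULL, "
        "recorded_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def store_if_positive(conn, utterance, sentiment):
    """Insert the utterance only when the sentiment result is positive."""
    if sentiment != "positive":
        return False
    with conn:  # commits the insert on success, rolls back on error
        conn.execute(
            "INSERT INTO positive_feedback (utterance) VALUES (?)", (utterance,)
        )
    return True
```

Using a parameterized query (`?`) keeps customer utterances from being interpreted as SQL.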

[0765] Step 5:

[0766] The server generates and sends a notification to the administrator if the detected emotion is negative and specific keywords are present. The input is the emotion analysis results and keyword detection results, and the output is the alert notification sent to the administrator. This notification is sent via email or as an alert message within the application using a notification system.
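The condition in step 5 (negative emotion and at least one detected keyword) can be expressed as a small rule. The `Alert` class, the function `build_alert`, and the subject format are hypothetical; delivery by email or in-app message is deliberately omitted.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    recipient: str
    subject: str
    body: str


def build_alert(sentiment, detected_keywords, utterance, recipient="administrator"):
    """Return an Alert when the emotion is negative and keywords were detected."""
    if sentiment != "negative" or not detected_keywords:
        return None
    subject = "Negative interaction: " + ", ".join(detected_keywords)
    return Alert(recipient=recipient, subject=subject, body=utterance)
```

Returning `None` when either condition fails keeps the decision separate from the delivery channel, so the same rule could feed an email sender or an in-app notifier.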

[0767] Step 6:

[0768] Users can access saved positive feedback to use as educational material. The input is a record of successful cases in the database, and the output is the available educational material. This material can be searched and displayed through the user interface.
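The search described in step 6 might be sketched as a simple substring query over the saved feedback. The table name `positive_feedback`, the demo rows, and the function `search_feedback` are assumptions; the source does not describe the user interface or the query mechanism.

```python
import sqlite3


def search_feedback(conn, term):
    """Return saved utterances that contain the search term."""
    pattern = "%" + term + "%"
    rows = conn.execute(
        "SELECT utterance FROM positive_feedback "
        "WHERE utterance LIKE ? ORDER BY id",
        (pattern,),
    )
    return [row[0] for row in rows]


# Demo data in an in-memory database (table name is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE positive_feedback (id INTEGER PRIMARY KEY, utterance TEXT)"
)
conn.executemany(
    "INSERT INTO positive_feedback (utterance) VALUES (?)",
    [("This is the best product I've ever used.",), ("The staff were very kind.",)],
)
results = search_feedback(conn, "product")
```

Note that SQLite's `LIKE` is case-insensitive for ASCII characters by default and treats `%` and `_` in the search term as wildcards, which a production search would need to escape.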

[0769] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio representing the user's input in response to the result of the specific processing. The control unit 46A transmits the audio data representing the user's input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0770] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0771] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0772] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0773] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0774] These emotions are distributed around the 3 o'clock position of the emotion map 400 and usually fluctuate between security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, giving a calm impression.

[0775] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the emotion map 400 an emotion is located, the more visible it becomes (that is, the more it is expressed in actions).

[0776] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, the result is discomfort, and when they approach the ideal, the result is pleasure. Similarly, in robots, cars, motorcycles, and the like, emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, the result is discomfort, and when they approach the ideal, the result is pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "reaction," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
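The balance-based description above (moving away from the ideal produces discomfort, moving toward it produces pleasure) can be illustrated for a single balance value such as battery level. The function name `valence`, the three labels, and the use of a single scalar are simplifying assumptions for illustration only.

```python
def valence(previous, current, ideal):
    """Classify a change in one balance value (e.g., battery level).

    Moving toward the ideal counts as pleasure, moving away as discomfort.
    """
    before = abs(previous - ideal)
    after = abs(current - ideal)
    if after < before:
        return "pleasure"
    if after > before:
        return "discomfort"
    return "neutral"
```

For example, with an ideal battery level of 1.0, a change from 0.5 to 0.8 is classified as pleasure and a change from 0.8 to 0.5 as discomfort.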

[0777] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0778] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
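One possible way to obtain emotion values in which neighboring emotions are similar is to derive soft training targets from distances on the map. The source does not specify how the training data is built, so everything below is an assumption: the coordinates in `POSITIONS` are invented and not taken from Figure 9, and the Gaussian falloff with parameter `width` is just one plausible choice.

```python
import math

# Hypothetical (x, y) positions on an emotion map; not taken from Figure 9.
POSITIONS = {
    "reassured": (1.0, 0.2),
    "calm": (1.1, 0.0),
    "confident": (1.0, -0.2),
    "anxious": (-1.0, -0.8),
}


def soft_targets(label, positions=POSITIONS, width=0.5):
    """Emotion values that decay with map distance from the labeled emotion."""
    cx, cy = positions[label]
    return {
        name: math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * width**2))
        for name, (x, y) in positions.items()
    }


targets = soft_targets("calm")
```

With these invented coordinates, "reassured" and "confident" receive values close to that of "calm", while the distant "anxious" receives a value near zero, which mirrors the similarity between nearby emotions described for Figure 10.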

[0779] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0780] In the above embodiment, an example was given in which the specific processing is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and the specific processing may be distributed across multiple computers, including the computer 22. For example, the data generation model 58 may be provided in a device external to the data processing device 12, and the external device may generate data according to the input data.

[0781] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing according to the specific processing program 56.

[0782] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0783] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0784] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0785] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0786] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0787] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0788] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0789] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0790] The following is further disclosed regarding the embodiments described above.

[0791] (Claim 1)

[0792] A means of receiving voice conversations with customers in real time,

[0793] A means of analyzing received audio data to determine emotions,

[0794] A means of converting audio data into text data,

[0795] A means for detecting specific keywords based on converted character data,

[0796] A means for generating and sending notifications in response to detected keywords,

[0797] A means of extracting and saving positive feedback,

[0798] A system that includes all of the above means.

[0799] (Claim 2)

[0800] The system according to claim 1, which sends notifications in real time based on emotion analysis of voice data.

[0801] (Claim 3)

[0802] The system according to claim 1, which stores related audio data upon detection of a specific keyword.

[0803] "Example 1"

[0804] (Claim 1)

[0805] A means of acquiring voice conversations with customers in real time,

[0806] A means of analyzing acquired audio information to recognize emotions,

[0807] A means of converting audio information into symbolic information,

[0808] A means for recognizing specific words or phrases based on converted symbolic information,

[0809] A means of creating and sending notifications in response to recognized words and phrases,

[0810] A means of extracting and recording positive feedback,

[0811] A means for detecting inappropriate content and recording audio information before and after it,

[0812] A means of immediately notifying the administrator based on the analysis results,

[0813] A system that includes all of the above means.

[0814] (Claim 2)

[0815] The system according to claim 1, which immediately issues a notification based on the emotion recognition result of voice information.

[0816] (Claim 3)

[0817] The system according to claim 1, which records related audio information by recognizing specific words or phrases.

[0818] "Application Example 1"

[0819] (Claim 1)

[0820] A device that receives voice conversations with customers in real time,

[0821] A device that analyzes received audio information to determine emotions,

[0822] A device that converts audio information into text information,

[0823] A device for detecting a specific code based on converted character information,

[0824] A device that generates and transmits a notification according to the detected code,

[0825] A device for extracting and saving positive opinions,

[0826] A device that detects inappropriate language and records related conversation content,

[0827] A system that includes all of the above devices.

[0828] (Claim 2)

[0829] The system according to claim 1, which sends notifications in real time based on emotion analysis of voice information.

[0830] (Claim 3)

[0831] The system according to claim 1, which stores related audio information by detecting a specific code and sends a notification to the management department.

[0832] "Example 2 of combining an emotion engine"

[0833] (Claim 1)

[0834] A device for receiving voice dialogue,

[0835] A device that analyzes received audio information to determine emotions,

[0836] A device that converts audio information into text information,

[0837] A device for detecting specific words or phrases based on converted character information,

[0838] A device that generates and sends notifications in response to detected words,

[0839] A device for extracting and recording positive evaluations,

[0840] A device that automatically registers successful conversation examples in a case study collection based on emotion analysis results,

[0841] A device that sends a notification to the administrator when it detects negative emotions,

[0842] A system that includes all of the above devices.

[0843] (Claim 2)

[0844] The system according to claim 1, which sends notifications in real time based on emotion analysis of voice information and further performs analysis using a generative AI model.

[0845] (Claim 3)

[0846] The system according to claim 1, which stores related audio information by detecting specific words or phrases and performs specific actions based on pre-set keywords.

[0847] "Application example 2 when combining with an emotional engine"

[0848] (Claim 1)

[0849] A means of receiving voice conversations with customers in real time,

[0850] A means of analyzing received audio data to determine emotions,

[0851] A means of converting audio data into text data,

[0852] A means for detecting specific keywords based on converted character data,

[0853] A means for generating and sending notifications in response to detected keywords,

[0854] A means of extracting and saving positive feedback,

[0855] A means of utilizing the extracted feedback as educational material,

[0856] A system that includes all of the above means.

[0857] (Claim 2)

[0858] The system according to claim 1, which sends notifications in real time based on emotion analysis of voice data.

[0859] (Claim 3)

[0860] The system according to claim 1, which stores related audio data by detecting specific keywords and uses that data as training material.

[Explanation of Symbols]

[0861]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. A system comprising: means for receiving voice conversations with customers in real time; means for analyzing the received audio data to determine emotions; means for converting the audio data into text data; means for detecting specific keywords based on the converted text data; means for generating and sending notifications in response to the detected keywords; and means for extracting and saving positive feedback.

2. The system according to claim 1, which sends notifications in real time based on emotion analysis of voice data.

3. The system according to claim 1, which stores related audio data upon detection of a specific keyword.