system

A natural language processing system addresses inefficiencies in retail customer service by extracting keywords, incorporating real-time data, and generating optimal responses, enhancing service quality and speed.

JP2026096568A · Pending · Publication Date: 2026-06-15 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: SOFTBANK GROUP CORP
Filing Date: 2024-12-03
Publication Date: 2026-06-15

AI Technical Summary

Technical Problem

Conventional retail store customer service systems face inefficiencies due to the heavy burden on employees needing to grasp vast product information, variations in response quality, and the time-consuming process of obtaining accurate answers, which affects customer satisfaction.

Method used

A system utilizing natural language processing to extract keywords from customer inquiries, search past cases, incorporate real-time environmental data, and generate optimal responses, with visual support and automatic feedback management.

Benefits of technology

Dramatically improves the quality and speed of customer service, reducing employee burden and enhancing overall customer satisfaction by providing quick and accurate answers.

Abstract

A system is provided. [Solution] The system includes: a means for analyzing input natural language data and extracting keywords; a means for searching a database for past information and similar cases; a means for acquiring external environmental information in real time and reflecting it in the analysis results; a means for generating an optimal answer or advice and displaying it on a device; and a means for reporting unresolved questions to the relevant departments and managing feedback.

Description

Technical Field

[0001] The technology disclosed herein relates to a system.

Background Art

[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a character of the chatbot, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] In the customer service of conventional retail stores, employees need to grasp a vast amount of product information and service contents, and the burden is heavy. Also, since it depends on the experience of each employee, there may be variations in the quality of the response. Furthermore, in order to obtain a quick and accurate answer to a customer's question, it is necessary to contact a supervisor or other departments, which may take time. Therefore, technical means for solving these problems, improving the efficiency of employees, and enhancing customer satisfaction have been demanded.

Means for Solving the Problems

[0005] This invention provides a system that uses natural language processing technology to extract keywords from customer inquiries entered via voice or text, and quickly searches for past cases and similar information. Furthermore, it acquires external environmental information in real time and incorporates it into the analysis results to generate optimal answers and advice. This generated information is displayed on a terminal to visually support employees. In addition, unresolved questions are automatically reported to the relevant departments, and the system includes a function to manage feedback within the system, enabling rapid response and information accumulation.

[0006] "Natural language data" refers to data that is a computer-readable format of the language that humans use on a daily basis.

[0007] "Keywords" are words or phrases extracted from text that contain important information.

[0008] A "database" is a collection of information that organizes large amounts of data so that it can be efficiently searched and managed.

[0009] "External environmental information" refers to real-time data obtained from outside the system, such as date and time, weather, and location.

[0010] A "terminal" is a part of a computer system that a user can directly operate.

[0011] "Feedback" refers to information based on results and opinions provided to a system or department.

[0012] "Natural language processing" is a technology that enables computers to understand, analyze, and generate human language.

Brief Description of the Drawings

[0013]

[Figure 1] It is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.

[Figure 2] It is a conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.

[Figure 3] It is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.

[Figure 4] It is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.

[Figure 5] It is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.

[Figure 6] It is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.

[Figure 7] It is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.

[Figure 8] It is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.

[Figure 9] It shows an emotion map to which a plurality of emotions are mapped.

[Figure 10] It shows an emotion map to which a plurality of emotions are mapped.

[Figure 11] It is a sequence diagram showing the processing flow of the data processing system in Example 1.

[Figure 12] It is a sequence diagram showing the processing flow of the data processing system in Application Example 1.

[Figure 13] It is a sequence diagram showing the processing flow of the data processing system in Example 2 when an emotion engine is combined.

[Figure 14] It is a sequence diagram showing the processing flow of the data processing system in Application Example 2 when an emotion engine is combined.

Embodiments for Carrying Out the Invention

[0014] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0015] First, the terms used in the following description will be explained.

[0016] In the following embodiments, the numbered processor (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.

[0017] In the following embodiments, the numbered RAM (Random Access Memory) is a memory in which information is temporarily stored and is used as a work memory by the processor.

[0018] In the following embodiments, the numbered storage is one or more non-volatile storage devices that store various programs and various parameters, etc. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), or magnetic tapes, and the like.

[0019] In the following embodiments, the numbered communication I / F (Interface) is an interface including a communication processor and an antenna, etc. The communication I / F controls communication between multiple computers. Examples of communication standards applied to the communication I / F include wireless communication standards including 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), or Bluetooth (registered trademark), and the like.

[0020] In the following embodiments, "A and / or B" is synonymous with "at least one of A and B." That is, "A and / or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and / or B" applies when expressing three or more things linked by "and / or."

[0021] [First Embodiment]

[0022] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0023] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0024] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0025] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0026] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0027] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0028] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0029] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0030] As shown in Figure 2, in the data processing device 12, specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0031] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0032] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0033] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0034] This invention is an AI system that uses speech processing and text analysis to efficiently support customer service in retail stores and service industries. The basic configuration consists of a speech input module, a natural language processing module, a database access module, an external data acquisition module, and a user interface module.

[0035] To begin operating the system, users input customer questions into the system as voice or text. In the case of voice input, the server uses speech recognition technology to convert it into text data. This speech recognition technology is trained using a large amount of voice data, achieving highly accurate text conversion.

[0036] The server passes the converted text data to a natural language processing module for semantic analysis. Specifically, it extracts relevant keywords, understands the context, and infers the customer's intent. In this process, it compares the data with similar customer inquiries and FAQ information previously recorded in the system.

[0037] Based on the analyzed information, the server then searches a database for relevant case studies and product information. This database includes past customer service history, product manuals, and campaign details. The server also obtains real-time data such as date, time, weather, and location information through an external data acquisition module, and uses this data when generating the final advice.

[0038] The terminal aggregates this information and presents the user with the most appropriate response via a user interface module. This display is visually clear and well-organized, and includes relevant images and graphs as needed. For example, when explaining product information during a campaign, product photos and sales performance graphs are displayed to help the user respond effectively to customers.

[0039] If the system cannot immediately provide an appropriate answer to a customer's question, the server automatically forwards the unresolved inquiry to the relevant department and incorporates the received feedback into the database. This continuously improves the accuracy and usefulness of the system.

[0040] In this way, it becomes possible to dramatically improve the quality and speed of customer service, reduce the burden on employees, and strengthen the overall customer service capabilities of the company.

[0041] The following describes the processing flow.

[0042] Step 1:

[0043] The user enters customer questions in either voice or text format. In the case of voice input, the voice is captured by the terminal's microphone and the resulting voice data is sent to the server.

[0044] Step 2:

[0045] The server receives audio data and converts it to text through a speech recognition module. During this process, it analyzes the characteristics of the audio and removes unnecessary noise to ensure accurate text conversion.

[0046] Step 3:

[0047] The server sends the converted text data to a natural language processing module for grammatical analysis. It extracts keywords and performs semantic analysis to understand the content of the text.

[0048] Step 4:

[0049] Based on the analysis, the server searches for past cases and related information through the database access module. This includes FAQs, past customer support history, and product information.

[0050] Step 5:

[0051] The server uses an external data acquisition module to obtain real-time information such as date, time, weather, and location. This allows it to gather complementary information to make the advice it provides more realistic and accurate.

[0052] Step 6:

[0053] The server integrates historical data with external information to generate optimal advice or answers. In doing so, it uses AI algorithms to evaluate multiple options and derive the best solution.

[0054] Step 7:

[0055] The terminal displays information provided by the server to the user via a user interface module. The displayed information includes text, images, graphs, etc., and is organized in a visually clear and easy-to-understand manner.

[0056] Step 8:

[0057] If a suitable answer cannot be found, the server automatically forwards the unresolved question to the relevant department and awaits feedback. This feedback is later registered in the system's database and updated to help handle future inquiries.
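
The flow in Steps 3 through 8 can be sketched roughly as follows. This is a minimal illustration, not the implementation described in this specification: the names `Inquiry`, `run_pipeline`, `simple_keywords`, `faq_lookup`, `fixed_environment`, and the `FAQ` entry are all hypothetical, the text is assumed to be already transcribed (Steps 1 and 2), the "optimal answer" is simply the first matching case, and display on the terminal (Step 7) is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Inquiry:
    """State carried through the flow for one customer question."""
    text: str
    keywords: list[str] = field(default_factory=list)
    cases: list[str] = field(default_factory=list)
    environment: dict[str, str] = field(default_factory=dict)
    answer: Optional[str] = None
    escalated: bool = False


def run_pipeline(
    text: str,
    extract_keywords: Callable[[str], list[str]],
    search_cases: Callable[[list[str]], list[str]],
    fetch_environment: Callable[[], dict[str, str]],
    escalate: Callable[[Inquiry], None],
) -> Inquiry:
    inquiry = Inquiry(text=text)
    inquiry.keywords = extract_keywords(text)        # Step 3: keyword extraction
    inquiry.cases = search_cases(inquiry.keywords)   # Step 4: past-case search
    inquiry.environment = fetch_environment()        # Step 5: external information
    if inquiry.cases:
        inquiry.answer = inquiry.cases[0]            # Step 6: stand-in for answer generation
    else:
        inquiry.escalated = True                     # Step 8: forward the unresolved question
        escalate(inquiry)
    return inquiry


# Hypothetical stand-ins for the modules, for illustration only.
FAQ = {"warranty": "See the warranty section of the product manual."}
pending: list[Inquiry] = []


def simple_keywords(text: str) -> list[str]:
    return [word.strip("?.,!").lower() for word in text.split()]


def faq_lookup(keywords: list[str]) -> list[str]:
    return [FAQ[k] for k in keywords if k in FAQ]


def fixed_environment() -> dict[str, str]:
    return {"weather": "rain"}


answered = run_pipeline(
    "What is the warranty period for this product?",
    simple_keywords, faq_lookup, fixed_environment, pending.append,
)
unanswered = run_pipeline(
    "Do you sell umbrellas?",
    simple_keywords, faq_lookup, fixed_environment, pending.append,
)
```

Passing each module in as a callable keeps the flow itself independent of any particular speech, search, or data service, which matches the modular configuration described above.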

[0058] (Example 1)

[0059] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0060] In modern society, improving the speed and accuracy of customer service in retail stores and service industries is a crucial challenge. However, conventional systems have insufficient processing capabilities for voice and text data, making it difficult to acquire and incorporate external data in real time. Furthermore, they struggle to understand complex customer intentions and generate optimal responses. There is a need to solve these problems and improve customer satisfaction.

[0061] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0062] In this invention, the server includes means for analyzing input natural language information to extract key words, means for searching past information and similar cases from the information database, and means for acquiring current external environmental information and reflecting it in the analysis results. This makes it possible to provide quick and accurate answers to customer questions.

[0063] "Natural language information" refers to information written in the language that humans use in everyday life, and is the data format before it is converted into a format that can be processed by machines.

[0064] "Key words" are important keywords extracted from natural language information and are elements used for understanding and searching the content of text.

[0065] "Information aggregation" refers to a searchable collection of information, such as a database or knowledge base, in which past cases and related information are stored.

[0066] "External environmental information" refers to data such as date and time, weather, and regional information acquired in real time from outside the system, and is a factor that may affect the analysis results.

[0067] "Visual information" refers to data presented to users in a visually recognizable format, such as images or graphs.

[0068] "Statistical representation" refers to information provided in the form of graphs, charts, and other visual aids used to present data in an easily understandable way.

[0069] "Recognizing intent" means analyzing the context and content of the input information to understand the speaker's purpose and requests.

[0070] This system is an AI system that uses speech processing and text analysis to efficiently support customer service in retail stores and service industries. The basic configuration of the system consists of a speech input module, a natural language processing module, a database access module, an external data acquisition module, and a user interface module.

[0071] Voice input module:

[0072] Users input customer questions into the system as voice or text. In the case of voice input, the server uses speech recognition technology to convert it into text data. This speech recognition uses advanced technologies such as "speech recognition services." For example, if a customer asks a question by voice, "What is the warranty period for this product?", the server converts the voice into text with high accuracy.

[0073] Natural language processing module:

[0074] The server passes the converted text data to a natural language processing module for semantic analysis. Using a "natural language processing library," it extracts relevant keywords and understands the context to infer the customer's intent. This allows it to recognize that the customer is asking about the "warranty period."

[0075] Database access module:

[0076] Based on the analyzed information, the server searches the database for relevant product information and past case studies. A "database management system" is used here, enabling the rapid and accurate retrieval of relevant information.

[0077] External data acquisition module:

[0078] An external data acquisition module operates to obtain real-time external environmental information and reflect it in the analysis results. For example, it retrieves weather information and regional information from the "External Data API" to help generate final advice.

[0079] User interface module:

[0080] The terminal aggregates this information and presents it to the user in an easy-to-understand visual format through a user interface. For example, it displays product-related images and sales data graphs to help users effectively explain products to customers.

[0081] Example of a prompt:

[0082] "How can we improve the speed of our customer response?"

[0083] Through these modules, the system can respond to customers quickly and accurately, improving the company's customer service capabilities.
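
As an illustration of the natural language processing module, the following sketch extracts keywords from the warranty question quoted above and maps them to an intent. It is an assumption-laden toy: a regular-expression tokenizer with a stop-word list stands in for the "natural language processing library", and `STOPWORDS`, `INTENT_TRIGGERS`, `extract_keywords`, and `infer_intent` are hypothetical names that do not appear in this specification.

```python
import re

# Hypothetical stop-word list; a real system would use a fuller resource.
STOPWORDS = {"what", "is", "the", "for", "this", "a", "an", "of", "are", "do", "you"}

# Hypothetical mapping from intent labels to trigger keywords.
INTENT_TRIGGERS = {
    "warranty_period": {"warranty"},
    "promotion": {"promotion", "promotions", "discount", "campaign"},
}


def extract_keywords(text: str) -> list[str]:
    """Return unique non-stop-word tokens in their order of appearance."""
    keywords: list[str] = []
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        if token not in STOPWORDS and token not in keywords:
            keywords.append(token)
    return keywords


def infer_intent(keywords: list[str]) -> str:
    """Return the first intent whose trigger set overlaps the keywords."""
    for intent, triggers in INTENT_TRIGGERS.items():
        if triggers.intersection(keywords):
            return intent
    return "unknown"


keywords = extract_keywords("What is the warranty period for this product?")
intent = infer_intent(keywords)
```

Here `keywords` becomes `["warranty", "period", "product"]` and `intent` becomes `"warranty_period"`, which corresponds to the system recognizing that the customer is asking about the warranty period.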

[0084] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0085] Step 1:

[0086] Users input customer questions into the system in either voice or text format. In the case of voice input, the server uses a speech recognition module to capture the voice data and convert it into text data. Specifically, the server analyzes the voice signal and uses a language model to convert it into a string. The input is sound waves, and the output is the corresponding text data.

[0087] Step 2:

[0088] The server passes the converted text data to a natural language processing module. Here, the server performs syntactic analysis of the text and extracts key words. Specifically, it identifies words and phrases in the text and analyzes their meanings. This process is performed using a morphological analyzer, with text data as input and analysis information including key words and intent as output.

[0089] Step 3:

[0090] The server performs database searches based on the analysis information. It uses a database access module to search for similar past cases and product information. Specifically, it uses SQL queries to perform cross-database searches and retrieve the necessary information. The input is the analysis information, and the output is a dataset containing related information.

[0091] Step 4:

[0092] The server acquires real-time external environmental information via an external data acquisition module and incorporates it into the analysis results. Specifically, it uses an API to obtain environmental variables such as weather and supplements the analysis information with them. In this process, the input is the analysis information, and the output is the enhanced analysis information.

[0093] Step 5:

[0094] The terminal presents information to the user using a user interface module, based on enhanced analytical information from the server. Specifically, it uses a GUI library to visually organize data and display it on the terminal screen. The input is enhanced analytical information, and the output is the visual information presented to the user.

[0095] Step 6:

[0096] If immediate resolution is difficult, the server reports the unresolved inquiry to the relevant department and implements a specific process to gather subsequent feedback. Specifically, it automates the distribution of inquiries via email and ticketing systems, and facilitates the addition of feedback data to the database. In this process, the input is the unresolved inquiry, and the output is an updated database reflecting the feedback.
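
The database search in Step 3 and the feedback registration in Step 6 could look roughly like the sketch below. It assumes a single in-memory SQLite database with one hypothetical `cases` table, rather than the cross-database search mentioned above; the function names and the sample question and answer are illustrative only. Keywords are bound as query parameters rather than concatenated into the SQL text.

```python
import sqlite3


def open_case_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical table of past cases."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cases ("
        "id INTEGER PRIMARY KEY, question TEXT NOT NULL, answer TEXT NOT NULL)"
    )
    return conn


def search_cases(conn: sqlite3.Connection, keywords: list[str]) -> list[tuple[str, str]]:
    """Return (question, answer) rows whose question contains any keyword."""
    if not keywords:
        return []
    where = " OR ".join("question LIKE ?" for _ in keywords)
    params = [f"%{keyword}%" for keyword in keywords]
    cursor = conn.execute(
        f"SELECT question, answer FROM cases WHERE {where} ORDER BY id", params
    )
    return cursor.fetchall()


def register_feedback(conn: sqlite3.Connection, question: str, answer: str) -> None:
    """Store the department's answer so that later searches can find it."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO cases (question, answer) VALUES (?, ?)", (question, answer)
        )


conn = open_case_db()
question = "Is gift wrapping available?"
first_search = search_cases(conn, ["gift", "wrapping"])  # no stored case yet
if not first_search:
    # Unresolved inquiry: assume the relevant department later supplies an answer.
    register_feedback(conn, question, "Answer supplied by the relevant department.")
second_search = search_cases(conn, ["gift"])
```

The second search finds the case that the first one missed, which is the sense in which registered feedback helps with future inquiries.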

[0097] (Application Example 1)

[0098] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0099] Retail stores and service industries are facing a need to improve the efficiency and quality of customer service. Currently, it is difficult to respond quickly and appropriately to a wide range of customer inquiries, increasing the burden on store employees and potentially leading to decreased customer satisfaction. In particular, there is a demand for the ability to instantly analyze large amounts of information and derive the optimal answer.

[0100] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0101] In this invention, the server includes means for analyzing input natural language data to extract keywords, means for searching a database for past information and similar cases, means for acquiring external environmental information in real time and reflecting it in the analysis results, and means for acquiring location information within the store and providing inventory information and discount information. This makes it possible to quickly and appropriately present product and service information in response to customer inquiries.

[0102] "Natural language data" refers to strings of characters and audio information expressed in the language forms that humans use on a daily basis.

[0103] "Keywords" are important words or phrases extracted from natural language data and are used to identify and classify information.

[0104] A "database" is a system for systematically accumulating and managing data such as past information and similar cases.

[0105] "External environmental information" refers to data about external factors acquired in real time, such as date and time, weather, and regional information.

[0106] "Analysis results" refers to the information and conclusions obtained through the analysis of natural language data.

[0107] A "terminal" is a device used for inputting and outputting data, and includes smartphones and computers.

[0108] "Location information" refers to geographical data about a specific place, and is obtained using methods such as GPS.

[0109] "Inventory information" refers to the status of goods held in a store, and includes data on the quantity and types of products available for sale.

[0110] "Discount information" refers to information about discounts or special prices on specific products or services.

[0111] The system for implementing this invention mainly consists of a server and a terminal. The server analyzes the input natural language data and extracts keywords to understand the user's intent. If voice input is provided, speech recognition technology is used to convert the voice data into text data. In this case, the server uses a speech recognition API (e.g., Google® Speech Recognition API) that has been trained on a large amount of voice data.

[0112] Based on the analysis results, the server searches a database to find past information and similar cases. The server also acquires real-time external environmental information and incorporates it into the analysis results. This includes current data such as date, time, weather, and location, which can be used to provide information on in-store inventory and discounts.
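
One way to picture how external environmental information and in-store location feed into inventory and discount information is the following sketch. Everything in it is assumed for illustration: the `Environment` fields, the `INVENTORY` and `DISCOUNTS` tables keyed by a hypothetical store identifier, and the sample values are not taken from this specification, and a real system would obtain them from external services.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Environment:
    """Hypothetical snapshot of external environmental information."""
    timestamp: datetime
    weather: str
    store_id: str  # stands in for location information within the store network


# Hypothetical lookup tables keyed by (store, product).
INVENTORY = {("store-001", "umbrella"): 12, ("store-002", "umbrella"): 0}
DISCOUNTS = {("store-001", "umbrella"): 10}  # percent off


def enrich(analysis: dict, env: Environment) -> dict:
    """Return a copy of the analysis results with environment, stock, and discount added."""
    product = analysis["product"]
    enriched = dict(analysis)  # leave the caller's dictionary untouched
    enriched["timestamp"] = env.timestamp.isoformat()
    enriched["weather"] = env.weather
    enriched["in_stock"] = INVENTORY.get((env.store_id, product), 0)
    enriched["discount_percent"] = DISCOUNTS.get((env.store_id, product), 0)
    return enriched


analysis = {"product": "umbrella", "keywords": ["umbrella", "promotions"]}
env = Environment(datetime(2024, 12, 3, 10, 0), "rain", "store-001")
enriched = enrich(analysis, env)
```

Returning a new dictionary instead of modifying the input keeps the original analysis results available for logging or for a second enrichment with different conditions.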

[0113] The terminal has the functionality to visually display the best answers and advice obtained from the server to the user. In particular, the information is presented clearly using relevant images and graphs. This allows the user to respond to customers quickly and accurately.

[0114] As a concrete example, imagine a store employee using a smartphone to operate an application and entering a prompt such as, "What promotions are currently running for this product?" This allows the server to immediately analyze the request and provide information, effectively supporting customer service.

[0115] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0116] Step 1:

[0117] Users use their smartphones to input customer questions via voice or text. In the case of voice input, the smartphone's microphone is used to collect the customer's question. The input data is then sent to a server.

[0118] Step 2:

[0119] The server uses a speech recognition API to convert audio data into text data. The input is an audio file, and the output is a customer question in text format. During this process, the audio data is converted to text with high accuracy.

[0120] Step 3:

[0121] The server analyzes text data using a natural language processing module and extracts important keywords. The input is the text data, and the output is the extracted keywords. Here, a generative AI model performs contextual analysis and keyword recognition.

[0122] Step 4:

[0123] The server searches the database to find similar past inquiries and related product information. Keywords are used as input, and relevant information is obtained as output. As a result of the search, the relevant information is retrieved quickly.

[0124] Step 5:

[0125] The server uses an external data acquisition module to obtain real-time external environmental information (e.g., weather and inventory status). Inputs are the current date, time, and location, while output is external data. By using this information in the analysis results, more accurate answers and suggestions are generated.

[0126] Step 6:

[0127] The server generates the optimal response or advice based on the acquired data. At this point, the specific information requested by the user is comprehensively summarized. The input consists of past case information and external data, and the output is a specific response.

[0128] Step 7:

[0129] The terminal displays information received from the server via a user interface. Related images and graphs are displayed along with the optimal answer, providing the user with visually easy-to-understand information.
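
If the response in Step 6 is produced by a generative model, the past case information and external data would first have to be combined into a single input. The sketch below shows one possible way to assemble such an input as a text prompt. This is an assumption: the specification does not state the prompt format, and `build_prompt`, its wording, and the sample data are hypothetical. The call to the model itself is deliberately left out.

```python
def build_prompt(
    question: str,
    keywords: list[str],
    past_cases: list[str],
    external: dict[str, str],
) -> str:
    """Combine the question, keywords, past cases, and external data into one prompt."""
    lines = [
        "You are assisting a store employee. Answer the customer's question concisely.",
        f"Question: {question}",
        f"Keywords: {', '.join(keywords)}",
        "Past cases:",
    ]
    if past_cases:
        lines.extend(f"- {case}" for case in past_cases)
    else:
        lines.append("- (none found)")
    lines.append("Current conditions:")
    # Sort keys so the same inputs always produce the same prompt.
    lines.extend(f"- {key}: {value}" for key, value in sorted(external.items()))
    return "\n".join(lines)


prompt = build_prompt(
    "What promotions are currently running for this product?",
    ["promotions", "product"],
    [],
    {"weather": "rain", "date": "2024-12-03"},
)
```

Stating explicitly that no past case was found, instead of leaving the section empty, gives the model a cue that the question may need to be escalated rather than answered from history.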

[0130] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0131] This invention is a customer service support system that utilizes speech and natural language processing, and further incorporates an emotion engine that recognizes user emotions. This system not only converts input speech into text and extracts keywords using natural language processing technology, but also has the function of analyzing user emotions in real time.

[0132] In system operation, users input customer inquiries and requests via voice or text. If voice input is selected, the server uses high-precision speech recognition technology to convert it into text data. The converted text is analyzed through a natural language processing module to extract relevant keywords and context.

[0133] At this stage, the server utilizes an emotion engine to evaluate the user's emotional state from voice intonation and text expression. For example, it can determine whether the customer is anxious or satisfied based on their tone of voice and word choice. The information obtained through emotion analysis is used as a crucial element in determining an appropriate response strategy.

[0134] Next, the server searches the database based on the user's input and emotional state, referencing past cases and related information. For example, it extracts the history of responses to similar questions in the past and effective countermeasures based on emotional changes. Furthermore, it acquires external information in real time and generates advice that includes the date, time, weather, and local conditions.

[0135] The generated information is presented visually to the user via the terminal's user interface module. For example, if a customer appears anxious, explanations and information that provide reassurance are highlighted. This improves the quality of customer service and enables the provision of services that are more attentive to the customer's emotions.

[0136] Furthermore, emotional data obtained during interactions is recorded by the server and stored in a database. This allows for the analysis and utilization of past emotional history in future customer interactions. In this way, the overall accuracy and effectiveness of the system improve over the long term.

[0137] The following describes the processing flow.

[0138] Step 1:

[0139] The user inputs customer questions into the system via voice or text. In the case of voice input, the voice data is sent to the terminal via the microphone.

[0140] Step 2:

[0141] The server processes the received audio data using a speech recognition module and converts it into text data. During this process, it analyzes the audio pattern and reduces background noise.

[0142] Step 3:

[0143] The server analyzes the transcribed data using a natural language processing module to extract keywords and contextual structure. This analysis allows for an accurate understanding of the customer's intent.

[0144] Step 4:

[0145] The server uses an emotion engine to identify the user's emotions from their tone of voice and choice of words. In particular, it flags cases in which the emotional state is a decisive factor in how the customer should be served.

[0146] Step 5:

[0147] The server searches the database based on the analyzed text and sentiment information to retrieve relevant past cases, FAQs, and product information. If the sentiment is pronounced, it also refers to past responses applied to similar emotional situations.

[0148] Step 6:

[0149] The server acquires real-time environmental information such as date, time, weather, and location through an external data acquisition module. This information is also taken into consideration to form optimal advice.

[0150] Step 7:

[0151] The terminal visually presents advice and information received from the server to the user through a user interface module. The displayed content consists of text, relevant images, graphs as needed, and other elements that take emotions into consideration.

[0152] Step 8:

[0153] The server records the emotional data identified during the interaction, together with how it changed, and stores it in a database so that future responses can be improved. This data is used for later analysis and contributes to improving the overall response accuracy of the system.
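As a non-limiting illustration, the recording of emotional data in Step 8 above might be sketched in Python with the standard sqlite3 module. The table name emotion_log, its columns, the customer identifier, and the numeric scores are all assumptions made for this example; the specification does not define a storage schema.

```python
import sqlite3


def open_store():
    """Create an in-memory SQLite database with a hypothetical emotion_log table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE emotion_log ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " customer_id TEXT NOT NULL,"
        " emotion TEXT NOT NULL,"
        " score REAL NOT NULL)"
    )
    return conn


def record_emotion(conn, customer_id, emotion, score):
    """Store one emotion observation made during an interaction."""
    conn.execute(
        "INSERT INTO emotion_log (customer_id, emotion, score) VALUES (?, ?, ?)",
        (customer_id, emotion, score),
    )
    conn.commit()


def emotion_history(conn, customer_id):
    """Return the customer's recorded emotions in the order they were observed."""
    rows = conn.execute(
        "SELECT emotion, score FROM emotion_log WHERE customer_id = ? ORDER BY id",
        (customer_id,),
    )
    return rows.fetchall()


conn = open_store()
record_emotion(conn, "c-001", "anxious", 0.8)
record_emotion(conn, "c-001", "satisfied", 0.6)
history = emotion_history(conn, "c-001")
print(history)
```

Ordering by the auto-incremented id preserves the sequence of observations, so a change in emotion over the course of an interaction can be read directly from the history.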

[0154] (Example 2)

[0155] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0156] Traditional customer service systems could extract keywords from entered information and incorporate external data, but they had limitations in their ability to specifically analyze customer emotions and optimize responses. Furthermore, it was difficult to effectively utilize interaction history to improve the quality of subsequent interactions.

[0157] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0158] In this invention, the server includes means for analyzing input voice or natural language data to extract keywords, means for analyzing the user's emotions, and means for searching a database for past cases and related information. This makes it possible to present countermeasures that take customer emotions into consideration and to improve the quality of service by utilizing accumulated emotional data.

[0159] "Voice or natural language data" refers to information in the form of conversations or texts entered by users, including customer requests and questions.

[0160] "Means for analyzing and extracting keywords" refers to techniques that analyze input data and identify the important words and phrases in it.

[0161] "Means for analyzing user emotions" refers to technologies that evaluate voice tone and wording in order to determine the emotions contained in conversations and written texts.

[0162] "Means of searching past cases and related information from a database" refers to a technology that searches for and retrieves cases and information that are relevant to the current situation from a pre-existing database of cases and information.

[0163] "Means of acquiring external information in real time and reflecting it in analysis results" refers to technologies that acquire the latest data from the internet and other information sources, and incorporate and utilize it in the results of the analysis.

[0164] "Means for generating optimal advice or suggestions and outputting them to a display device" refers to a technology that creates the most effective advice or solutions based on analyzed information and provides them to the user through a display device.

[0165] "A means of accumulating emotional data acquired during a conversation and reflecting it in future responses" refers to a technology that saves emotionally-based information obtained from interactions with customers and utilizes it for future interactions.

[0166] This invention is a system that utilizes speech and natural language processing technologies to support customer service. The system combines speech-to-text conversion, natural language analysis, and sentiment analysis to provide responses tailored to the customer's emotions. Specific embodiments are described below.

[0167] Users enter inquiries and requests via voice or text. If voice input is used, the information is transmitted to the system via the device's microphone. Voice input is available on readily available devices and is designed with ease of use in mind.

[0168] The server converts the audio into text data using high-precision speech recognition software (e.g., a common speech recognition API). During this process, it accurately captures the subtle nuances of the speech and replaces them with textual information.

[0169] The obtained text data is analyzed using a natural language processing library (e.g., spaCy or other common NLU libraries). Sentence segmentation, keyword extraction, and contextual analysis are performed, thereby extracting important information.
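As a non-limiting illustration, sentence segmentation and keyword extraction might be sketched as follows. To keep the example self-contained it uses only the Python standard library with a frequency count over non-stop-words, as a simple stand-in for a natural language processing library such as spaCy; the stop-word list and the function names are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real deployment would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "of", "this", "what", "to", "and", "i", "it", "my"}


def split_sentences(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_keywords(text, top_n=3):
    """Return up to top_n of the most frequent non-stop-words, most frequent first."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


text = "What is the warranty period of this product? The warranty is important."
sentences = split_sentences(text)
keywords = extract_keywords(text, top_n=1)
print(sentences)
print(keywords)
```

A frequency count ignores grammar and context, so contextual analysis of the kind described above would in practice rely on part-of-speech tagging or dependency parsing from a dedicated library.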

[0170] Next, the server uses an emotion analysis engine (e.g., a common emotion analysis tool) to evaluate the customer's emotions. This allows the server to identify the customer's emotional state, specifically emotions such as joy, sadness, or dissatisfaction.
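As a non-limiting illustration, a very simple emotion evaluation over text might be sketched as follows. The sketch assumes a word-list (lexicon) approach, whereas an actual emotion analysis engine would typically use a trained model and could also consider voice tone; the word lists, the threshold, and the function names are hypothetical.

```python
import re

# Hypothetical word lists; a production engine would use a trained model.
POSITIVE = {"great", "happy", "thanks", "satisfied", "love"}
NEGATIVE = {"broken", "angry", "disappointed", "worried", "malfunction"}


def emotion_score(text):
    """Return a score in [-1, 1]: (positive hits - negative hits) / total hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def emotion_label(score, threshold=0.25):
    """Map a numeric score to a coarse positive / negative / neutral label."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


sample = "The product is broken and I am disappointed."
score = emotion_score(sample)
print(score, emotion_label(score))
```

Keeping the numeric score separate from the label allows the score to be stored or used in a database search, while the label can drive the choice of response strategy.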

[0171] Based on the analyzed keywords and sentiment information, the server searches the database to retrieve past cases and related information. Furthermore, it uses external information such as weather and date / time, obtained in real time, to generate more appropriate advice.

[0172] The generated information is presented to the user visually through the terminal's user interface. At this time, the display device uses effective visual materials to provide information in a way that is easily understandable to the customer.

[0173] For example, if a customer expresses dissatisfaction with a product malfunction, the system identifies their feelings and suggests solutions based on past repair cases. This enables a quick and accurate response.

[0174] An example of a prompt message is, "Please enter the customer's voice. The system will analyze their emotions and suggest the best course of action." This encourages users to utilize the system. This technology can significantly improve the quality of customer service and increase customer satisfaction.

[0175] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0176] Step 1:

[0177] Users input their inquiries via voice or text using a terminal. In the case of voice input, the microphone captures the audio and sends it to the system. The voice and text become input data for the system. The voice data is converted into a digital signal upon receipt, ready for processing.

[0178] Step 2:

[0179] The server converts the input audio data into text using speech recognition software. The input is audio data, and the output is text data. In this process, the prosody and pronunciation of the speech are analyzed as features, and highly accurate text conversion is performed based on that information.

[0180] Step 3:

[0181] The server analyzes the transcribed data using a natural language processing library that serves as the natural language processing module. The input is text data, and the output is extracted keywords and contextual information. The process involves segmenting sentences, tagging parts of speech, and performing dependency analysis to identify keywords.

[0182] Step 4:

[0183] The server uses an emotion analysis engine to evaluate the user's emotions from text data. The input is analyzed text information, and the output is an emotion score. In this step, emotional features are calculated from the text representation, and emotions such as positive, negative, and neutral are quantified.

[0184] Step 5:

[0185] The server searches the database using keywords and sentiment scores. The input is the identified keywords and sentiment scores, and the output is similar cases and related information. This process involves issuing SQL queries to retrieve past interaction history and related solutions.

[0186] Step 6:

[0187] The server retrieves real-time data from external sources and integrates it into the search results. This includes utilizing APIs to obtain weather, time, and geographical information. The input is the initial search results, and the output is enhanced suggestion information.

[0188] Step 7:

[0189] The terminal visually presents the generated suggestion information to the user through a user interface. The input is integrated suggestion information, and the output is the final display to the user. The terminal provides information visually using infographics and text.

[0190] Step 8:

[0191] The server records the sentiment data and search information obtained during processing and stores it in a database. The input is all the analysis information, and the output is the data accumulated for future use. This data is used to improve the quality of responses in subsequent interactions.
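As a non-limiting illustration, the SQL-based search of Step 5 above might be sketched with the standard sqlite3 module. The table past_cases, its columns, the sample rows, and the rule of preferring cases whose stored sentiment score is closest to the current one are all assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per past case, keyed by a single keyword.
conn.execute("CREATE TABLE past_cases (keyword TEXT, sentiment REAL, solution TEXT)")
conn.executemany(
    "INSERT INTO past_cases VALUES (?, ?, ?)",
    [
        ("malfunction", -0.9, "Apologize first, then offer a repair booking."),
        ("malfunction", 0.1, "Walk through the troubleshooting guide."),
        ("warranty", 0.0, "Explain the warranty terms."),
    ],
)


def find_similar_cases(conn, keywords, sentiment, limit=3):
    """Return solutions for matching keywords, closest sentiment score first."""
    if not keywords:
        return []
    # Only '?' placeholders are interpolated; all values are bound as parameters.
    placeholders = ", ".join("?" for _ in keywords)
    sql = (
        "SELECT solution FROM past_cases "
        f"WHERE keyword IN ({placeholders}) "
        "ORDER BY ABS(sentiment - ?) LIMIT ?"
    )
    rows = conn.execute(sql, (*keywords, sentiment, limit))
    return [row[0] for row in rows]


results = find_similar_cases(conn, ["malfunction"], sentiment=-0.8)
print(results)
```

Binding the keywords and the score as query parameters, rather than formatting them into the SQL string, avoids SQL injection from customer-supplied text.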

[0192] (Application Example 2)

[0193] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0194] In today's living environment, there is a demand for sophisticated services tailored to individual emotions and physical conditions. In particular, real-time emotional recognition and feedback are essential to improving the quality of communication within the home and ensuring smooth daily living support. However, conventional home support systems currently struggle to accurately interpret users' emotional states, making it difficult to provide individualized support based on these perceptions.

[0195] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0196] In this invention, the server includes means for analyzing input natural language data and extracting keywords, means for evaluating the user's emotional state from speech or text using an emotion analysis engine, and means for playing appropriate sound or visual content based on the user's emotional state. This enables customized responses and service provision tailored to the individual's emotional state.

[0197] "Natural language data" refers to information expressed in the language format that humans use on a daily basis.

[0198] A "data bank" is an information recording device used to store past information and records of similar cases.

[0199] "External environmental information" refers to information about conditions such as weather and time that exist outside the system.

[0200] An "information display device" is a device used to visually present information to a user.

[0201] "Relevant departments" refer to organizations that possess the necessary knowledge and functions to operate the system and address unresolved questions.

[0202] An "emotion analysis engine" is an analytical tool used to evaluate emotional states from speech and text.

[0203] "User's emotional state" refers to the user's current psychological or emotional condition.

[0204] "Audio and visual content" refers to forms of information or entertainment that utilize sound and images.

[0205] This invention applies a customer service support system utilizing voice and natural language processing to a consumer robot used in the home. It primarily uses an emotion recognition engine and real-time database referencing capabilities to provide audio and visual content based on an individual's emotional state. A specific implementation example of the system is shown below.

[0206] The server converts the input speech into text data using high-precision recognition technology. The hardware used here is a Raspberry Pi, and the software utilizes the Google Cloud Speech-to-Text API. The converted text data is then analyzed using natural language processing tools such as NLTK and spaCy to extract keywords.

[0207] Next, the server uses IBM Watson® Tone Analyzer, an emotion analysis engine, to evaluate the user's emotions from their voice tone and text. This makes it possible to determine the user's emotional state.

[0208] Based on the emotional state, the device selects and plays appropriate audio and visual content. Available media include music, videos, and animation.

[0209] When a user makes an inquiry or request by voice, they can say something like, "I'm feeling tired today and would like to relax a bit," and the system will select and play relaxing music.

[0210] Also, an example of a prompt statement is:

[0211] "Please advise on how to soothe users' minds when they want to relax."

[0212] "I would appreciate your advice on the best way to deal with a family member who is experiencing stress."

[0213] This allows for responses tailored to the user's emotional state, improving satisfaction and convenience within the home.

[0214] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0215] Step 1:

[0216] The user speaks to the home robot using their voice. The input here is the user's voice data. The server captures the voice using a microphone on a Raspberry Pi and converts that data to text via the Google Cloud Speech-to-Text API. This conversion process yields the text data.

[0217] Step 2:

[0218] The server analyzes the obtained text data using natural language processing tools such as NLTK and spaCy. The input is the text data converted in step 1. This process extracts keywords and context, and a list of relevant keywords is output.

[0219] Step 3:

[0220] The server performs sentiment analysis on text data and voice tone using IBM Watson Tone Analyzer. The inputs are the voice tone acquired in step 1 and the text data from step 2. Based on the analysis, the user's emotional state is evaluated, and emotion tags such as anger, joy, and sadness are output.

[0221] Step 4:

[0222] Based on the emotional state data, the server references a historical database to select the most appropriate audio or visual content. In this step, the emotion tags and the keyword list are the inputs. The most suitable content is selected from the past history, and information identifying that content is output.

[0223] Step 5:

[0224] The device plays the selected audio or visual content. The content information obtained in step 4 is used as input. Using the speaker or display, music, animation, or video that matches the user's emotional state is presented to the user. This completes the provision of a service that is attentive to the user's emotions.
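As a non-limiting illustration, the content selection of Step 4 above might be sketched as follows. The sketch assumes that "most suitable" is approximated by how often a piece of content was previously played for the same emotion tag, which is only one possible criterion; the catalogue, the file names, and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical catalogue mapping emotion tags to candidate content.
CATALOGUE = {
    "sadness": ["calm_piano.mp3", "nature_video.mp4"],
    "anger": ["slow_breathing_animation.gif", "calm_piano.mp3"],
    "joy": ["upbeat_pop.mp3"],
}


def select_content(emotion_tag, history, default="ambient.mp3"):
    """Pick the candidate for this emotion that was played most often before.

    history is a list of (emotion_tag, content) pairs from past interactions.
    """
    candidates = CATALOGUE.get(emotion_tag)
    if not candidates:
        return default  # unknown emotion tag: fall back to neutral content
    plays = Counter(content for tag, content in history if tag == emotion_tag)
    # max() returns the first candidate on ties, so catalogue order breaks ties.
    return max(candidates, key=lambda c: plays[c])


history = [
    ("sadness", "nature_video.mp4"),
    ("sadness", "nature_video.mp4"),
    ("sadness", "calm_piano.mp3"),
]
choice = select_content("sadness", history)
print(choice)
```

A richer criterion, such as explicit user feedback after playback, could replace the play count without changing the interface of select_content.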

[0225] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0226] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0227] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0228] [Second Embodiment]

[0229] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0230] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0231] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0232] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0233] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0234] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0235] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0236] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0237] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0238] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0239] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0240] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0241] This invention is an AI system that uses speech processing and text analysis to efficiently support customer service in retail stores and service industries. The basic configuration consists of a speech input module, a natural language processing module, a database access module, an external data acquisition module, and a user interface module.

[0242] To begin operating the system, users input customer questions into the system as voice or text. In the case of voice input, the server uses speech recognition technology to convert it into text data. This speech recognition technology is trained using a large amount of voice data, achieving highly accurate text conversion.

[0243] The server passes the converted text data to a natural language processing module for semantic analysis. Specifically, it extracts relevant keywords, understands the context, and infers the customer's intent. In this process, it compares the data with similar customer inquiries and FAQ information previously recorded in the system.

[0244] Based on the analyzed information, the server searches a database for relevant past cases and product information. This database includes past customer service history, product manuals, and campaign details. The server also obtains real-time data such as date, time, weather, and location information through an external data acquisition module, and uses this data when generating the final advice.

[0245] The terminal aggregates this information and presents the user with the most appropriate response via a user interface module. This display is visually clear and well-organized, and includes relevant images and graphs as needed. For example, when explaining product information during a campaign, product photos and sales performance graphs are displayed to help the user respond effectively to customers.

[0246] If the system cannot immediately provide an appropriate answer to a customer's question, the server automatically forwards the unresolved inquiry to the relevant department and incorporates the received feedback into the database. This continuously improves the accuracy and usefulness of the system.

[0247] In this way, it becomes possible to dramatically improve the quality and speed of customer service, reduce the burden on employees, and strengthen the overall customer service capabilities of the company.

[0248] The following describes the processing flow.

[0249] Step 1:

[0250] The user enters customer questions in either voice or text format. In the case of voice input, the voice data is sent to the device via the microphone.

[0251] Step 2:

[0252] The server receives audio data and converts it to text through a speech recognition module. During this process, it analyzes the characteristics of the audio and removes unnecessary noise to ensure accurate text conversion.

[0253] Step 3:

[0254] The server sends the converted text data to a natural language processing module for grammatical analysis. It extracts keywords and performs semantic analysis to understand the content of the text.

[0255] Step 4:

[0256] Based on the analysis, the server searches for past cases and related information through the database access module. This includes FAQs, past customer support history, and product information.

[0257] Step 5:

[0258] The server uses an external data acquisition module to obtain real-time information such as date, time, weather, and location. This allows it to gather complementary information to make the advice it provides more realistic and accurate.

[0259] Step 6:

[0260] The server integrates historical data with external information to generate optimal advice or answers. In doing so, it uses AI algorithms to evaluate multiple options and derive the best solution.

[0261] Step 7:

[0262] The terminal displays information provided by the server to the user via a user interface module. The displayed information includes text, images, graphs, etc., and is organized in a visually clear and easy-to-understand manner.

[0263] Step 8:

[0264] If a suitable answer cannot be found, the server automatically forwards the unresolved question to the relevant department and awaits feedback. This feedback is later registered in the system's database and used to help handle future inquiries.
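As a non-limiting illustration, the forwarding of unresolved questions and the registration of feedback in Step 8 above might be sketched as follows. The class KnowledgeBase, its in-memory dictionary, and the exact-match lookup are assumptions made to keep the example short; an actual system would use the database and search described above, and would notify the relevant department rather than only queueing the question.

```python
class KnowledgeBase:
    """Minimal in-memory stand-in for the answer database and escalation queue."""

    def __init__(self):
        self.answers = {}   # normalized question -> answer
        self.pending = []   # unresolved questions awaiting feedback

    @staticmethod
    def _normalize(question):
        return question.strip().lower()

    def ask(self, question):
        """Return a stored answer, or queue the question and return None."""
        key = self._normalize(question)
        if key in self.answers:
            return self.answers[key]
        if key not in self.pending:
            self.pending.append(key)  # forward to the relevant department
        return None

    def register_feedback(self, question, answer):
        """Store the department's answer so that later inquiries are resolved."""
        key = self._normalize(question)
        self.answers[key] = answer
        if key in self.pending:
            self.pending.remove(key)


kb = KnowledgeBase()
first = kb.ask("Do you sell USB-C chargers?")
kb.register_feedback("Do you sell USB-C chargers?", "Yes, in the accessories section.")
second = kb.ask("do you sell usb-c chargers?")
print(first, second, kb.pending)
```

Once feedback has been registered, the same question is answered directly on the next inquiry, which corresponds to the continuous improvement described above.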

[0265] (Example 1)

[0266] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0267] In modern society, improving the speed and accuracy of customer service in retail stores and service industries is a crucial challenge. However, conventional systems have insufficient processing capabilities for voice and text data, making it difficult to acquire and incorporate external data in real time. Furthermore, they struggle to understand complex customer intentions and generate optimal responses. There is a need to solve these problems and improve customer satisfaction.

[0268] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0269] In this invention, the server includes means for analyzing input natural language information to extract key words, means for searching past information and similar cases from the information database, and means for acquiring current external environmental information and reflecting it in the analysis results. This makes it possible to provide quick and accurate answers to customer questions.

[0270] "Natural language information" refers to information written in the language that humans use in everyday life, and is the data format before it is converted into a format that can be processed by machines.

[0271] "Key words" are important keywords extracted from natural language information and are elements used for understanding and searching the content of text.

[0272] "Information aggregation" refers to a database or knowledge base in which past cases and related information are stored; it is a searchable collection of information.

[0273] "External environmental information" refers to data such as date and time, weather, and regional information acquired in real time from outside the system, and is a factor that may affect the analysis results.

[0274] "Visual information" refers to data presented to users in a visually recognizable format, such as images or graphs.

[0275] "Statistical representation" refers to information provided in the form of graphs, charts, and other visual aids used to present data in an easily understandable way.

[0276] "Recognizing intent" means analyzing the context and content of the input information to understand the speaker's purpose and requests.

[0277] This system is an AI system that uses speech processing and text analysis to efficiently support customer service in retail stores and service industries. The basic configuration of the system consists of a speech input module, a natural language processing module, a database access module, an external data acquisition module, and a user interface module.

[0278] Voice input module:

[0279] The user inputs questions from customers into the system as voice or text. In the case of voice input, the server uses voice recognition technology to convert it into text data. An advanced technology such as a "voice recognition service" is used for this voice recognition. For example, when a customer asks aloud, "What is the warranty period of this product?", the server converts the voice into text with high precision.

[0280] Natural language processing module:

[0281] The server passes the converted text data to the natural language processing module for semantic analysis of the text. Related keywords are extracted using the "natural language processing library", and the intention of the customer is inferred by understanding the context. Thus, it can be recognized that the customer is asking about the "warranty period".

[0282] Database access module:

[0283] Based on the analyzed information, the server searches the database for related product information and past cases. Here, the "database management system" is used, and it is possible to obtain relevant information quickly and accurately.

[0284] External data acquisition module:

[0285] The external data acquisition module operates to obtain real-time external environmental information and reflect it in the analysis results. For example, weather information and regional information are obtained from the "external data API" and used for generating the final advice.

[0286] User interface module:

[0287] The terminal aggregates this information and visually presents it to the user in an easy-to-understand manner via the user interface. For example, it displays product-related images and graphs of sales data to support the user in effectively explaining to the customer.

[0288] Example of a prompt:

[0289] "How can we improve the speed of our customer response?"

[0290] Through these modules, the system can respond to customers quickly and accurately, improving the company's customer service capabilities.

[0291] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0292] Step 1:

[0293] Users input customer questions into the system in either voice or text format. In the case of voice input, the server uses a speech recognition module to capture the voice data and convert it into text data. Specifically, the server analyzes the voice signal and uses a language model to convert it into a string. The input is sound waves, and the output is the corresponding text data.

[0294] Step 2:

[0295] The server passes the converted text data to a natural language processing module. Here, the server performs syntactic analysis of the text and extracts keywords. Specifically, it identifies words and phrases in the text and analyzes their meanings. This process is performed using a morphological analyzer, with text data as input and analysis information including keywords and intent as output.
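
The input and output of Step 2 can be illustrated with a minimal sketch. It assumes plain regular-expression tokenization with a stopword list in place of the morphological analyzer named above; the stopword list, the intent table, and the function name analyze are hypothetical and do not appear in the source.

```python
import re

# Hypothetical stopword list and intent cues; the source names neither.
STOPWORDS = {"what", "is", "the", "of", "this", "a", "an", "for", "are", "do", "you"}
INTENT_CUES = {
    "warranty_period": ("warranty",),
    "promotion": ("promotion", "discount", "sale"),
}

def analyze(text: str) -> dict:
    """Return keywords and a coarse intent label for one customer question."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    keywords = [t for t in tokens if t not in STOPWORDS]
    intent = "unknown"
    for label, cues in INTENT_CUES.items():
        if any(cue in keywords for cue in cues):
            intent = label
            break
    return {"keywords": keywords, "intent": intent}

result = analyze("What is the warranty period of this product?")
```

A production system would replace the tokenizer and the cue table with a real analyzer; the sketch only shows the shape of the analysis information passed to Step 3.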

[0296] Step 3:

[0297] The server performs database searches based on the analysis information. It uses a database access module to search for similar past cases and product information. Specifically, it uses SQL queries to perform cross-database searches and retrieve the necessary information. The input is the analysis information, and the output is a dataset containing related information.
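
As one possible illustration of the search in Step 3, the sketch below issues a parameterized SQL query against an in-memory SQLite database. The table past_cases, its columns, and the sample rows are assumptions made for illustration, and the sketch queries a single database rather than performing the cross-database search described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE past_cases ("
    "id INTEGER PRIMARY KEY, keyword TEXT, question TEXT, answer TEXT)"
)
# Sample rows for illustration only.
conn.executemany(
    "INSERT INTO past_cases (keyword, question, answer) VALUES (?, ?, ?)",
    [
        ("warranty", "How long is the warranty?", "One year from purchase."),
        ("return", "Can I return this item?", "Within 30 days with a receipt."),
    ],
)

def find_cases(conn, keywords):
    """Return (question, answer) rows whose keyword matches any given keyword."""
    if not keywords:
        return []
    # One placeholder per keyword keeps the query parameterized.
    placeholders = ",".join("?" for _ in keywords)
    sql = (
        "SELECT question, answer FROM past_cases "
        f"WHERE keyword IN ({placeholders}) ORDER BY id"
    )
    return conn.execute(sql, list(keywords)).fetchall()
```

Binding the keywords as parameters, rather than formatting them into the SQL string, avoids injection when the keywords come from customer speech.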

[0298] Step 4:

[0299] The server acquires real-time external environmental information via an external data acquisition module and incorporates it into the analysis results. Specifically, it uses an API to obtain environment variables such as weather and supplements the analysis information. In this process, the input is the analysis information, and the output is the enhanced analysis information.
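
The supplementing of analysis information in Step 4 might look like the following sketch. The external API is not specified in the source, so the fetcher is passed in as a function; sample_environment and its values are placeholders standing in for a real API call.

```python
from typing import Callable, Mapping

def enrich(analysis: Mapping, fetch_environment: Callable[[], Mapping]) -> dict:
    """Return a copy of the analysis information with environment data attached."""
    enriched = dict(analysis)  # copy so the caller's data is not modified
    enriched["environment"] = dict(fetch_environment())
    return enriched

def sample_environment() -> dict:
    # Placeholder for a call to an external weather or regional data API.
    return {"weather": "rain", "region": "Tokyo"}

base = {"intent": "warranty_period", "keywords": ["warranty"]}
enhanced = enrich(base, sample_environment)
```

Injecting the fetcher keeps the enrichment step testable without network access.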

[0300] Step 5:

[0301] The terminal presents information to the user using a user interface module, based on enhanced analytical information from the server. Specifically, it uses a GUI library to visually organize data and display it on the terminal screen. The input is enhanced analytical information, and the output is the visual information presented to the user.

[0302] Step 6:

[0303] If immediate resolution is difficult, the server reports the unresolved inquiry to the relevant department and implements a specific process to gather subsequent feedback. Specifically, it automates the distribution of inquiries via email and ticketing systems, and facilitates the addition of feedback data to the database. In this process, the input is the unresolved inquiry, and the output is an updated database reflecting the feedback.
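
The reporting and feedback loop of Step 6 can be sketched as follows. The classes Ticket and EscalationLog are hypothetical, and an in-memory dictionary stands in for the email system, the ticketing system, and the database mentioned above.

```python
from dataclasses import dataclass, field
from itertools import count

@dataclass
class Ticket:
    ticket_id: int
    inquiry: str
    department: str
    feedback: list = field(default_factory=list)
    resolved: bool = False

class EscalationLog:
    """Tracks unresolved inquiries and folds feedback into a knowledge store."""

    def __init__(self):
        self._ids = count(1)
        self.tickets = {}
        self.knowledge_base = {}  # inquiry text -> answer from the department

    def report(self, inquiry: str, department: str) -> int:
        ticket = Ticket(next(self._ids), inquiry, department)
        self.tickets[ticket.ticket_id] = ticket
        return ticket.ticket_id

    def add_feedback(self, ticket_id: int, answer: str) -> None:
        ticket = self.tickets[ticket_id]
        ticket.feedback.append(answer)
        ticket.resolved = True
        # The answer becomes searchable for later inquiries.
        self.knowledge_base[ticket.inquiry] = answer

log = EscalationLog()
tid = log.report("Is the extended warranty transferable?", "after-sales")
log.add_feedback(tid, "Yes, with proof of purchase.")
```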

[0304] (Application Example 1)

[0305] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0306] Retail stores and service industries are facing a need to improve the efficiency and quality of customer service. Currently, it is difficult to respond quickly and appropriately to a wide range of customer inquiries, increasing the burden on store employees and potentially leading to decreased customer satisfaction. In particular, there is a demand for the ability to instantly analyze large amounts of information and derive the optimal answer.

[0307] The specific processing by the specific processing unit 290 of the data processing apparatus 12 in Application Example 1 is realized by the following means.

[0308] In this invention, the server includes means for analyzing the input natural language data to extract keywords, means for searching the database for past information and similar cases, means for obtaining and reflecting real-time external environment information in the analysis result, and means for obtaining the position information within the store and providing inventory information and discount information. Thereby, it becomes possible to promptly and appropriately present product information and service information in response to questions from customers.

[0309] "Natural language data" refers to character strings and voice information expressed in the language form that humans use daily.

[0310] "Keyword" is an important word or phrase extracted from natural language data and is used for specifying and classifying information.

[0311] "Database" is a system for systematically storing and managing data such as past information and similar cases.

[0312] "External environment information" refers to data related to elements of the external world that are acquired in real time, such as date and time, weather, and regional information.

[0313] "Analysis result" refers to information and conclusions obtained through the analysis of natural language data.

[0314] "Terminal" is a device for inputting and outputting data, including smartphones, computers, and the like.

[0315] "Position information" is geographical data related to a specific location and is acquired using GPS or the like.

[0316] "Inventory information" refers to the holding status of products in a store and is data related to the quantity and type of products that can be sold.

[0317] "Discount information" refers to information about discounts or special prices on specific products or services.

[0318] The system for implementing this invention mainly consists of a server and a terminal. The server analyzes the input natural language data and extracts keywords to understand the user's intent. If voice input is provided, speech recognition technology is used to convert the voice data into text data. In this case, the server uses a speech recognition API (e.g., Google Speech Recognition API) that has been trained on a large amount of voice data.

[0319] Based on the analysis results, the server searches a database to find past information and similar cases. The server also acquires real-time external environmental information and incorporates it into the analysis results. This includes current data such as date, time, weather, and location, which can be used to provide information on in-store inventory and discounts.
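
How position information could be tied to inventory and discount information is sketched below. The zone identifiers, the ShelfItem type, and the sample stock values are all hypothetical; the source does not describe the data layout.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ShelfItem:
    product: str
    stock: int
    discount_pct: int  # 0 means no discount

# Hypothetical store layout: zone identifier -> items shelved there.
STORE_ZONES = {
    "A1": [ShelfItem("umbrella", 12, 20), ShelfItem("raincoat", 0, 10)],
    "B2": [ShelfItem("sunscreen", 5, 0)],
}

def offers_near(zone: str) -> list:
    """Describe in-stock items, with any discount, for the given zone."""
    lines = []
    for item in STORE_ZONES.get(zone, []):
        if item.stock <= 0:
            continue  # skip items that cannot be sold right now
        line = f"{item.product}: {item.stock} in stock"
        if item.discount_pct:
            line += f", {item.discount_pct}% off"
        lines.append(line)
    return lines
```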

[0320] The terminal has the functionality to visually display the best answers and advice obtained from the server to the user. In particular, the information is presented clearly using relevant images and graphs. This allows the user to respond to customers quickly and accurately.

[0321] As a concrete example, imagine a store employee using a smartphone to operate an application and entering a prompt such as, "What promotions are currently running for this product?" This allows the server to immediately analyze the request and provide information, effectively supporting customer service.

[0322] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0323] Step 1:

[0324] Users use their smartphones to input customer questions via voice or text. In the case of voice input, the smartphone's microphone is used to collect the customer's question. The input data is then sent to a server.

[0325] Step 2:

[0326] The server uses a speech recognition API to convert audio data into text data. The input is an audio file, and the output is a customer question in text format. During this process, the audio data is converted to text with high accuracy.

[0327] Step 3:

[0328] The server analyzes text data using a natural language processing module and extracts important keywords. Text data is the input, and the extracted keywords are the output. Here, a generative AI model performs contextual analysis and keyword recognition.

[0329] Step 4:

[0330] The server searches the database to find similar queries and related product information from the past. Keywords are used as input, and relevant information is obtained as output. As a result of the search, the relevant information is retrieved quickly.

[0331] Step 5:

[0332] The server uses an external data acquisition module to obtain real-time external environmental information (e.g., weather and inventory status). Inputs are the current date, time, and location, while output is external data. By using this information in the analysis results, more accurate answers and suggestions are generated.

[0333] Step 6:

[0334] The server generates the optimal response or advice based on the acquired data. At this point, the specific information requested by the user is comprehensively summarized. The input consists of past case information and external data, and the output is a specific response.
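
One way to combine the inputs of Step 6 is to assemble them into a single prompt for a generative model, as in the sketch below. The function build_prompt and the prompt wording are assumptions for illustration; the source does not specify how the response is composed.

```python
def build_prompt(question: str, past_cases: list, external: dict) -> str:
    """Assemble the question, past cases, and external data into one prompt."""
    lines = [f"Customer question: {question}", "Past cases:"]
    if past_cases:
        lines.extend(f"- {case}" for case in past_cases)
    else:
        lines.append("- (none found)")
    lines.append("Current conditions:")
    # Sort keys so the prompt is stable across runs.
    lines.extend(f"- {key}: {value}" for key, value in sorted(external.items()))
    lines.append("Answer concisely for a store employee to relay to the customer.")
    return "\n".join(lines)

prompt = build_prompt(
    "What promotions are currently running for this product?",
    ["A 10% discount ran during the spring campaign."],
    {"weather": "rain", "stock": 12},  # sample values only
)
```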

[0335] Step 7:

[0336] The terminal displays information received from the server via a user interface. Related images and graphs are displayed along with the optimal answer, providing the user with visually easy-to-understand information.

[0337] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0338] This invention is a customer service support system that utilizes speech and natural language processing, and further incorporates an emotion engine that recognizes user emotions. This system not only converts input speech into text and extracts keywords using natural language processing technology, but also has the function of analyzing user emotions in real time.

[0339] In system operation, users input customer inquiries and requests via voice or text. If voice input is selected, the server uses high-precision speech recognition technology to convert it into text data. The converted text is analyzed through a natural language processing module to extract relevant keywords and context.

[0340] At this stage, the server utilizes an emotion engine to evaluate the user's emotional state from voice intonation and text expression. For example, it can determine whether the customer is anxious or satisfied based on their tone of voice and word choice. The information obtained through emotion analysis is used as a crucial element in determining an appropriate response strategy.

[0341] Next, the server searches the database based on the user's input and emotional state, referencing past cases and related information. For example, it extracts the history of responses to similar questions in the past and effective countermeasures based on emotional changes. Furthermore, it acquires external information in real time and generates advice that includes the date, time, weather, and local conditions.

[0342] The generated information is presented visually to the user via the terminal's user interface module. For example, if a customer appears anxious, explanations and information that provide reassurance are highlighted. This improves the quality of customer service and enables the provision of services that are more attentive to the customer's emotions.

[0343] Furthermore, emotional data obtained during interactions is recorded by the server and stored in a database. This allows for the analysis and utilization of past emotional history in future customer interactions. In this way, the overall accuracy and effectiveness of the system improve over the long term.

[0344] The following describes the processing flow.

[0345] Step 1:

[0346] The user inputs customer questions into the system via voice or text. In the case of voice input, the terminal's microphone captures the voice data, which is then sent to the server.

[0347] Step 2:

[0348] The server processes the received audio data using a speech recognition module and converts it into text data. During this process, it analyzes the audio pattern and reduces background noise.

[0349] Step 3:

[0350] The server analyzes the transcribed data using a natural language processing module to extract keywords and contextual structure. This analysis allows for an accurate understanding of the customer's intent.

[0351] Step 4:

[0352] The server uses an emotion engine to identify the user's emotions from their tone of voice and selected words. In particular, it identifies information where the emotional state is a crucial factor in customer service.

[0353] Step 5:

[0354] The server searches the database based on the analyzed text and sentiment information to retrieve relevant past cases, FAQs, and product information. If the sentiment is pronounced, it also refers to past responses applied to similar emotional situations.

[0355] Step 6:

[0356] The server acquires real-time environmental information such as date, time, weather, and location through an external data acquisition module. This information is also taken into consideration to form optimal advice.

[0357] Step 7:

[0358] The terminal visually presents advice and information received from the server to the user through a user interface module. The displayed content consists of text, relevant images, graphs as needed, and other elements that take emotions into consideration.

[0359] Step 8:

[0360] The server records the emotional data identified during the interaction, together with its changes over time, and stores it in a database to inform future responses. This data is used in later analysis and contributes to improving the overall response accuracy of the system.
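
Recording emotional data and its changes, as in Step 8, could be sketched as follows. The class EmotionHistory, the session identifier, and the numeric score range are hypothetical, and an in-memory dictionary stands in for the database.

```python
from collections import defaultdict

class EmotionHistory:
    """Stores emotion scores per interaction and reports how they changed."""

    def __init__(self):
        self._scores = defaultdict(list)

    def record(self, session_id: str, score: float) -> None:
        self._scores[session_id].append(score)

    def change(self, session_id: str) -> float:
        """Return last score minus first score; 0.0 with fewer than two scores."""
        scores = self._scores.get(session_id, [])
        if len(scores) < 2:
            return 0.0
        return scores[-1] - scores[0]

history = EmotionHistory()
history.record("session-1", -0.5)  # customer starts out dissatisfied
history.record("session-1", 0.25)  # mood improves after the response
```

A positive change suggests the response helped, which is the kind of signal later searches could use to rank past responses.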

[0361] (Example 2)

[0362] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0363] Traditional customer service systems could extract keywords from entered information and incorporate external data, but they had limitations in their ability to specifically analyze customer emotions and optimize responses. Furthermore, it was difficult to effectively utilize interaction history to improve the quality of subsequent interactions.

[0364] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0365] In this invention, the server includes means for analyzing input voice or natural language data to extract keywords, means for analyzing the user's emotions, and means for searching a database for past cases and related information. This makes it possible to present countermeasures that take customer emotions into consideration and to improve the quality of service by utilizing accumulated emotional data.

[0366] "Voice or natural language data" refers to information in the form of conversations or texts entered by users, including customer requests and questions.

[0367] "Means for analyzing and extracting keywords" refers to techniques that analyze input data and identify important words and phrases in it.

[0368] "Means for analyzing user emotions" refers to technologies that evaluate voice tone and wording in order to determine the emotions contained in conversations and written texts.

[0369] "Means of searching past cases and related information from a database" refers to a technology that searches for and retrieves cases and information that are relevant to the current situation from a pre-existing database of cases and information.

[0370] "Means of acquiring external information in real time and reflecting it in analysis results" refers to technologies that acquire the latest data from the internet and other information sources, and incorporate and utilize it in the results of the analysis.

[0371] "Means for generating optimal advice or suggestions and outputting them to a display device" refers to a technology that creates the most effective advice or solutions based on analyzed information and provides them to the user through a display device.

[0372] "A means of accumulating emotional data acquired during a conversation and reflecting it in future responses" refers to a technology that saves emotionally-based information obtained from interactions with customers and utilizes it for future interactions.

[0373] This invention is a system that utilizes speech and natural language processing technologies to support customer service. The system combines speech-to-text conversion, natural language analysis, and sentiment analysis to provide responses tailored to the customer's emotions. Specific embodiments are described below.

[0374] Users enter inquiries and requests via voice or text. If voice input is used, the information is transmitted to the system via the device's microphone. Voice input is available on readily available devices and is designed with ease of use in mind.

[0375] The server converts the audio into text data using high-precision speech recognition software (e.g., a common speech recognition API). During this process, it captures the subtle nuances of the speech and renders them as text.

[0376] The obtained text data is analyzed using a natural language processing library (e.g., spaCy or other common NLU libraries). Sentence segmentation, keyword extraction, and contextual analysis are performed, thereby extracting important information.

[0377] Next, the server uses an emotion analysis engine (e.g., a common emotion analysis tool) to evaluate the customer's emotions. This allows the server to identify the customer's emotional state, specifically emotions such as joy, sadness, or dissatisfaction.

[0378] Based on the analyzed keywords and sentiment information, the server searches the database to retrieve past cases and related information. Furthermore, it uses external information such as weather and date/time, obtained in real time, to generate more appropriate advice.

[0379] The generated information is presented to the user visually through the terminal's user interface. At this time, the display device uses effective visual materials to provide information in a way that is easily understandable to the customer.

[0380] For example, if a customer expresses dissatisfaction with a product malfunction, the system identifies their feelings and suggests solutions based on past repair cases. This enables a quick and accurate response.

[0381] An example of a prompt message is, "Please enter the customer's voice. The system will analyze their emotions and suggest the best course of action." This encourages users to utilize the system. This technology can significantly improve the quality of customer service and increase customer satisfaction.

[0382] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0383] Step 1:

[0384] Users input their inquiries via voice or text using a terminal. In the case of voice input, the microphone captures the audio and sends it to the system. The voice and text become input data for the system. The voice data is converted into a digital signal upon receipt, ready for processing.

[0385] Step 2:

[0386] The server converts the input audio data into text using speech recognition software. The input is audio data, and the output is text data. In this process, the prosody and pronunciation of the speech are analyzed as features, and highly accurate text conversion is performed based on that information.

[0387] Step 3:

[0388] The server analyzes the transcribed data using a natural language processing library that serves as the natural language processing module. The input is text data, and the output is extracted keywords and contextual information. The process involves segmenting sentences, tagging parts of speech, and performing dependency analysis to identify keywords.

[0389] Step 4:

[0390] The server uses an emotion analysis engine to evaluate the user's emotions from text data. The input is analyzed text information, and the output is an emotion score. In this step, emotional features are calculated from the text representation, and emotions such as positive, negative, and neutral are quantified.
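
The quantification in Step 4 can be illustrated with a lexicon-based sketch. This is a deliberately simple stand-in for the emotion analysis engine, which the source does not specify; the word lists and the scoring formula are assumptions.

```python
import re

# Hypothetical word lists; a real engine would use a trained model.
POSITIVE = {"great", "thanks", "happy", "love"}
NEGATIVE = {"broken", "angry", "disappointed", "refund", "problem"}

def emotion_score(text: str):
    """Return (score, label), where score lies in [-1.0, 1.0]."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    score = 0.0 if total == 0 else (pos - neg) / total
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return score, label
```

The score is the balance of positive and negative cue words, so text with no cue words is treated as neutral.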

[0391] Step 5:

[0392] The server searches the database using keywords and sentiment scores. The input is the identified keywords and sentiment scores, and the output is similar cases and related information. This process involves issuing SQL queries to retrieve past interaction history and related solutions.
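
A query that uses both a keyword and a sentiment score, as in Step 5, might be written as in the sketch below. The table interactions, its columns, and the sample rows are assumptions; ordering by the distance between stored and current sentiment is one possible interpretation of searching with a sentiment score.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE interactions ("
    "id INTEGER PRIMARY KEY, keyword TEXT, sentiment REAL, solution TEXT)"
)
# Sample rows for illustration only.
conn.executemany(
    "INSERT INTO interactions (keyword, sentiment, solution) VALUES (?, ?, ?)",
    [
        ("malfunction", -0.9, "Offer an immediate replacement and apologize."),
        ("malfunction", -0.2, "Walk through the troubleshooting guide."),
        ("warranty", 0.1, "Explain the one-year warranty."),
    ],
)

def similar_cases(conn, keyword: str, sentiment: float, limit: int = 2):
    """Return solutions for the keyword, closest stored sentiment first."""
    rows = conn.execute(
        "SELECT solution FROM interactions WHERE keyword = ? "
        "ORDER BY ABS(sentiment - ?) ASC, id ASC LIMIT ?",
        (keyword, sentiment, limit),
    ).fetchall()
    return [row[0] for row in rows]
```

With this ordering, a strongly dissatisfied customer is matched first to cases handled under similar dissatisfaction.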

[0393] Step 6:

[0394] The server retrieves real-time data from external sources and integrates it into the search results. This includes utilizing APIs to obtain weather, time, and geographical information. The input is the initial search results, and the output is enhanced suggestion information.

[0395] Step 7:

[0396] The terminal visually presents the generated suggestion information to the user through a user interface. The input is integrated suggestion information, and the output is the final display to the user. The terminal provides information visually using infographics and text.

[0397] Step 8:

[0398] The server records the sentiment data and search information obtained during processing and stores it in a database. The input is all the analysis information, and the output is the data accumulated for future use. This data is used to improve the quality of responses in subsequent interactions.

[0399] (Application Example 2)

[0400] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart glasses 214 as the "terminal".

[0401] In today's living environment, there is a demand for sophisticated services tailored to individual emotions and physical conditions. In particular, real-time emotional recognition and feedback are essential to improving the quality of communication within the home and ensuring smooth daily living support. However, conventional home support systems currently struggle to accurately interpret users' emotional states, making it difficult to provide individualized support based on these perceptions.

[0402] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0403] In this invention, the server includes means for analyzing input natural language data and extracting keywords, means for evaluating the user's emotional state from speech or text using an emotion analysis engine, and means for playing appropriate sound or visual content based on the user's emotional state. This enables customized responses and service provision tailored to the individual's emotional state.

[0404] "Natural language data" refers to information expressed in the language format that humans use on a daily basis.

[0405] A "data bank" is an information recording device used to store past information and records of similar cases.

[0406] "External environmental information" refers to information about conditions such as weather and time that exist outside the system.

[0407] An "information display device" is a device used to visually present information to a user.

[0408] "Relevant departments" refer to organizations that possess the necessary knowledge and functions to operate the system and address unresolved questions.

[0409] An "emotion analysis engine" is an analytical tool used to evaluate emotional states from speech and text.

[0410] "User's emotional state" refers to the user's current psychological or emotional condition.

[0411] "Audio and visual content" refers to forms of information or entertainment that utilize sound and images.

[0412] This invention applies a customer service support system utilizing voice and natural language processing to a consumer robot used in the home. It primarily uses an emotion recognition engine and real-time database referencing capabilities to provide audio and visual content based on an individual's emotional state. A specific implementation example of the system is shown below.

[0413] The server converts the input speech into text data using high-precision recognition technology. The hardware used here is a Raspberry Pi, and the software utilizes the Google Cloud Speech-to-Text API. The converted text data is then analyzed using natural language processing tools such as NLTK and spaCy to extract keywords.

[0414] Next, the server uses IBM Watson Tone Analyzer, an emotion analysis engine, to evaluate the user's emotions from their voice tone and text. This makes it possible to determine the user's emotional state.

[0415] Based on the emotional state, the device selects and plays appropriate audio and visual content. Available media include music, videos, and animation.

[0416] When a user makes an inquiry or request by voice, they can say something like, "I'm feeling tired today and would like to relax a bit," and the system will select and play relaxing music.

[0417] Also, an example of a prompt statement is:

[0418] "Please advise on how to soothe users' minds when they want to relax."

[0419] "I would appreciate your advice on the best way to deal with a family member who is experiencing stress."

[0420] This allows for responses tailored to the user's emotional state, improving satisfaction and convenience within the home.

[0421] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0422] Step 1:

[0423] The user speaks to the home robot using their voice. The input here is the user's voice data. The server captures the voice using a microphone on a Raspberry Pi and converts that data to text via the Google Cloud Speech-to-Text API. This conversion process yields the text data.

[0424] Step 2:

[0425] The server analyzes the obtained text data using natural language processing tools such as NLTK and spaCy. The input is the text data converted in step 1. This process extracts keywords and context, and a list of relevant keywords is output.

[0426] Step 3:

[0427] The server performs sentiment analysis on text data and voice tone using IBM Watson Tone Analyzer. The inputs are the voice tone acquired in step 1 and the text data from step 2. Based on the analysis, the user's emotional state is evaluated, and emotion tags such as anger, joy, and sadness are output.

[0428] Step 4:

[0429] The server, based on emotional state data, references a historical database to select the most appropriate audio or visual content. In this step, emotional tags and keyword lists are inputs. The most suitable content is selected from past history, and that information is output.
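
The selection in Step 4 can be sketched as a lookup from emotion tag to candidate content, ranked by past history. The catalogue, the content names, and the use of play counts as the history signal are all hypothetical.

```python
# Hypothetical catalogue: emotion tag -> candidate content identifiers.
CONTENT_BY_EMOTION = {
    "sadness": ["calm_piano", "nature_video"],
    "anger": ["breathing_animation", "calm_piano"],
    "joy": ["upbeat_playlist"],
}
DEFAULT_CONTENT = ["calm_piano"]

def select_content(emotion_tag: str, play_counts: dict) -> str:
    """Pick the candidate for this emotion that was played most often before."""
    candidates = CONTENT_BY_EMOTION.get(emotion_tag, DEFAULT_CONTENT)
    # max() keeps the first candidate on ties, so catalogue order is the tiebreak.
    return max(candidates, key=lambda name: play_counts.get(name, 0))

past_plays = {"nature_video": 3, "calm_piano": 1}  # sample history
choice = select_content("sadness", past_plays)
```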

[0430] Step 5:

[0431] The device plays the selected audio or visual content. The content information obtained in step 4 is used as input. Using the speaker or display, music, animation, or video that matches the user's emotional state is presented to the user. This completes the provision of a service that is attentive to the user's emotions.

[0432] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0433] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
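
The relationship between prompts, inference data, and the model output can be sketched as an interface. InferenceRequest, GenerationModel, and EchoModel are hypothetical names; EchoModel is a stand-in that only echoes its input and performs no actual inference.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class InferenceRequest:
    prompt: str          # instruction to the model
    text: str = ""       # text data to infer from
    audio: bytes = b""   # audio data to infer from
    image: bytes = b""   # image data to infer from

class GenerationModel(Protocol):
    def infer(self, request: InferenceRequest) -> str: ...

class EchoModel:
    """Stand-in model for illustration; a real model would run inference."""

    def infer(self, request: InferenceRequest) -> str:
        return f"[{request.prompt}] {request.text}"

def run(model: GenerationModel, request: InferenceRequest) -> str:
    return model.infer(request)

output = run(EchoModel(), InferenceRequest(prompt="Summarize", text="Warranty is one year."))
```

Keeping the model behind a small interface lets the same processing work with whichever generative AI is actually deployed.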

[0434] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0435] [Third Embodiment]

[0436] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0437] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0438] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0439] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0440] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0441] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0442] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0443] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0444] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0445] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0446] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0447] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0448] This invention is an AI system that uses speech processing and text analysis to efficiently support customer service in retail stores and service industries. The basic configuration consists of a speech input module, a natural language processing module, a database access module, an external data acquisition module, and a user interface module.

[0449] To begin operating the system, users input customer questions into the system as voice or text. In the case of voice input, the server uses speech recognition technology to convert it into text data. This speech recognition technology is trained using a large amount of voice data, achieving highly accurate text conversion.

[0450] The server passes the converted text data to a natural language processing module for semantic analysis. Specifically, it extracts relevant keywords, understands the context, and infers the customer's intent. In this process, it compares the data with similar customer inquiries and FAQ information previously recorded in the system.

[0451] Based on the analyzed information, the server then searches a database for relevant case studies and product information. This database includes past customer service history, product manuals, and campaign details. The server also obtains real-time data such as date, time, weather, and location information through an external data acquisition module and uses it to generate the final advice.

[0452] The terminal aggregates this information and presents the user with the most appropriate response via a user interface module. This display is visually clear and well-organized, and includes relevant images and graphs as needed. For example, when explaining product information during a campaign, product photos and sales performance graphs are displayed to help the user respond effectively to customers.

[0453] If the system cannot immediately provide an appropriate answer to a customer's question, the server automatically forwards the unresolved inquiry to the relevant department and incorporates the received feedback into the database. This continuously improves the accuracy and usefulness of the system.

[0454] In this way, it becomes possible to dramatically improve the quality and speed of customer service, reduce the burden on employees, and strengthen the overall customer service capabilities of the company.

[0455] The following describes the processing flow.

[0456] Step 1:

[0457] The user enters customer questions in either voice or text format. In the case of voice input, the microphone captures the voice data and passes it to the terminal.

[0458] Step 2:

[0459] The server receives audio data and converts it to text through a speech recognition module. During this process, it analyzes the characteristics of the audio and removes unnecessary noise to ensure accurate text conversion.

[0460] Step 3:

[0461] The server sends the converted text data to a natural language processing module for grammatical analysis. It extracts keywords and performs semantic analysis to understand the content of the text.

[0462] Step 4:

[0463] Based on the analysis, the server searches for past cases and related information through the database access module. This includes FAQs, past customer support history, and product information.

[0464] Step 5:

[0465] The server uses an external data acquisition module to obtain real-time information such as date, time, weather, and location. This allows it to gather complementary information to make the advice it provides more realistic and accurate.

[0466] Step 6:

[0467] The server integrates historical data with external information to generate optimal advice or answers. In doing so, it uses AI algorithms to evaluate multiple options and derive the best solution.

[0468] Step 7:

[0469] The terminal displays information provided by the server to the user via a user interface module. The displayed information includes text, images, graphs, etc., and is organized in a visually clear and easy-to-understand manner.

[0470] Step 8:

[0471] If a suitable answer cannot be found, the server automatically forwards the unresolved question to the relevant department and awaits feedback. This feedback is later registered in the system's database and updated to help handle future inquiries.
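
As a non-limiting illustration, the escalation and feedback loop of Step 8 can be sketched as follows. This is a minimal sketch under stated assumptions: the class `KnowledgeBase`, its methods, and the sample question and answer are hypothetical and do not appear in the disclosure, and an in-memory dictionary stands in for the system's database.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class KnowledgeBase:
    """Hypothetical in-memory stand-in for the system's database."""
    answers: dict = field(default_factory=dict)   # normalized question -> answer
    pending: list = field(default_factory=list)   # questions awaiting feedback

    def answer_or_escalate(self, question: str) -> Optional[str]:
        """Return a stored answer, or queue the question for the relevant department."""
        key = question.strip().lower()
        if key in self.answers:
            return self.answers[key]
        if key not in self.pending:
            self.pending.append(key)
        return None

    def register_feedback(self, question: str, answer: str) -> None:
        """Store the department's feedback so future inquiries can be answered."""
        key = question.strip().lower()
        self.answers[key] = answer
        if key in self.pending:
            self.pending.remove(key)


kb = KnowledgeBase()
# Illustrative question and answer only.
first = kb.answer_or_escalate("Can this kettle be used abroad?")
kb.register_feedback("Can this kettle be used abroad?", "Yes, it supports 100-240 V.")
second = kb.answer_or_escalate("Can this kettle be used abroad?")
```

In this sketch the first call returns no answer and queues the question, while the second call returns the registered feedback. An actual embodiment might forward the queued question by email or through a ticketing system.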

[0472] (Example 1)

[0473] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0474] In modern society, improving the speed and accuracy of customer service in retail stores and service industries is a crucial challenge. However, conventional systems have insufficient processing capabilities for voice and text data, making it difficult to acquire and incorporate external data in real time. Furthermore, they struggle to understand complex customer intentions and generate optimal responses. There is a need to solve these problems and improve customer satisfaction.

[0475] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0476] In this invention, the server includes means for analyzing input natural language information to extract key words, means for searching past information and similar cases from the information database, and means for acquiring current external environmental information and reflecting it in the analysis results. This makes it possible to provide quick and accurate answers to customer questions.

[0477] "Natural language information" refers to information written in the language that humans use in everyday life, and is the data format before it is converted into a format that can be processed by machines.

[0478] "Key words" are important keywords extracted from natural language information and are elements used for understanding and searching the content of text.

[0479] "Information database" refers to a database or knowledge base in which past cases and related information are stored as a searchable collection.

[0480] "External environmental information" refers to data such as date and time, weather, and regional information acquired in real time from outside the system, and is a factor that may affect the analysis results.

[0481] "Visual information" refers to data presented to users in a visually recognizable format, such as images or graphs.

[0482] "Statistical representation" refers to information provided in the form of graphs, charts, and other visual aids used to present data in an easily understandable way.

[0483] "Recognizing intent" means analyzing the context and content of the input information to understand the speaker's purpose and requests.

[0484] This system is an AI system that uses speech processing and text analysis to efficiently support customer service in retail stores and service industries. The basic configuration of the system consists of a speech input module, a natural language processing module, a database access module, an external data acquisition module, and a user interface module.

[0485] Speech input module:

[0486] Users input customer questions into the system as voice or text. In the case of voice input, the server uses speech recognition technology to convert it into text data. The conversion relies on an established technology such as a "speech recognition service." For example, if a customer asks by voice, "What is the warranty period for this product?", the server converts the speech into text with high accuracy.

[0487] Natural language processing module:

[0488] The server passes the converted text data to a natural language processing module for semantic analysis. Using a "natural language processing library," it extracts relevant keywords and understands the context to infer the customer's intent. This allows it to recognize that the customer is asking about the "warranty period."

[0489] Database access module:

[0490] Based on the analyzed information, the server searches the database for relevant product information and past case studies. A "database management system" is used here, enabling the rapid and accurate retrieval of relevant information.

[0491] External data acquisition module:

[0492] An external data acquisition module operates to obtain real-time external environmental information and reflect it in the analysis results. For example, it retrieves weather information and regional information from the "External Data API" to help generate final advice.

[0493] User interface module:

[0494] The terminal aggregates this information and presents it to the user in an easy-to-understand visual format through a user interface. For example, it displays product-related images and sales data graphs to help users effectively explain products to customers.

[0495] Example of a prompt:

[0496] "How can we improve the speed of our customer response?"

[0497] Through these modules, the system can respond to customers quickly and accurately, improving the company's customer service capabilities.
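
As a non-limiting illustration, the keyword extraction and intent inference attributed to the natural language processing module can be sketched for the "warranty period" question above. The stop-word list, the intent table, and the function names are assumptions made for this sketch. The disclosure names only a generic "natural language processing library," so hand-written rules are used here in its place.

```python
import re

# Illustrative stop words and intent triggers; a real deployment would rely on
# a trained NLP library instead of these hand-written tables.
STOP_WORDS = {"what", "is", "the", "for", "this", "a", "an", "of", "to", "do", "you"}
INTENT_TRIGGERS = {
    "warranty_period": ("warranty",),
    "promotion": ("promotion", "discount", "campaign"),
    "stock": ("stock", "inventory"),
}


def extract_keywords(text: str) -> list:
    """Lower-case the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def infer_intent(keywords: list) -> str:
    """Return the first intent whose trigger words appear among the keywords."""
    for intent, triggers in INTENT_TRIGGERS.items():
        if any(t in keywords for t in triggers):
            return intent
    return "unknown"


question = "What is the warranty period for this product?"
keywords = extract_keywords(question)
intent = infer_intent(keywords)
print(keywords, intent)
```

For the sample question, the sketch yields the keywords "warranty," "period," and "product" and the intent label `warranty_period`.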

[0498] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0499] Step 1:

[0500] Users input customer questions into the system in either voice or text format. In the case of voice input, the server uses a speech recognition module to capture the voice data and convert it into text data. Specifically, the server analyzes the voice signal and uses a language model to convert it into a string. The input is sound waves, and the output is the corresponding text data.

[0501] Step 2:

[0502] The server passes the converted text data to a natural language processing module. Here, the server performs syntactic analysis of the text and extracts key words. Specifically, it identifies words and phrases in the text and analyzes their meanings. This process is performed using a morphological analyzer, with text data as input and analysis information including key words and intent as output.

[0503] Step 3:

[0504] The server performs database searches based on the analysis information. It uses a database access module to search for similar past cases and product information. Specifically, it uses SQL queries to perform cross-database searches and retrieve the necessary information. The input is the analysis information, and the output is a dataset containing related information.

[0505] Step 4:

[0506] The server acquires real-time external environmental information via an external data acquisition module and incorporates it into the analysis results. Specifically, it uses an API to obtain environmental data such as weather and supplements the analysis information with it. In this process, the input is the analysis information, and the output is the enhanced analysis information.

[0507] Step 5:

[0508] The terminal presents information to the user using a user interface module, based on enhanced analytical information from the server. Specifically, it uses a GUI library to visually organize data and display it on the terminal screen. The input is enhanced analytical information, and the output is the visual information presented to the user.

[0509] Step 6:

[0510] If immediate resolution is difficult, the server reports the unresolved inquiry to the relevant department and implements a specific process to gather subsequent feedback. Specifically, it automates the distribution of inquiries via email and ticketing systems, and facilitates the addition of feedback data to the database. In this process, the input is the unresolved inquiry, and the output is an updated database reflecting the feedback.
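
As a non-limiting illustration, the SQL-based search of Step 3 can be sketched with Python's built-in `sqlite3` module. The table name `cases`, its columns, and the sample rows are hypothetical. The sketch queries a single in-memory table, whereas the disclosure refers to searches across databases.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `cases` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cases (id INTEGER PRIMARY KEY, question TEXT, answer TEXT)"
    )
    conn.executemany(
        "INSERT INTO cases (question, answer) VALUES (?, ?)",
        [
            ("warranty period for blender", "Illustrative answer A"),
            ("return policy for opened items", "Illustrative answer B"),
            ("extended warranty cost", "Illustrative answer C"),
        ],
    )
    return conn


def search_cases(conn: sqlite3.Connection, keywords: list) -> list:
    """Return cases whose question contains any keyword, via a parameterized query."""
    if not keywords:
        return []
    # One "question LIKE ?" clause per keyword; values are bound, not interpolated.
    where = " OR ".join("question LIKE ?" for _ in keywords)
    params = [f"%{k}%" for k in keywords]
    sql = f"SELECT question, answer FROM cases WHERE {where} ORDER BY id"
    return conn.execute(sql, params).fetchall()


conn = build_demo_db()
hits = search_cases(conn, ["warranty"])
```

Binding the keywords as query parameters, rather than concatenating them into the SQL string, keeps free-form customer input from altering the query.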

[0511] (Application Example 1)

[0512] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0513] Retail stores and service industries are facing a need to improve the efficiency and quality of customer service. Currently, it is difficult to respond quickly and appropriately to a wide range of customer inquiries, increasing the burden on store employees and potentially leading to decreased customer satisfaction. In particular, there is a demand for the ability to instantly analyze large amounts of information and derive the optimal answer.

[0514] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0515] In this invention, the server includes means for analyzing input natural language data to extract keywords, means for searching a database for past information and similar cases, means for acquiring external environmental information in real time and reflecting it in the analysis results, and means for acquiring location information within the store and providing inventory information and discount information. This makes it possible to quickly and appropriately present product and service information in response to customer inquiries.

[0516] "Natural language data" refers to strings of characters and audio information expressed in the language forms that humans use on a daily basis.

[0517] "Keywords" are important words or phrases extracted from natural language data and are used to identify and classify information.

[0518] A "database" is a system for systematically accumulating and managing data such as past information and similar cases.

[0519] "External environmental information" refers to data about external factors acquired in real time, such as date and time, weather, and regional information.

[0520] "Analysis results" refers to the information and conclusions obtained through the analysis of natural language data.

[0521] A "terminal" is a device used for inputting and outputting data, and includes smartphones and computers.

[0522] "Location information" refers to geographical data about a specific place, and is obtained using methods such as GPS.

[0523] "Inventory information" refers to the status of goods held in a store, and includes data on the quantity and types of products available for sale.

[0524] "Discount information" refers to information about discounts or special prices on specific products or services.

[0525] The system for implementing this invention mainly consists of a server and a terminal. The server analyzes the input natural language data and extracts keywords to understand the user's intent. If voice input is provided, speech recognition technology is used to convert the voice data into text data. In this case, the server uses a speech recognition API (e.g., Google Speech Recognition API) that has been trained on a large amount of voice data.

[0526] Based on the analysis results, the server searches a database to find past information and similar cases. The server also acquires real-time external environmental information and incorporates it into the analysis results. This includes current data such as date, time, weather, and location, which can be used to provide information on in-store inventory and discounts.

[0527] The terminal has the functionality to visually display the best answers and advice obtained from the server to the user. In particular, the information is presented clearly using relevant images and graphs. This allows the user to respond to customers quickly and accurately.

[0528] As a concrete example, imagine a store employee using a smartphone to operate an application and entering a prompt such as, "What promotions are currently running for this product?" This allows the server to immediately analyze the request and provide information, effectively supporting customer service.

[0529] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0530] Step 1:

[0531] Users use their smartphones to input customer questions via voice or text. In the case of voice input, the smartphone's microphone is used to collect the customer's question. The input data is then sent to a server.

[0532] Step 2:

[0533] The server uses a speech recognition API to convert audio data into text data. The input is an audio file, and the output is a customer question in text format. During this process, the audio data is converted to text with high accuracy.

[0534] Step 3:

[0535] The server analyzes text data using a natural language processing module and extracts important keywords. The input is text data, and the output is the extracted keywords. Here, a generative AI model performs contextual analysis and keyword recognition.

[0536] Step 4:

[0537] The server searches the database for similar past inquiries and related product information. The input is the keywords, and the output is the relevant information, which the search retrieves quickly.

[0538] Step 5:

[0539] The server uses an external data acquisition module to obtain real-time external environmental information (e.g., weather and inventory status). Inputs are the current date, time, and location, while output is external data. By using this information in the analysis results, more accurate answers and suggestions are generated.

[0540] Step 6:

[0541] The server generates the optimal response or advice based on the acquired data. At this point, the specific information requested by the user is comprehensively summarized. The input consists of past case information and external data, and the output is a specific response.

[0542] Step 7:

[0543] The terminal displays information received from the server via a user interface. Related images and graphs are displayed along with the optimal answer, providing the user with visually easy-to-understand information.
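
As a non-limiting illustration, the way Step 6 combines past case information with external data such as inventory and discount information can be sketched as follows. The `ExternalData` class, the `compose_answer` function, and all sample values are assumptions made for this sketch. Fixed values stand in for a live response from the external data acquisition module, and weather data is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalData:
    """Hypothetical container for part of the real-time data of Step 5."""
    stock_count: int
    discount_percent: int


def compose_answer(product: str, past_answer: str, ext: ExternalData) -> str:
    """Merge a past-case answer with inventory and discount data (Step 6)."""
    parts = [past_answer]
    if ext.stock_count > 0:
        parts.append(f"{product} is in stock ({ext.stock_count} units).")
    else:
        parts.append(f"{product} is currently out of stock.")
    if ext.discount_percent > 0:
        parts.append(f"A {ext.discount_percent}% discount applies today.")
    return " ".join(parts)


# Fixed illustrative values stand in for a live API response.
ext = ExternalData(stock_count=4, discount_percent=10)
answer = compose_answer("Umbrella X", "Umbrella X folds to 25 cm.", ext)
print(answer)
```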

[0544] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0545] This invention is a customer service support system that utilizes speech and natural language processing, and further incorporates an emotion engine that recognizes user emotions. This system not only converts input speech into text and extracts keywords using natural language processing technology, but also has the function of analyzing user emotions in real time.

[0546] In system operation, users input customer inquiries and requests via voice or text. If voice input is selected, the server uses high-precision speech recognition technology to convert it into text data. The converted text is analyzed through a natural language processing module to extract relevant keywords and context.

[0547] At this stage, the server utilizes an emotion engine to evaluate the user's emotional state from voice intonation and text expression. For example, it can determine whether the customer is anxious or satisfied based on their tone of voice and word choice. The information obtained through emotion analysis is used as a crucial element in determining an appropriate response strategy.

[0548] Next, the server searches the database based on the user's input and emotional state, referencing past cases and related information. For example, it extracts the history of responses to similar questions in the past and effective countermeasures based on emotional changes. Furthermore, it acquires external information in real time and generates advice that includes the date, time, weather, and local conditions.

[0549] The generated information is presented visually to the user via the terminal's user interface module. For example, if a customer appears anxious, explanations and information that provide reassurance are highlighted. This improves the quality of customer service and enables the provision of services that are more attentive to the customer's emotions.
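
As a non-limiting illustration, the highlighting of reassuring information for an anxious customer can be sketched as follows. The `DisplayItem` class, the `reassuring` flag, the emotion label "anxious," and the sample items are assumptions made for this sketch. The disclosure does not specify how display items are tagged or ordered.

```python
from dataclasses import dataclass


@dataclass
class DisplayItem:
    """Hypothetical unit of information shown on the terminal."""
    text: str
    reassuring: bool = False
    highlighted: bool = False


def arrange_for_emotion(items: list, emotion: str) -> list:
    """When the customer seems anxious, highlight reassuring items and list them first."""
    if emotion != "anxious":
        return list(items)
    arranged = [DisplayItem(i.text, i.reassuring, i.reassuring) for i in items]
    # sorted() is stable, so items keep their relative order within each group.
    return sorted(arranged, key=lambda i: not i.highlighted)


items = [
    DisplayItem("Technical specifications"),
    DisplayItem("Free repair within the warranty period", reassuring=True),
]
arranged = arrange_for_emotion(items, "anxious")
```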

[0550] Furthermore, emotional data obtained during interactions is recorded by the server and stored in a database. This allows for the analysis and utilization of past emotional history in future customer interactions. In this way, the overall accuracy and effectiveness of the system improve over the long term.

[0551] The following describes the processing flow.

[0552] Step 1:

[0553] The user inputs customer questions into the system via voice or text. In the case of voice input, the voice data is sent to the terminal via the microphone.

[0554] Step 2:

[0555] The server processes the received audio data using a speech recognition module and converts it into text data. During this process, it analyzes the audio pattern and reduces background noise.

[0556] Step 3:

[0557] The server analyzes the transcribed data using a natural language processing module to extract keywords and contextual structure. This analysis allows for an accurate understanding of the customer's intent.

[0558] Step 4:

[0559] The server uses an emotion engine to identify the user's emotions from their tone of voice and word choice. In particular, it flags cases in which the emotional state is a crucial factor in the customer interaction.

[0560] Step 5:

[0561] The server searches the database based on the analyzed text and sentiment information to retrieve relevant past cases, FAQs, and product information. If the sentiment is pronounced, it also refers to past responses applied to similar emotional situations.

[0562] Step 6:

[0563] The server acquires real-time environmental information such as date, time, weather, and location through an external data acquisition module. This information is also taken into consideration to form optimal advice.

[0564] Step 7:

[0565] The terminal visually presents advice and information received from the server to the user through a user interface module. The displayed content consists of text, relevant images, graphs as needed, and other elements that take emotions into consideration.

[0566] Step 8:

[0567] The server records the emotional data identified during the interaction, together with its changes, and stores it in a database to support better responses in the future. This data will be used for future analysis and contribute to improving the overall response accuracy of the system.
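
As a non-limiting illustration, the recording of emotional data and its changes in Step 8 can be sketched as follows. The `EmotionHistory` class, the session identifier, and the emotion labels are assumptions made for this sketch. An in-memory dictionary stands in for the database.

```python
class EmotionHistory:
    """Hypothetical store for emotion labels observed during interactions."""

    def __init__(self) -> None:
        self._log = {}  # session id -> list of emotion labels, in order

    def record(self, session_id: str, emotion: str) -> None:
        """Append the emotion identified at the current point of the interaction."""
        self._log.setdefault(session_id, []).append(emotion)

    def changes(self, session_id: str) -> list:
        """Return (before, after) pairs wherever the emotion label changed."""
        seq = self._log.get(session_id, [])
        return [(a, b) for a, b in zip(seq, seq[1:]) if a != b]


history = EmotionHistory()
for label in ("anxious", "anxious", "satisfied"):
    history.record("session-1", label)
transitions = history.changes("session-1")
```

Here the only recorded transition is from "anxious" to "satisfied," which an actual embodiment could associate with the response that preceded it.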

[0568] (Example 2)

[0569] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0570] Traditional customer service systems could extract keywords from entered information and incorporate external data, but they had limitations in their ability to specifically analyze customer emotions and optimize responses. Furthermore, it was difficult to effectively utilize interaction history to improve the quality of subsequent interactions.

[0571] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0572] In this invention, the server includes means for analyzing input voice or natural language data to extract keywords, means for analyzing the user's emotions, and means for searching a database for past cases and related information. This makes it possible to present countermeasures that take customer emotions into consideration and to improve the quality of service by utilizing accumulated emotional data.

[0573] "Voice or natural language data" refers to information in the form of conversations or texts entered by users, including customer requests and questions.

[0574] "Methods for analyzing and extracting keywords" refers to techniques that analyze input data and identify the important words and phrases in it.

[0575] "Methods for analyzing user emotions" refer to technologies that evaluate voice tone and expression in order to determine the emotions contained in conversations and written texts.

[0576] "Means of searching past cases and related information from a database" refers to a technology that searches for and retrieves cases and information that are relevant to the current situation from a pre-existing database of cases and information.

[0577] "Means of acquiring external information in real time and reflecting it in analysis results" refers to technologies that acquire the latest data from the internet and other information sources, and incorporate and utilize it in the results of the analysis.

[0578] "Means for generating optimal advice or suggestions and outputting them to a display device" refers to a technology that creates the most effective advice or solutions based on analyzed information and provides them to the user through a display device.

[0579] "A means of accumulating emotional data acquired during a conversation and reflecting it in future responses" refers to a technology that saves emotionally-based information obtained from interactions with customers and utilizes it for future interactions.

[0580] This invention is a system that utilizes speech and natural language processing technologies to support customer service. The system combines speech-to-text conversion, natural language analysis, and sentiment analysis to provide responses tailored to the customer's emotions. Specific embodiments are described below.

[0581] Users enter inquiries and requests via voice or text. If voice input is used, the information is transmitted to the system via the device's microphone. Voice input is available on readily available devices and is designed with ease of use in mind.

[0582] The server converts the audio into text data using high-precision speech recognition software (e.g., a common speech recognition API). During this process, it captures the subtle nuances of the speech and renders them as text.

[0583] The obtained text data is analyzed using a natural language processing library (e.g., spaCy or other common NLU libraries). Sentence segmentation, keyword extraction, and contextual analysis are performed, thereby extracting important information.

[0584] Next, the server uses an emotion analysis engine (e.g., a common emotion analysis tool) to evaluate the customer's emotions. This allows the server to identify the customer's emotional state, specifically emotions such as joy, sadness, or dissatisfaction.

[0585] Based on the analyzed keywords and sentiment information, the server searches the database to retrieve past cases and related information. Furthermore, it uses external information such as weather and date / time, obtained in real time, to generate more appropriate advice.

[0586] The generated information is presented to the user visually through the terminal's user interface. At this time, the display device uses effective visual materials to provide information in a way that is easily understandable to the customer.

[0587] For example, if a customer expresses dissatisfaction with a product malfunction, the system identifies their feelings and suggests solutions based on past repair cases. This enables a quick and accurate response.

[0588] An example of a prompt message is, "Please enter the customer's voice. The system will analyze their emotions and suggest the best course of action." This encourages users to utilize the system. This technology can significantly improve the quality of customer service and increase customer satisfaction.

[0589] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0590] Step 1:

[0591] Users input their inquiries via voice or text using a terminal. In the case of voice input, the microphone captures the audio and sends it to the system. The voice and text become input data for the system. The voice data is converted into a digital signal upon receipt, ready for processing.

[0592] Step 2:

[0593] The server converts the input audio data into text using speech recognition software. The input is audio data, and the output is text data. In this process, the prosody and pronunciation of the speech are analyzed as features, and highly accurate text conversion is performed based on that information.

[0594] Step 3:

[0595] The server analyzes the transcribed data using a natural language processing library, which serves as the natural language processing module. The input is text data, and the output is extracted keywords and contextual information. The process involves segmenting sentences, tagging parts of speech, and performing dependency analysis to identify keywords.

[0596] Step 4:

[0597] The server uses an emotion analysis engine to evaluate the user's emotions from text data. The input is analyzed text information, and the output is an emotion score. In this step, emotional features are calculated from the text representation, and emotions such as positive, negative, and neutral are quantified.

[0598] Step 5:

[0599] The server searches the database using keywords and sentiment scores. The input is the identified keywords and sentiment scores, and the output is similar cases and related information. This process involves issuing SQL queries to retrieve past interaction history and related solutions.

[0600] Step 6:

[0601] The server retrieves real-time data from external sources and integrates it into the search results. This includes utilizing APIs to obtain weather, time, and geographical information. The input is the initial search results, and the output is enhanced suggestion information.

[0602] Step 7:

[0603] The terminal visually presents the generated suggestion information to the user through a user interface. The input is integrated suggestion information, and the output is the final display to the user. The terminal provides information visually using infographics and text.

[0604] Step 8:

[0605] The server records the sentiment data and search information obtained during processing and stores it in a database. The input is all the analysis information, and the output is the data accumulated for future use. This data is used to improve the quality of responses in subsequent interactions.
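
As a non-limiting illustration, the quantification of emotions in Step 4 can be sketched with a simple word-list approach. The word lists, the score formula, and the threshold are assumptions made for this sketch. The disclosure refers only to a generic emotion analysis engine, which in practice would likely be a trained model.

```python
import re

# Illustrative word lists; a real engine would use a trained model.
POSITIVE = {"great", "happy", "thanks", "love", "satisfied"}
NEGATIVE = {"broken", "angry", "disappointed", "bad", "malfunction"}


def emotion_score(text: str) -> float:
    """Return a score in [-1, 1]: (positive hits - negative hits) / total hits."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def emotion_label(score: float, threshold: float = 0.25) -> str:
    """Map a score to 'positive', 'negative', or 'neutral'."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


score = emotion_score("The mixer is broken and I am disappointed")
label = emotion_label(score)
```

The resulting score could then accompany the extracted keywords as input to the database search of Step 5.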

[0606] (Application Example 2)

[0607] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0608] In today's living environment, there is a demand for sophisticated services tailored to individual emotions and physical conditions. In particular, real-time emotional recognition and feedback are essential to improving the quality of communication within the home and ensuring smooth daily living support. However, conventional home support systems currently struggle to accurately interpret users' emotional states, making it difficult to provide individualized support based on these perceptions.

[0609] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0610] In this invention, the server includes means for analyzing input natural language data and extracting keywords, means for evaluating the user's emotional state from speech or text using an emotion analysis engine, and means for playing appropriate sound or visual content based on the user's emotional state. This enables customized responses and service provision tailored to the individual's emotional state.

[0611] "Natural language data" refers to information expressed in the language format that humans use on a daily basis.

[0612] A "data bank" is an information recording device used to store past information and records of similar cases.

[0613] "External environmental information" refers to information about conditions such as weather and time that exist outside the system.

[0614] An "information display device" is a device used to visually present information to a user.

[0615] "Relevant departments" refer to organizations that possess the necessary knowledge and functions to operate the system and address unresolved questions.

[0616] An "emotion analysis engine" is an analytical tool used to evaluate emotional states from speech and text.

[0617] "User's emotional state" refers to the user's current psychological or emotional condition.

[0618] "Audio and visual content" refers to forms of information or entertainment that utilize sound and images.

[0619] This invention applies a customer service support system utilizing voice and natural language processing to a consumer robot used in the home. It primarily uses an emotion recognition engine and real-time database referencing capabilities to provide audio and visual content based on an individual's emotional state. A specific implementation example of the system is shown below.

[0620] The server converts the input speech into text data using high-precision recognition technology. The hardware used here is a Raspberry Pi, and the software utilizes the Google Cloud Speech-to-Text API. The converted text data is then analyzed using natural language processing tools such as NLTK and spaCy to extract keywords.

[0621] Next, the server uses IBM Watson Tone Analyzer, an emotion analysis engine, to evaluate the user's emotions from their voice tone and text. This makes it possible to determine the user's emotional state.

[0622] Based on the emotional state, the device selects and plays appropriate audio and visual content. Available media include music, videos, and animation.

[0623] When a user makes an inquiry or request by voice, they can say something like, "I'm feeling tired today and would like to relax a bit," and the system will select and play relaxing music.

[0624] Examples of prompt statements include the following:

[0625] "Please advise on how to soothe users' minds when they want to relax."

[0626] "I would appreciate your advice on the best way to deal with a family member who is experiencing stress."

[0627] This allows for responses tailored to the user's emotional state, improving satisfaction and convenience within the home.

[0628] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0629] Step 1:

[0630] The user speaks to the home robot using their voice. The input here is the user's voice data. The server captures the voice using a microphone on a Raspberry Pi and converts that data to text via the Google Cloud Speech-to-Text API. This conversion process yields the text data.

[0631] Step 2:

[0632] The server analyzes the obtained text data using natural language processing tools such as NLTK and spaCy. The input is the text data converted in step 1. This process extracts keywords and context, and a list of relevant keywords is output.

[0633] Step 3:

[0634] The server performs sentiment analysis on text data and voice tone using IBM Watson Tone Analyzer. The inputs are the voice tone acquired in step 1 and the text data from step 2. Based on the analysis, the user's emotional state is evaluated, and emotion tags such as anger, joy, and sadness are output.

[0635] Step 4:

[0636] Based on the emotional state data, the server references a historical database to select the most appropriate audio or visual content. The inputs to this step are the emotion tags and the keyword list. The most suitable content is selected from past history, and information identifying that content is output.

[0637] Step 5:

[0638] The device plays the selected audio or visual content. The content information obtained in step 4 is used as input. Using the speaker or display, music, animation, or video that matches the user's emotional state is presented to the user. This completes the provision of a service that is attentive to the user's emotions.
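
As a non-limiting illustration, the content selection of steps 3 to 5 could be sketched in Python as follows. The names `PlaybackRecord`, `select_content`, and `DEFAULT_CONTENT`, the content identifiers, and the scoring rule (one point for a matching emotion tag plus one point per shared keyword) are assumptions introduced only for this sketch and are not specified in the disclosure. The emotion tag is assumed to have already been produced by the emotion analysis engine.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaybackRecord:
    """One past playback: the emotion tag at the time and the content played."""
    emotion_tag: str
    keywords: frozenset
    content_id: str


# Hypothetical fallback content per emotion tag, used when history has no match.
DEFAULT_CONTENT = {
    "sadness": "calm_music_01",
    "anger": "nature_video_01",
    "joy": "upbeat_music_01",
}


def select_content(emotion_tag, keywords, history):
    """Pick content for this emotion tag, favoring records that share keywords."""
    scores = Counter()
    for record in history:
        if record.emotion_tag != emotion_tag:
            continue
        # 1 point for the matching emotion, plus 1 per shared keyword.
        scores[record.content_id] += 1 + len(record.keywords & set(keywords))
    if not scores:
        return DEFAULT_CONTENT.get(emotion_tag, "calm_music_01")
    # Highest score wins; ties are broken by content id for a deterministic result.
    return min(scores, key=lambda cid: (-scores[cid], cid))


history = [
    PlaybackRecord("sadness", frozenset({"tired", "relax"}), "relaxing_music_02"),
    PlaybackRecord("sadness", frozenset({"lonely"}), "family_video_01"),
    PlaybackRecord("joy", frozenset({"party"}), "upbeat_music_01"),
]
choice = select_content("sadness", ["tired", "relax"], history)
print(choice)
```

In this sketch the history lookup and the fallback table are kept separate, so a new user with no history still receives content that matches the emotion tag.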

[0639] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0640] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0641] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0642] [Fourth Embodiment]

[0643] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0644] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0645] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0646] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0647] The microphone 238 accepts instructions from the user 20 by receiving the voice uttered by the user 20. The microphone 238 captures the voice uttered by the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0648] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and with an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor. It captures images of the area around the user 20 (for example, an imaging range defined by an angle of view equivalent to the field of vision of a typical healthy person).

[0649] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0650] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0651] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0652] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0653] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0654] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0655] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0656] This invention is an AI system that uses speech processing and text analysis to efficiently support customer service in retail stores and service industries. The basic configuration consists of a speech input module, a natural language processing module, a database access module, an external data acquisition module, and a user interface module.

[0657] To begin operating the system, users input customer questions into the system as voice or text. In the case of voice input, the server uses speech recognition technology to convert it into text data. This speech recognition technology is trained using a large amount of voice data, achieving highly accurate text conversion.

[0658] The server passes the converted text data to a natural language processing module for semantic analysis. Specifically, it extracts relevant keywords, understands the context, and infers the customer's intent. In this process, it compares the data with similar customer inquiries and FAQ information previously recorded in the system.

[0659] Based on the analyzed information, the server then searches a database for relevant case studies and product information. This database includes past customer service history, product manuals, and campaign details. The server also obtains real-time data such as date, time, weather, and location information through an external data acquisition module, and uses this data when generating the final advice.

[0660] The terminal aggregates this information and presents the user with the most appropriate response via a user interface module. This display is visually clear and well-organized, and includes relevant images and graphs as needed. For example, when explaining product information during a campaign, product photos and sales performance graphs are displayed to help the user respond effectively to customers.

[0661] If the system cannot immediately provide an appropriate answer to a customer's question, the server automatically forwards the unresolved inquiry to the relevant department and incorporates the received feedback into the database. This continuously improves the accuracy and usefulness of the system.

[0662] In this way, it becomes possible to dramatically improve the quality and speed of customer service, reduce the burden on employees, and strengthen the overall customer service capabilities of the company.

[0663] The following describes the processing flow.

[0664] Step 1:

[0665] The user enters customer questions in either voice or text format. In the case of voice input, the voice data is sent to the device via the microphone.

[0666] Step 2:

[0667] The server receives audio data and converts it to text through a speech recognition module. During this process, it analyzes the characteristics of the audio and removes unnecessary noise to ensure accurate text conversion.

[0668] Step 3:

[0669] The server sends the converted text data to a natural language processing module for grammatical analysis. It extracts keywords and performs semantic analysis to understand the content of the text.

[0670] Step 4:

[0671] Based on the analysis, the server searches for past cases and related information through the database access module. This includes FAQs, past customer support history, and product information.

[0672] Step 5:

[0673] The server uses an external data acquisition module to obtain real-time information such as date, time, weather, and location. This allows it to gather complementary information to make the advice it provides more realistic and accurate.

[0674] Step 6:

[0675] The server integrates historical data with external information to generate optimal advice or answers. In doing so, it uses AI algorithms to evaluate multiple options and derive the best solution.

[0676] Step 7:

[0677] The terminal displays information provided by the server to the user via a user interface module. The displayed information includes text, images, graphs, etc., and is organized in a visually clear and easy-to-understand manner.

[0678] Step 8:

[0679] If a suitable answer cannot be found, the server automatically forwards the unresolved question to the relevant department and awaits feedback. This feedback is later registered in the system's database and updated to help handle future inquiries.
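
The interplay between answer generation (steps 4 to 6) and escalation with feedback registration (step 8) could be sketched, for example, as follows. The class `SupportDesk`, its method names, the keyword-to-answer dictionary, and the sample questions and answers are hypothetical and serve only as an illustration. Speech recognition and real external data acquisition are omitted, and the external context is passed in as a plain dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class SupportDesk:
    """Holds known answers plus a queue of inquiries awaiting departmental feedback."""
    faq: dict = field(default_factory=dict)         # keyword -> answer text
    unresolved: list = field(default_factory=list)  # questions forwarded to a department

    def answer(self, question, context):
        """Return an answer enriched with external context, or None after escalating."""
        words = {w.strip("?.,!").lower() for w in question.split()}
        for keyword, text in self.faq.items():
            if keyword in words:
                return f"{text} (as of {context['date']}, weather: {context['weather']})"
        # Step 8: no suitable answer, so forward the question and wait for feedback.
        self.unresolved.append(question)
        return None

    def register_feedback(self, question, keyword, answer_text):
        """Store departmental feedback so the same inquiry can be answered next time."""
        self.faq[keyword] = answer_text
        self.unresolved.remove(question)


# Sample data below is invented for the illustration.
desk = SupportDesk(faq={"warranty": "The warranty period is listed in the product manual."})
context = {"date": "2025-01-01", "weather": "rain"}

first = desk.answer("Do you offer gift wrapping?", context)  # unknown, so escalated
desk.register_feedback(
    "Do you offer gift wrapping?",
    "wrapping",
    "Gift wrapping is available at the service counter.",
)
second = desk.answer("Do you offer gift wrapping?", context)  # answered from feedback
print(first, second, sep="\n")
```

The point of the sketch is the loop: an inquiry that cannot be answered is queued, and once feedback is registered the same inquiry is answered without escalation.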

[0680] (Example 1)

[0681] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0682] In modern society, improving the speed and accuracy of customer service in retail stores and service industries is a crucial challenge. However, conventional systems have insufficient processing capabilities for voice and text data, making it difficult to acquire and incorporate external data in real time. Furthermore, they struggle to understand complex customer intentions and generate optimal responses. There is a need to solve these problems and improve customer satisfaction.

[0683] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0684] In this invention, the server includes means for analyzing input natural language information to extract key words, means for searching past information and similar cases from the information database, and means for acquiring current external environmental information and reflecting it in the analysis results. This makes it possible to provide quick and accurate answers to customer questions.

[0685] "Natural language information" refers to information written in the language that humans use in everyday life, and is the data format before it is converted into a format that can be processed by machines.

[0686] "Key words" are important keywords extracted from natural language information and are elements used for understanding and searching the content of text.

[0687] "Information aggregation" refers to databases or knowledge bases where past cases and related information are stored, and is a collection of information that can be searched.

[0688] "External environmental information" refers to data such as date and time, weather, and regional information acquired in real time from outside the system, and is a factor that may affect the analysis results.

[0689] "Visual information" refers to data presented to users in a visually recognizable format, such as images or graphs.

[0690] "Statistical representation" refers to information provided in the form of graphs, charts, and other visual aids used to present data in an easily understandable way.

[0691] "Recognizing intent" means analyzing the context and content of the input information to understand the speaker's purpose and requests.

[0692] This system is an AI system that uses speech processing and text analysis to efficiently support customer service in retail stores and service industries. The basic configuration of the system consists of a speech input module, a natural language processing module, a database access module, an external data acquisition module, and a user interface module.

[0693] Voice input module:

[0694] Users input customer questions into the system as voice or text. In the case of voice input, the server uses speech recognition technology to convert it into text data. This speech recognition uses advanced technologies such as "speech recognition services." For example, if a customer asks a question by voice, "What is the warranty period for this product?", the server converts the voice into text with high accuracy.

[0695] Natural language processing module:

[0696] The server passes the converted text data to a natural language processing module for semantic analysis. Using a "natural language processing library," it extracts relevant keywords and understands the context to infer the customer's intent. This allows it to recognize that the customer is asking about the "warranty period."

[0697] Database access module:

[0698] Based on the analyzed information, the server searches the database for relevant product information and past case studies. A "database management system" is used here, enabling the rapid and accurate retrieval of relevant information.

[0699] External data acquisition module:

[0700] An external data acquisition module operates to obtain real-time external environmental information and reflect it in the analysis results. For example, it retrieves weather information and regional information from the "External Data API" to help generate final advice.

[0701] User interface module:

[0702] The terminal aggregates this information and presents it to the user in an easy-to-understand visual format through a user interface. For example, it displays product-related images and sales data graphs to help users effectively explain products to customers.

[0703] Example of a prompt:

[0704] "How can we improve the speed of our customer response?"

[0705] Through these modules, the system can respond to customers quickly and accurately, improving the company's customer service capabilities.
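
As one possible illustration of the natural language processing module, keyword extraction and intent inference for the warranty question above could be sketched as follows. The stopword list, the `INTENT_KEYWORDS` table, and the function names are assumptions made for this sketch. The disclosure only refers to a "natural language processing library" and does not specify the method, so simple rule-based matching is used here in its place.

```python
import re

# Small illustrative stopword list; a real deployment would use a fuller one.
STOPWORDS = {"what", "is", "the", "for", "this", "a", "an", "of", "to", "do", "you"}

# Hypothetical mapping from trigger keywords to intents.
INTENT_KEYWORDS = {
    "warranty": "warranty_period",
    "price": "price_check",
    "stock": "inventory_check",
}


def extract_keywords(text):
    """Lowercase the text, split it into words, and drop stopwords (order preserved)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def infer_intent(keywords):
    """Return the intent of the first trigger keyword found, else 'unknown'."""
    for keyword in keywords:
        if keyword in INTENT_KEYWORDS:
            return INTENT_KEYWORDS[keyword]
    return "unknown"


keywords = extract_keywords("What is the warranty period for this product?")
intent = infer_intent(keywords)
print(keywords, intent)
```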

[0706] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0707] Step 1:

[0708] Users input customer questions into the system in either voice or text format. In the case of voice input, the server uses a speech recognition module to capture the voice data and convert it into text data. Specifically, the server analyzes the voice signal and uses a language model to convert it into a string. The input is sound waves, and the output is the corresponding text data.

[0709] Step 2:

[0710] The server passes the converted text data to a natural language processing module. Here, the server performs syntactic analysis of the text and extracts key words. Specifically, it identifies words and phrases in the text and analyzes their meanings. This process is performed using a morphological analyzer, with text data as input and analysis information including key words and intent as output.

[0711] Step 3:

[0712] The server performs database searches based on the analysis information. It uses a database access module to search for similar past cases and product information. Specifically, it uses SQL queries to perform cross-database searches and retrieve the necessary information. The input is the analysis information, and the output is a dataset containing related information.

[0713] Step 4:

[0714] The server acquires real-time external environmental information via an external data acquisition module and incorporates it into the analysis results. Specifically, it uses an API to obtain environment variables such as weather and supplements the analysis information. In this process, the input is the analysis information, and the output is the enhanced analysis information.

[0715] Step 5:

[0716] The terminal presents information to the user using a user interface module, based on enhanced analytical information from the server. Specifically, it uses a GUI library to visually organize data and display it on the terminal screen. The input is enhanced analytical information, and the output is the visual information presented to the user.

[0717] Step 6:

[0718] If immediate resolution is difficult, the server reports the unresolved inquiry to the relevant department and implements a specific process to gather subsequent feedback. Specifically, it automates the distribution of inquiries via email and ticketing systems, and facilitates the addition of feedback data to the database. In this process, the input is the unresolved inquiry, and the output is an updated database reflecting the feedback.
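
The database search of step 3 could be sketched, for example, with Python's standard `sqlite3` module as follows. The table name `past_cases`, its columns, and the sample rows are hypothetical. The sketch searches a single in-memory table rather than performing the cross-database search mentioned in step 3, and it matches keywords with a simple `LIKE` condition.

```python
import sqlite3


def search_cases(conn, keywords):
    """Return past cases whose question text contains any of the given keywords."""
    if not keywords:
        return []
    # One parameterized LIKE clause per keyword; values are bound, not interpolated.
    where = " OR ".join("question LIKE ?" for _ in keywords)
    params = [f"%{kw}%" for kw in keywords]
    sql = f"SELECT question, answer FROM past_cases WHERE {where} ORDER BY id"
    return conn.execute(sql, params).fetchall()


# In-memory database with invented sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE past_cases (id INTEGER PRIMARY KEY, question TEXT, answer TEXT)"
)
conn.executemany(
    "INSERT INTO past_cases (question, answer) VALUES (?, ?)",
    [
        ("What is the warranty period for this product?", "See the warranty card."),
        ("Where is the fitting room?", "Next to the register."),
    ],
)
rows = search_cases(conn, ["warranty"])
print(rows)
```

Binding the keywords as parameters keeps text derived from customer utterances out of the SQL string itself.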

[0719] (Application Example 1)

[0720] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0721] Retail stores and service industries are facing a need to improve the efficiency and quality of customer service. Currently, it is difficult to respond quickly and appropriately to a wide range of customer inquiries, increasing the burden on store employees and potentially leading to decreased customer satisfaction. In particular, there is a demand for the ability to instantly analyze large amounts of information and derive the optimal answer.

[0722] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0723] In this invention, the server includes means for analyzing input natural language data to extract keywords, means for searching a database for past information and similar cases, means for acquiring external environmental information in real time and reflecting it in the analysis results, and means for acquiring location information within the store and providing inventory information and discount information. This makes it possible to quickly and appropriately present product and service information in response to customer inquiries.

[0724] "Natural language data" refers to strings of characters and audio information expressed in the language forms that humans use on a daily basis.

[0725] "Keywords" are important words or phrases extracted from natural language data and are used to identify and classify information.

[0726] A "database" is a system for systematically accumulating and managing data such as past information and similar cases.

[0727] "External environmental information" refers to data about external factors acquired in real time, such as date and time, weather, and regional information.

[0728] "Analysis results" refers to the information and conclusions obtained through the analysis of natural language data.

[0729] A "terminal" is a device used for inputting and outputting data, and includes smartphones and computers.

[0730] "Location information" refers to geographical data about a specific place, and is obtained using methods such as GPS.

[0731] "Inventory information" refers to the status of goods held in a store, and includes data on the quantity and types of products available for sale.

[0732] "Discount information" refers to information about discounts or special prices on specific products or services.

[0733] The system for implementing this invention mainly consists of a server and a terminal. The server analyzes the input natural language data and extracts keywords to understand the user's intent. If voice input is provided, speech recognition technology is used to convert the voice data into text data. In this case, the server uses a speech recognition API (e.g., Google Speech Recognition API) that has been trained on a large amount of voice data.

[0734] Based on the analysis results, the server searches a database to find past information and similar cases. The server also acquires real-time external environmental information and incorporates it into the analysis results. This includes current data such as date, time, weather, and location, which can be used to provide information on in-store inventory and discounts.

[0735] The terminal has the functionality to visually display the best answers and advice obtained from the server to the user. In particular, the information is presented clearly using relevant images and graphs. This allows the user to respond to customers quickly and accurately.

[0736] As a concrete example, imagine a store employee using a smartphone to operate an application and entering a prompt such as, "What promotions are currently running for this product?" This allows the server to immediately analyze the request and provide information, effectively supporting customer service.

[0737] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0738] Step 1:

[0739] Users use their smartphones to input customer questions via voice or text. In the case of voice input, the smartphone's microphone is used to collect the customer's question. The input data is then sent to a server.

[0740] Step 2:

[0741] The server uses a speech recognition API to convert audio data into text data. The input is an audio file, and the output is a customer question in text format. During this process, the audio data is converted to text with high accuracy.

[0742] Step 3:

[0743] The server analyzes text data using a natural language processing module and extracts important keywords. Text data is the input, and the output is the key keywords. Here, a generative AI model performs contextual analysis and keyword recognition.

[0744] Step 4:

[0745] The server searches the database to find similar past queries and related product information. The keywords are the input, and the relevant information is the output. The search is designed to return the relevant information quickly.

[0746] Step 5:

[0747] The server uses an external data acquisition module to obtain real-time external environmental information (e.g., weather and inventory status). Inputs are the current date, time, and location, while output is external data. By using this information in the analysis results, more accurate answers and suggestions are generated.

[0748] Step 6:

[0749] The server generates the optimal response or advice based on the acquired data. At this point, the specific information requested by the user is comprehensively summarized. The input consists of past case information and external data, and the output is a specific response.

[0750] Step 7:

[0751] The terminal displays information received from the server via a user interface. Related images and graphs are displayed along with the optimal answer, providing the user with visually easy-to-understand information.
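
As a hedged illustration of how in-store location, inventory information, and discount information might be combined into a response (steps 5 and 6), consider the following sketch. The `StockEntry` type, the aisle identifiers, the sample stock, and the wording of the response are all assumptions introduced for this example. In-store location is represented simply as an aisle label rather than GPS data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StockEntry:
    """Inventory and discount data for one product (hypothetical schema)."""
    product: str
    aisle: str
    quantity: int
    discount_percent: int  # 0 means no discount is running


def build_response(keywords, stock, current_aisle):
    """Compose a short answer about stock and discounts for the first matching product."""
    wanted = {k.lower() for k in keywords}
    for entry in stock:
        if entry.product.lower() not in wanted:
            continue
        parts = [f"{entry.product}: {entry.quantity} in stock in aisle {entry.aisle}"]
        if entry.aisle == current_aisle:
            parts.append("you are in the right aisle")
        if entry.discount_percent > 0:
            parts.append(f"{entry.discount_percent}% off today")
        return "; ".join(parts)
    return None  # nothing matched; the caller may escalate the inquiry


# Invented sample stock for the illustration.
stock = [
    StockEntry("umbrella", "A3", 12, 20),
    StockEntry("sunscreen", "B1", 0, 0),
]
reply = build_response(["umbrella", "promotion"], stock, current_aisle="A3")
print(reply)
```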

[0752] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0753] This invention is a customer service support system that utilizes speech and natural language processing, and further incorporates an emotion engine that recognizes user emotions. This system not only converts input speech into text and extracts keywords using natural language processing technology, but also has the function of analyzing user emotions in real time.

[0754] In system operation, users input customer inquiries and requests via voice or text. If voice input is selected, the server uses high-precision speech recognition technology to convert it into text data. The converted text is analyzed through a natural language processing module to extract relevant keywords and context.

[0755] At this stage, the server utilizes an emotion engine to evaluate the user's emotional state from voice intonation and text expression. For example, it can determine whether the customer is anxious or satisfied based on their tone of voice and word choice. The information obtained through emotion analysis is used as a crucial element in determining an appropriate response strategy.

[0756] Next, the server searches the database based on the user's input and emotional state, referencing past cases and related information. For example, it extracts the history of responses to similar questions in the past and effective countermeasures based on emotional changes. Furthermore, it acquires external information in real time and generates advice that includes the date, time, weather, and local conditions.

[0757] The generated information is presented visually to the user via the terminal's user interface module. For example, if a customer appears anxious, explanations and information that provide reassurance are highlighted. This improves the quality of customer service and enables the provision of services that are more attentive to the customer's emotions.

[0758] Furthermore, emotional data obtained during interactions is recorded by the server and stored in a database. This allows for the analysis and utilization of past emotional history in future customer interactions. In this way, the overall accuracy and effectiveness of the system improve over the long term.

[0759] The following describes the processing flow.

[0760] Step 1:

[0761] The user inputs customer questions into the system via voice or text. In the case of voice input, the voice data is sent to the terminal via the microphone.

[0762] Step 2:

[0763] The server processes the received audio data using a speech recognition module and converts it into text data. During this process, it analyzes the audio pattern and reduces background noise.

[0764] Step 3:

[0765] The server analyzes the transcribed data using a natural language processing module to extract keywords and contextual structure. This analysis allows for an accurate understanding of the customer's intent.

[0766] Step 4:

[0767] The server uses an emotion engine to identify the user's emotions from their tone of voice and choice of words. In particular, it flags cases in which the emotional state is a crucial factor in customer service.

[0768] Step 5:

[0769] The server searches the database based on the analyzed text and sentiment information to retrieve relevant past cases, FAQs, and product information. If the sentiment is pronounced, it also refers to past responses applied to similar emotional situations.

[0770] Step 6:

[0771] The server acquires real-time environmental information such as date, time, weather, and location through an external data acquisition module. This information is also taken into consideration to form optimal advice.

[0772] Step 7:

[0773] The terminal visually presents advice and information received from the server to the user through a user interface module. The displayed content consists of text, relevant images, graphs as needed, and other elements that take emotions into consideration.

[0774] Step 8:

[0775] The server records the emotional data identified during the interaction, together with how it changed, and stores it in a database so that future responses can be better tailored. This data is used in later analysis and contributes to improving the overall response accuracy of the system.
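As a rough illustration only, Steps 1 to 8 above can be sketched as a chain of small stages that each enrich a shared record. Every name here (`Interaction`, `run_pipeline`, the stage functions, the sample case text, and the fixed weather value) is a hypothetical placeholder rather than anything defined in this disclosure. Speech recognition and external data acquisition are replaced by trivial stand-ins so that the sketch runs without any service.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Interaction:
    """State carried through the steps (field names are illustrative)."""
    raw_input: str                      # Step 1: what the user entered
    text: str = ""
    keywords: list[str] = field(default_factory=list)
    emotion: str = "neutral"
    cases: list[str] = field(default_factory=list)
    environment: dict[str, str] = field(default_factory=dict)
    advice: str = ""


Stage = Callable[[Interaction], Interaction]
HISTORY: list[Interaction] = []         # stand-in for the database of Step 8


def transcribe(i: Interaction) -> Interaction:
    # Step 2 placeholder: real speech recognition is out of scope here.
    i.text = i.raw_input.strip()
    return i


def extract_keywords(i: Interaction) -> Interaction:
    # Step 3 placeholder: drop a few stopwords and keep the rest.
    stop = {"the", "a", "is", "my", "and", "i", "am", "it", "to"}
    words = i.text.lower().replace(".", "").split()
    i.keywords = [w for w in words if w not in stop]
    return i


def detect_emotion(i: Interaction) -> Interaction:
    # Step 4 placeholder: a tiny word list instead of an emotion engine.
    anxious = {"worried", "anxious", "afraid"}
    i.emotion = "anxious" if anxious & set(i.keywords) else "neutral"
    return i


def search_cases(i: Interaction) -> Interaction:
    # Step 5 placeholder: an in-memory dict instead of a database.
    db = {"battery": "Case 12: battery replaced under warranty"}
    i.cases = [case for key, case in db.items() if key in i.keywords]
    return i


def add_environment(i: Interaction) -> Interaction:
    # Step 6 placeholder: fixed value so the sketch is repeatable.
    i.environment = {"weather": "rain"}
    return i


def compose_advice(i: Interaction) -> Interaction:
    # Step 7: mark reassuring content when the customer seems anxious.
    prefix = "[reassure] " if i.emotion == "anxious" else ""
    i.advice = prefix + "; ".join(i.cases or ["No similar case found"])
    return i


def store_record(i: Interaction) -> Interaction:
    # Step 8: keep the interaction for later analysis.
    HISTORY.append(i)
    return i


STAGES: list[Stage] = [transcribe, extract_keywords, detect_emotion,
                       search_cases, add_environment, compose_advice,
                       store_record]


def run_pipeline(i: Interaction, stages: list[Stage]) -> Interaction:
    for stage in stages:
        i = stage(i)
    return i
```

Calling `run_pipeline(Interaction("I am worried about the battery."), STAGES)` walks the record through every stage in order. Keeping each step as a separate function makes it straightforward to swap a placeholder for a real module later.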

[0776] (Example 2)

[0777] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0778] Traditional customer service systems could extract keywords from entered information and incorporate external data, but they had limitations in their ability to specifically analyze customer emotions and optimize responses. Furthermore, it was difficult to effectively utilize interaction history to improve the quality of subsequent interactions.

[0779] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0780] In this invention, the server includes means for analyzing input voice or natural language data to extract keywords, means for analyzing the user's emotions, and means for searching a database for past cases and related information. This makes it possible to present countermeasures that take customer emotions into consideration and to improve the quality of service by utilizing accumulated emotional data.

[0781] "Voice or natural language data" refers to information in the form of conversations or texts entered by users, including customer requests and questions.

[0782] "Methods for analyzing and extracting keywords" refers to techniques that analyze input data and identify the important words and phrases in it.

[0783] "Methods for analyzing user emotions" refer to technologies that evaluate voice tone and expression in order to determine the emotions contained in conversations and written texts.

[0784] "Means of searching past cases and related information from a database" refers to a technology that searches for and retrieves cases and information that are relevant to the current situation from a pre-existing database of cases and information.

[0785] "Means of acquiring external information in real time and reflecting it in analysis results" refers to technologies that acquire the latest data from the internet and other information sources, and incorporate and utilize it in the results of the analysis.

[0786] "Means for generating optimal advice or suggestions and outputting them to a display device" refers to a technology that creates the most effective advice or solutions based on analyzed information and provides them to the user through a display device.

[0787] "A means of accumulating emotional data acquired during a conversation and reflecting it in future responses" refers to a technology that saves emotionally-based information obtained from interactions with customers and utilizes it for future interactions.

[0788] This invention is a system that utilizes speech and natural language processing technologies to support customer service. The system combines speech-to-text conversion, natural language analysis, and sentiment analysis to provide responses tailored to the customer's emotions. Specific embodiments are described below.

[0789] Users enter inquiries and requests via voice or text. If voice input is used, the information is transmitted to the system via the device's microphone. Voice input is available on readily available devices and is designed with ease of use in mind.

[0790] The server converts the audio into text data using high-precision speech recognition software (e.g., a common speech recognition API). During this process, it accurately captures the subtle nuances of the speech and replaces them with textual information.

[0791] The obtained text data is analyzed using a natural language processing library (e.g., spaCy or other common NLU libraries). Sentence segmentation, keyword extraction, and contextual analysis are performed, thereby extracting important information.
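A minimal sketch of sentence segmentation and keyword extraction is shown below. It uses only the Python standard library as a stand-in for an NLP library such as spaCy, so it is not the analysis this disclosure actually relies on. The stopword list, the frequency-based ranking, and the function names `split_sentences` and `extract_keywords` are assumptions made for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the source).
STOPWORDS = frozenset({
    "the", "a", "an", "is", "are", "be", "can", "and", "or", "to",
    "of", "it", "my", "i", "this", "that", "was", "for",
})


def split_sentences(text: str) -> list:
    """Split on whitespace that follows sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def extract_keywords(text: str, top_n: int = 5) -> list:
    """Return the most frequent non-stopword tokens as keywords."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(top_n)]
```

A production system would more likely rank candidates using part-of-speech tags or dependency structure. Frequency counting is used here only because it needs no trained model.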

[0792] Next, the server uses an emotion analysis engine (e.g., a common emotion analysis tool) to evaluate the customer's emotions. This allows the server to identify the customer's emotional state, specifically emotions such as joy, sadness, or dissatisfaction.
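The following sketch shows one simple way an emotional state such as joy, sadness, or dissatisfaction could be estimated from text. It is a word-list scorer written for illustration and is not the emotion analysis engine referred to above. The lexicon contents and the names `score_emotions` and `dominant_emotion` are hypothetical.

```python
import re

# Hypothetical lexicon: a few cue words per emotion, for illustration only.
EMOTION_LEXICON = {
    "joy": {"great", "love", "happy", "thanks", "wonderful"},
    "sadness": {"sad", "sorry", "disappointed", "unfortunately"},
    "dissatisfaction": {"broken", "refund", "terrible", "complaint",
                        "malfunction"},
}


def score_emotions(text: str) -> dict:
    """Return, per emotion, the fraction of tokens that are cue words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = max(len(tokens), 1)  # avoid division by zero on empty input
    return {
        emotion: sum(token in cues for token in tokens) / total
        for emotion, cues in EMOTION_LEXICON.items()
    }


def dominant_emotion(text: str, threshold: float = 0.0) -> str:
    """Pick the highest-scoring emotion, or 'neutral' if none exceeds threshold."""
    scores = score_emotions(text)
    label, value = max(scores.items(), key=lambda item: item[1])
    return label if value > threshold else "neutral"
```

For the input "This product is broken and I want a refund", two of the nine tokens are dissatisfaction cues, so `dominant_emotion` returns `"dissatisfaction"`. Text with no cue words is reported as `"neutral"`.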

[0793] Based on the analyzed keywords and sentiment information, the server searches the database to retrieve past cases and related information. Furthermore, it uses external information such as weather and date / time, obtained in real time, to generate more appropriate advice.
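One possible way to reflect external information in the retrieved results is to re-rank candidate suggestions against the current conditions, as sketched below. The `Environment` and `Suggestion` types, the weather tags, and the fixed `boost` value are assumptions introduced for this example. The environment is passed in as plain data, so no particular weather or location API is implied.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Environment:
    """Snapshot of external conditions (hypothetical structure)."""
    timestamp: datetime
    weather: str
    location: str


@dataclass(frozen=True)
class Suggestion:
    """A candidate piece of advice with a relevance score from the search."""
    text: str
    base_score: float
    weather_tags: frozenset = frozenset()


def rerank(suggestions, env: Environment, boost: float = 0.5):
    """Order suggestions by base score plus a bonus for matching weather."""
    def score(s: Suggestion) -> float:
        bonus = boost if env.weather in s.weather_tags else 0.0
        return s.base_score + bonus

    return sorted(suggestions, key=score, reverse=True)
```

With `weather="rain"`, a suggestion tagged for rain and scored 0.4 outranks an untagged or sunny-tagged suggestion scored 0.6. The same pattern could be extended to time of day or location.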

[0794] The generated information is presented to the user visually through the terminal's user interface. At this time, the display device uses effective visual materials to provide information in a way that is easily understandable to the customer.

[0795] For example, if a customer expresses dissatisfaction with a product malfunction, the system identifies their feelings and suggests solutions based on past repair cases. This enables a quick and accurate response.

[0796] An example of a prompt message is, "Please enter the customer's voice. The system will analyze their emotions and suggest the best course of action." This encourages users to utilize the system. This technology can significantly improve the quality of customer service and increase customer satisfaction.

[0797] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0798] Step 1:

[0799] Users input their inquiries via voice or text using a terminal. In the case of voice input, the microphone captures the audio and sends it to the system. The voice and text become input data for the system. The voice data is converted into a digital signal upon receipt, ready for processing.

[0800] Step 2:

[0801] The server converts the input audio data into text using speech recognition software. The input is audio data, and the output is text data. In this process, the prosody and pronunciation of the speech are analyzed as features, and highly accurate text conversion is performed based on that information.

[0802] Step 3:

[0803] The server analyzes the transcribed data using a natural language processing module implemented with a natural language processing library. The input is text data, and the output is extracted keywords and contextual information. The process involves segmenting sentences, tagging parts of speech, and performing dependency analysis to identify keywords.

[0804] Step 4:

[0805] The server uses an emotion analysis engine to evaluate the user's emotions from text data. The input is analyzed text information, and the output is an emotion score. In this step, emotional features are calculated from the text representation, and emotions such as positive, negative, and neutral are quantified.

[0806] Step 5:

[0807] The server searches the database using keywords and sentiment scores. The input is the identified keywords and sentiment scores, and the output is similar cases and related information. This process involves issuing SQL queries to retrieve past interaction history and related solutions.

[0808] Step 6:

[0809] The server retrieves real-time data from external sources and integrates it into the search results. This includes utilizing APIs to obtain weather, time, and geographical information. The input is the initial search results, and the output is enhanced suggestion information.

[0810] Step 7:

[0811] The terminal visually presents the generated suggestion information to the user through a user interface. The input is integrated suggestion information, and the output is the final display to the user. The terminal provides information visually using infographics and text.

[0812] Step 8:

[0813] The server records the sentiment data and search information obtained during processing and stores it in a database. The input is all the analysis information, and the output is the data accumulated for future use. This data is used to improve the quality of responses in subsequent interactions.
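The database search of Step 5 and the storage of Step 8 can be illustrated with Python's built-in `sqlite3` module. The `cases` table, its columns, and the rule of ordering matches by closeness of sentiment score are assumptions chosen for this sketch. The disclosure states only that SQL queries are issued and does not specify a schema.

```python
import sqlite3


def open_case_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical 'cases' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cases ("
        " id INTEGER PRIMARY KEY,"
        " keyword TEXT NOT NULL,"
        " sentiment REAL NOT NULL,"
        " solution TEXT NOT NULL)"
    )
    return conn


def record_case(conn, keyword: str, sentiment: float, solution: str) -> None:
    """Step 8: store an interaction so later searches can find it."""
    conn.execute(
        "INSERT INTO cases (keyword, sentiment, solution) VALUES (?, ?, ?)",
        (keyword, sentiment, solution),
    )
    conn.commit()


def find_similar_cases(conn, keywords, sentiment: float, limit: int = 3):
    """Step 5: match on keyword, then prefer cases with a similar sentiment."""
    if not keywords:
        return []
    placeholders = ", ".join("?" for _ in keywords)
    query = (
        f"SELECT solution FROM cases WHERE keyword IN ({placeholders}) "
        "ORDER BY ABS(sentiment - ?) ASC, id ASC LIMIT ?"
    )
    rows = conn.execute(query, (*keywords, sentiment, limit)).fetchall()
    return [row[0] for row in rows]
```

Only the placeholder list is built with string formatting. All values are bound as parameters, which keeps user-derived keywords out of the SQL text.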

[0814] (Application Example 2)

[0815] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0816] In today's living environment, there is a demand for sophisticated services tailored to individual emotions and physical conditions. In particular, real-time emotional recognition and feedback are essential to improving the quality of communication within the home and ensuring smooth daily living support. However, conventional home support systems currently struggle to accurately interpret users' emotional states, making it difficult to provide individualized support based on these perceptions.

[0817] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0818] In this invention, the server includes means for analyzing input natural language data and extracting keywords, means for evaluating the user's emotional state from speech or text using an emotion analysis engine, and means for playing appropriate sound or visual content based on the user's emotional state. This enables customized responses and service provision tailored to the individual's emotional state.

[0819] "Natural language data" refers to information expressed in the language format that humans use on a daily basis.

[0820] A "data bank" is an information recording device used to store past information and records of similar cases.

[0821] "External environmental information" refers to information about conditions such as weather and time that exist outside the system.

[0822] An "information display device" is a device used to visually present information to a user.

[0823] "Relevant departments" refer to organizations that possess the necessary knowledge and functions to operate the system and address unresolved questions.

[0824] An "emotion analysis engine" is an analytical tool used to evaluate emotional states from speech and text.

[0825] "User's emotional state" refers to the user's current psychological or emotional condition.

[0826] "Audio and visual content" refers to forms of information or entertainment that utilize sound and images.

[0827] This invention applies a customer service support system utilizing voice and natural language processing to a consumer robot used in the home. It primarily uses an emotion recognition engine and real-time database referencing capabilities to provide audio and visual content based on an individual's emotional state. A specific implementation example of the system is shown below.

[0828] The server converts the input speech into text data using high-precision recognition technology. The hardware used here is a Raspberry Pi, and the software utilizes the Google Cloud Speech-to-Text API. The converted text data is then analyzed using natural language processing tools such as NLTK and spaCy to extract keywords.

[0829] Next, the server uses IBM Watson Tone Analyzer, an emotion analysis engine, to evaluate the user's emotions from their voice tone and text. This makes it possible to determine the user's emotional state.

[0830] Based on the emotional state, the device selects and plays appropriate audio and visual content. Available media include music, videos, and animation.

[0831] When a user makes an inquiry or request by voice, they can say something like, "I'm feeling tired today and would like to relax a bit," and the system will select and play relaxing music.

[0832] Also, an example of a prompt statement is:

[0833] "Please advise on how to soothe users' minds when they want to relax."

[0834] "I would appreciate your advice on the best way to deal with a family member who is experiencing stress."

[0835] This allows for responses tailored to the user's emotional state, improving satisfaction and convenience within the home.

[0836] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0837] Step 1:

[0838] The user speaks to the home robot using their voice. The input here is the user's voice data. The server captures the voice using a microphone on a Raspberry Pi and converts that data to text via the Google Cloud Speech-to-Text API. This conversion process yields the text data.

[0839] Step 2:

[0840] The server analyzes the obtained text data using natural language processing tools such as NLTK and spaCy. The input is the text data converted in step 1. This process extracts keywords and context, and a list of relevant keywords is output.

[0841] Step 3:

[0842] The server performs sentiment analysis on text data and voice tone using IBM Watson Tone Analyzer. The inputs are the voice tone acquired in step 1 and the text data from step 2. Based on the analysis, the user's emotional state is evaluated, and emotion tags such as anger, joy, and sadness are output.

[0843] Step 4:

[0844] The server, based on emotional state data, references a historical database to select the most appropriate audio or visual content. In this step, emotional tags and keyword lists are inputs. The most suitable content is selected from past history, and that information is output.

[0845] Step 5:

[0846] The device plays the selected audio or visual content. The content information obtained in step 4 is used as input. Using the speaker or display, music, animation, or video that matches the user's emotional state is presented to the user. This completes the provision of a service that is attentive to the user's emotions.
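The content selection of Step 4 could be realized in many ways. The sketch below assumes, purely for illustration, that the history database holds tuples of emotion tag, content identifier, and a numeric rating of how well the content was received. The rating field, the `select_content` name, and the default content are hypothetical and are not described in this disclosure.

```python
from collections import defaultdict


def select_content(history, emotion_tag: str,
                   default: str = "ambient_playlist") -> str:
    """Return the content with the best average rating for this emotion.

    history: iterable of (emotion_tag, content_id, rating) tuples.
    Falls back to `default` when the emotion has no recorded history.
    """
    totals = defaultdict(lambda: [0.0, 0])  # content_id -> [rating sum, count]
    for tag, content_id, rating in history:
        if tag == emotion_tag:
            totals[content_id][0] += rating
            totals[content_id][1] += 1
    if not totals:
        return default
    return max(totals, key=lambda c: totals[c][0] / totals[c][1])
```

Averaging rather than summing keeps frequently played content from winning on volume alone. The returned identifier would then be handed to Step 5 for playback.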

[0847] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0848] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0849] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0850] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0851] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0852] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0853] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the Emotion Map 400 an emotion lies, the more visible it becomes (that is, the more it is expressed in actions).

[0854] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
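The balance-based notion of pleasure and discomfort described above can be expressed, under simplifying assumptions, as the change in distance from an ideal value. The sketch below treats a single scalar balance such as battery level. The function names and the `tolerance` parameter are hypothetical, and combining several balances is left out.

```python
def valence(previous: float, current: float, ideal: float) -> float:
    """Positive when the balance moved toward the ideal, negative when away."""
    return abs(previous - ideal) - abs(current - ideal)


def classify_balance_change(previous: float, current: float, ideal: float,
                            tolerance: float = 1e-9) -> str:
    """Map the change in a balance to 'pleasure', 'discomfort', or 'neutral'."""
    v = valence(previous, current, ideal)
    if v > tolerance:
        return "pleasure"
    if v < -tolerance:
        return "discomfort"
    return "neutral"
```

For a robot whose ideal battery level is 100, charging from 40 to 60 yields `"pleasure"`, and draining from 60 to 40 yields `"discomfort"`.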

[0855] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0856] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
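One conceivable way to obtain training targets in which neighboring emotions receive similar values is to spread each label over the map with a distance-based kernel. The sketch below is an assumption made for illustration and is not the training procedure of the emotion identification model 59. The polar coordinates assigned to each emotion, the Gaussian kernel, and the `sigma` value are all hypothetical. The angle is measured counterclockwise from the 3 o'clock position.

```python
import math

# Hypothetical (radius, angle in degrees) positions on the emotion map.
EMOTION_POSITIONS = {
    "reassured": (1.0, 0.0),
    "calm": (1.0, 10.0),
    "confident": (1.2, 20.0),
    "displeasure": (1.0, 270.0),
}


def to_xy(radius: float, angle_deg: float):
    """Convert a polar map position to Cartesian coordinates."""
    angle = math.radians(angle_deg)
    return radius * math.cos(angle), radius * math.sin(angle)


def soft_targets(label: str, sigma: float = 0.5) -> dict:
    """Spread a one-hot label over nearby emotions with a Gaussian kernel."""
    cx, cy = to_xy(*EMOTION_POSITIONS[label])
    weights = {}
    for name, polar in EMOTION_POSITIONS.items():
        x, y = to_xy(*polar)
        squared_distance = (x - cx) ** 2 + (y - cy) ** 2
        weights[name] = math.exp(-squared_distance / (2 * sigma ** 2))
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}
```

With these positions, `soft_targets("reassured")` gives "calm" and "confident" values close to that of "reassured", while "displeasure" on the opposite side of the map receives a value near zero.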

[0857] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0858] In the above embodiment, an example was given in which the specific processing is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and the specific processing may be distributed across multiple computers, including the computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0859] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0860] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0861] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0862] The following types of processors can be used as hardware resources to perform specific processing. One example is a CPU, which is a general-purpose processor that functions as a hardware resource for performing specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using this memory.

[0863] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0864] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0865] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0866] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0867] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0868] The following is further disclosed regarding the embodiments described above.

[0869] (Claim 1)

[0870] A means for analyzing input natural language data and extracting keywords,

[0871] A means of searching for past information and similar cases from a database,

[0872] A means of acquiring external environmental information in real time and reflecting it in the analysis results,

[0873] A means of generating the optimal answer or advice and displaying it on the device,

[0874] A means of reporting unresolved questions to the relevant departments and managing feedback,

[0875] A system that includes this.

[0876] (Claim 2)

[0877] The system according to claim 1, comprising means for converting audio data into text data.

[0878] (Claim 3)

[0879] The system according to claim 1, further comprising means for displaying related images or graphs on a terminal.

[0880] "Example 1"

[0881] (Claim 1)

[0882] A means for analyzing input natural language information and extracting key words,

[0883] A means of searching for past information and similar cases from the information database,

[0884] A means of acquiring current external environmental information and reflecting it in the analysis results,

[0885] Means for generating the optimal answer or advice and displaying it on an output device,

[0886] A means of reporting unresolved inquiries to the relevant departments and managing response information,

[0887] A means of converting audio information into text information,

[0888] A system that includes this.

[0889] (Claim 2)

[0890] The system according to claim 1, comprising means for presenting relevant visual information or statistical displays to a terminal.

[0891] (Claim 3)

[0892] The system according to claim 1, comprising an analysis means for recognizing the intent of an inquiry.

[0893] "Application Example 1"

[0894] (Claim 1)

[0895] A means for analyzing input natural language data and extracting keywords,

[0896] A means of searching for past information and similar cases from a database,

[0897] A means of acquiring external environmental information in real time and reflecting it in the analysis results,

[0898] A means of generating the optimal answer or advice and displaying it on the device,

[0899] A means of reporting unresolved questions to the relevant departments and managing feedback,

[0900] A means of obtaining location information within a store and providing inventory information and discount information,

[0901] A system that includes this.

[0902] (Claim 2)

[0903] The system according to claim 1, comprising means for converting audio data into text data.

[0904] (Claim 3)

[0905] The system according to claim 1, further comprising means for displaying related images or graphs on a terminal.

[0906] "Example 2 of combining an emotion engine"

[0907] (Claim 1)

[0908] A means for analyzing input speech or natural language data to extract keywords,

[0909] A means of analyzing user emotions,

[0910] A means of searching past cases and related information from a database,

[0911] A means of acquiring external information in real time and reflecting it in the analysis results,

[0912] Means for generating optimal advice or suggestions and outputting them to a display device,

[0913] A means of accumulating emotional data acquired during dialogue and reflecting it in future responses,

[0914] A system that includes this.

[0915] (Claim 2)

[0916] The system according to claim 1, comprising means for converting audio data into text data.

[0917] (Claim 3)

[0918] The system according to claim 1, further comprising means for outputting related visual materials to a display device.

[0919] "Application example 2 when combining with an emotional engine"

[0920] (Claim 1)

[0921] A means for analyzing input natural language data and extracting keywords,

[0922] A means of searching past information and similar cases from a database,

[0923] A means of acquiring external environmental information in real time and reflecting it in the analysis results,

[0924] Means for generating the optimal answer or advice and displaying it on an information display device,

[0925] A means of reporting unresolved questions to the relevant departments and managing their opinions,

[0926] A means for evaluating a user's emotional state from speech or text using an emotion analysis engine,

[0927] A means of playing appropriate sound and visual content based on the user's emotional state,

[0928] A system that includes this.

[0929] (Claim 2)

[0930] The system according to claim 1, comprising means for converting audio data into text data.

[0931] (Claim 3)

[0932] The system according to claim 1, further comprising means for displaying related images and charts on an information display device.

[Explanation of Symbols]

[0933]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. A means for analyzing input natural language data and extracting keywords, A means of searching for past information and similar cases from a database, A means of acquiring external environmental information in real time and reflecting it in the analysis results, A means of generating the optimal answer or advice and displaying it on the device, A means of reporting unresolved questions to the relevant departments and managing feedback, A system that includes this.

2. The system according to claim 1, comprising means for converting audio data into text data.

3. The system according to claim 1, further comprising means for displaying related images and graphs on a terminal.