The system addresses the challenge of real-time emotional analysis and advice provision by integrating analysis, generation, and presentation means, offering personalized support and continuous improvement through user feedback.

JP2026096518A · Pending · Publication Date: 2026-06-15 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: SOFTBANK GROUP CORP
Filing Date: 2024-12-03
Publication Date: 2026-06-15

AI Technical Summary

Technical Problem

Existing systems struggle to accurately analyze users' emotional states in real-time and provide personalized advice, lacking effective methods for continuous improvement using user feedback.

Method used

A system comprising an analysis means for real-time emotional state analysis, a generation means for personalized advice, a presentation means for advice delivery, and a voice acquisition and recording means for historical data utilization, ensuring precise and consistent support across various environments.

Benefits of technology

The system provides personalized advice tailored to users' emotional states in real-time, enhancing resilience and stress management by continuously improving through user feedback.

✦ Generated by Eureka AI based on patent content.

Abstract

A system is provided. [Solution] The system includes an analysis means for analyzing a user's emotional state, a generation means for generating advice for the user based on the analysis performed by the analysis means, and a presentation means for presenting the advice through various devices.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] In modern society, many individuals are affected by stress and self-negativity, which significantly reduces their mental health and the quality of daily life. In particular, when the tolerance to failure and criticism is weak, it leads to persistent anxiety and pressure. The present invention aims to improve the resilience of such individuals and provide effective support to make them more stress-resistant.

Means for Solving the Problems

[0005] This invention provides a system that includes an analysis means for analyzing a user's emotional state in real time, and a generation means for generating personalized advice for the user based on the results. Furthermore, by including a presentation means for presenting advice using various devices, the system ensures that users receive consistent support in various environments. In addition, by including a voice acquisition means for acquiring the user's voice information and supplying it for analysis, and a recording means for recording the user's history and utilizing it in creating advice in the generation means, the system achieves more precise and effective support.

[0006] "Analysis means" refers to a device or software that has the function of analyzing the user's emotional state in real time and making a judgment about that state.

[0007] "Generation means" refers to a device or software that has the function of generating appropriate advice for the user based on information about the emotional state obtained by the analysis means.

[0008] "Presentation means" refers to a device or software that has the function of presenting generated advice to the user through various devices.

[0009] "Voice acquisition means" refers to a device or software that has the function of acquiring the user's voice information and supplying that information to an analysis means.

[0010] "Recording means" refers to a device or software that has the function of recording user history information and utilizing it as reference for creating advice in the generation means.

Brief Description of the Drawings

[0011]

[Figure 1] This is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.

[Figure 2] This is a conceptual diagram showing an example of the essential functions of a data processing device and a smart device according to the first embodiment.

[Figure 3] This is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.

[Figure 4] This is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.

[Figure 5] This is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.

[Figure 6] This is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.

[Figure 7] This is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.

[Figure 8] This is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.

[Figure 9] This shows an emotion map where multiple emotions are mapped.

[Figure 10] This shows an emotion map where multiple emotions are mapped.

[Figure 11] This is a sequence diagram showing the processing flow of the data processing system in Example 1.

[Figure 12] This is a sequence diagram showing the processing flow of the data processing system in Application Example 1.

[Figure 13] This is a sequence diagram showing the processing flow of the data processing system in Example 2, which incorporates an emotion engine.

[Figure 14] This is a sequence diagram showing the processing flow of the data processing system in Application Example 2, which combines an emotion engine.

Modes for Carrying Out the Invention

[0012] Hereinafter, an example of an embodiment of the system relating to the technology of this disclosure will be described with reference to the attached drawings.

[0013] First, the terminology used in the following description is explained.

[0014] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as a "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).

[0015] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which the processor uses as work memory.

[0016] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0017] In the following embodiments, a communication I/F (Interface) denoted by a reference numeral is an interface including a communication processor, an antenna, and the like. The communication I/F controls communication between multiple computers. Examples of communication standards applied to the communication I/F include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0018] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" may be A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same concept applies when three or more items are linked by "and/or."

[0019] [First Embodiment]

[0020] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0021] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0022] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0023] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0024] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0025] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0026] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0027] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0028] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0029] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0030] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0031] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0032] As an embodiment for carrying out the present invention, a system that performs specific program processing is designed. Its specific details are described below.

[0033] The server plays a central role in analyzing the user's emotional state. First, it receives the user's voice data transmitted from the terminal and converts it into text data using speech recognition technology. The server then applies an emotion analysis model to this text data to determine the user's emotional state and, based on that state, generates appropriate advice for the user. The generated advice and information are sent to the terminal to which the user is connected.

[0034] The terminal acts as an interface between the server and the user. It has a voice acquisition function that captures the sound around the user in real time and sends it to the server. The advice returned from the server is presented to the user as either audio or text. The terminal also contributes to the improvement of the overall system by collecting feedback from the user about the presented advice and sending it to the server.

[0035] Users utilize the device during their daily activities and receive advice and support from the server. For example, if a user is feeling down after making a mistake at work, they can speak into the device, and their voice is sent to the server for analysis, after which appropriate encouragement and advice are provided. Through this process, users can objectively re-examine their own emotions and are encouraged to take positive action.

[0036] Thus, this system aims to support the improvement of users' resilience and assist in stress management in daily life by coordinating the operation of its various components, such as analysis means, generation means, and presentation means.

[0037] The following describes the processing flow.

[0038] Step 1:

[0039] The terminal collects the user's speech through the microphone. The audio data is passed through a noise-canceling filter and temporarily stored as clear audio data.

[0040] Step 2:

[0041] The terminal prepares to send the stored audio data to the server. It packetizes the data according to the communication protocol and sends it to the server via a secure line.

[0042] Step 3:

[0043] The server passes the audio data received from the terminal through a speech recognition system to convert it into text data. The resulting text data is then input into an emotion analysis model to determine the user's emotional state.

[0044] Step 4:

[0045] Based on the sentiment analysis results, the server generates personalized advice according to pre-configured rules and algorithms. The generated advice is customized considering the user's past behavioral history.

[0046] Step 5:

[0047] The server converts the generated advice into text or audio format and sends it to the terminal. The data is encrypted and processed to ensure user privacy.

[0048] Step 6:

[0049] The terminal receives advice from the server and presents it to the user. If the advice is presented verbally, it is read aloud through the speaker; if it is presented as text, it is displayed on the screen.

[0050] Step 7:

[0051] Users adjust their actions based on the advice provided. If necessary, they can send feedback via their device, and the results will be used for future support.
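The seven steps above can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the implementation described in the application: the `Server` class, the two stub functions, the advice strings, and the "negative"/"neutral" labels are all hypothetical, and noise cancellation, packetization, and encryption are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Server:
    """Hypothetical stand-in for the server (data processing device 12)."""

    recognize: Callable[[bytes], str]  # Step 3: audio -> text (stubbed below)
    analyze: Callable[[str], str]      # Step 3: text -> emotional state label
    history: List[str] = field(default_factory=list)
    feedback: List[str] = field(default_factory=list)

    def handle_audio(self, audio: bytes) -> str:
        text = self.recognize(audio)
        emotion = self.analyze(text)
        # Step 4: pre-configured rules, customized by the user's past history.
        seen_before = emotion in self.history
        self.history.append(emotion)
        if emotion == "negative":
            advice = "That sounds hard. Try taking a short break."
            if seen_before:
                advice += " This has come up before, so consider talking it over with someone."
        else:
            advice = "Thanks for checking in. Keep going at your own pace."
        return advice  # Step 5: sent back to the terminal

    def handle_feedback(self, comment: str) -> None:
        # Step 7: feedback is stored for use in future support.
        self.feedback.append(comment)


def stub_recognize(audio: bytes) -> str:
    # Assumption: the "audio" is already UTF-8 text, standing in for speech recognition.
    return audio.decode("utf-8")


def stub_analyze(text: str) -> str:
    # Assumption: a single keyword stands in for the emotion analysis model.
    return "negative" if "mistake" in text.lower() else "neutral"


server = Server(stub_recognize, stub_analyze)
first = server.handle_audio(b"I made a mistake at work")
second = server.handle_audio(b"Another mistake today")
server.handle_feedback("The break helped")
```

In this sketch the second utterance receives extended advice because the same emotional state already appears in the recorded history, which mirrors the customization described in Step 4.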

[0052] (Example 1)

[0053] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0054] In modern society, there is a need for systems that effectively provide advice to users regarding the emotional stress and challenges they face. However, current technology makes it difficult to accurately analyze a user's emotional state and provide personalized advice in real time. Furthermore, there are insufficient methods for continuously improving the system using user feedback. As a result, there is a need to develop a system that provides truly useful support for users.

[0055] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0056] In this invention, the server includes computational processing means for analyzing the user's emotional state, data generation means for generating advice for the user, and information presentation means for presenting the advice via various communication devices. This makes it possible to provide appropriate advice based on the user's emotional state in real time and to continuously improve the system's performance through user feedback.

[0057] "Users" refers to individuals or organizations that use the system to receive services or support.

[0058] "Emotional state" refers to the psychological or emotional state that a user experiences at a particular point in time.

[0059] "Computational processing means" refers to a computer system or software that handles the process of analyzing data.

[0060] "Data generation means" refers to technical means for generating new information or advice based on analyzed information.

[0061] "Information presentation means" refers to devices or interfaces used to convey information generated by the system to users, either by displaying it or by outputting it as sound.

[0062] "Feedback" refers to the evaluations and opinions that users give to a system or the advice provided, and is used for improvement.

[0063] "Adjustment methods" refer to methods and processes for optimizing the system's performance and offerings based on collected feedback.

[0064] "Voice input means" refers to devices and technologies for acquiring a user's voice as digital information and supplying it to a system.

[0065] "History information" refers to information that records data about a user's past actions and system usage.

[0066] "Information recording means" refers to technical devices or systems that store data about users and utilize it for subsequent processing and analysis.

[0067] One embodiment of this invention is a system designed to facilitate communication between a server, a terminal, and a user.

[0068] The server primarily functions as a computational processing and data generation means. The server receives audio data from terminals via the internet and converts it into text data using speech recognition software. A common speech recognition API is used for this process. The converted text data is analyzed using a natural language processing model. This includes using BERT or similar models for sentiment analysis. Based on the analysis results, a generative AI model functions as a data generation means, generating advice for the user. The prompt for this generation process is set to "Specifically analyze the sentiment of this text and generate appropriate advice."
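As a rough illustration of how the prompt quoted above might be combined with the recognized text and the sentiment result, the following sketch assembles a prompt string and hands it to a model. The function names, the prompt layout beyond the quoted sentence, and the `model` callable are assumptions made for illustration; the application does not specify how the prompt is assembled or which generative model is called.

```python
from typing import Callable

# The instruction sentence quoted in paragraph [0068].
BASE_PROMPT = (
    "Specifically analyze the sentiment of this text and generate appropriate advice."
)


def build_prompt(user_text: str, sentiment_label: str) -> str:
    """Combine the fixed instruction with the analysis result and the user's text."""
    return (
        f"{BASE_PROMPT}\n"
        f"Detected sentiment: {sentiment_label}\n"
        f"Text: {user_text}"
    )


def generate_advice(
    user_text: str, sentiment_label: str, model: Callable[[str], str]
) -> str:
    """Pass the assembled prompt to a generative model (any callable str -> str)."""
    return model(build_prompt(user_text, sentiment_label))


prompt = build_prompt("I've been feeling stressed lately", "negative")
```

Passing the model in as a plain callable keeps the sketch independent of any particular generative AI service.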

[0069] The terminal functions as both a voice input and information presentation device. A typical smart device is used, which includes a microphone for voice acquisition and a monitor or speaker for displaying user advice. The terminal acquires the user's spoken voice in real time and transmits it to the server. The received advice is presented to the user in text or audio format. Speech synthesis software may be used in this process.

[0070] Users utilize the system's functions through their devices in everyday situations. For example, if a user says to their device, "I've been feeling stressed lately," the server converts the audio into text and generates and returns advice on stress reduction. An example of such advice might be, "I recommend taking some time to relax. How about spending some time on a hobby?"

[0071] The ultimate goal of this system is to provide personalized advice tailored to the user's emotional state and improve their quality of life.

[0072] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0073] Step 1:

[0074] The device acquires audio from the user's surroundings in real time. This is done using a built-in microphone, and an audio acquisition application captures the data as a digital signal. The input is the user's spoken natural language, and the output is a digitized audio file.

[0075] Step 2:

[0076] The terminal transmits the acquired audio data to the server via the internet. The audio data is divided into packets and efficiently transferred over the network. The input is a digital audio file, and the output is the audio data that reaches the server.

[0077] Step 3:

[0078] The server converts the received audio data into text data using speech recognition software. Specifically, the speech recognition engine analyzes the audio waveform to generate an appropriate text representation. The input is the audio data sent to the server, and the output is the converted text data.

[0079] Step 4:

[0080] The server inputs the converted text data into a natural language processing model to analyze the user's emotional state. In this process, it infers emotions from sentence structure and word choices, and generates the results of the emotional analysis as output. The input is text data, and the output is data with the emotional state analyzed.

[0081] Step 5:

[0082] The server uses a generative AI model to generate advice based on the results of sentiment analysis. The prompt is set to "Provide appropriate advice based on the sentiment state of this text," and the model generates the best advice. The input is the sentiment analysis data, and the output is the text of the generated advice.

[0083] Step 6:

[0084] The server sends the generated advice to the terminal. It sends encoded data to ensure stable communication over the network. The input is the text of the generated advice, while the output is the advice text sent to the terminal.

[0085] Step 7:

[0086] The terminal presents the received advice to the user by converting it into speech using speech synthesis software or by displaying it as text on the screen. Information is delivered to the user through sight and sound. The input is the advice text from the server, and the output is the advice information conveyed to the user.

[0087] Step 8:

[0088] The user inputs feedback on the presented advice into the terminal. This can be done using voice or text input. The terminal sends this feedback to the server, contributing to system improvement. The input is the user's feedback, and the output is the feedback data sent to the server.
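Step 2 of this flow divides the audio data into packets before transfer. A minimal sketch of that idea follows, assuming fixed-size chunks tagged with sequence numbers; `packetize`, `reassemble`, and the chunk size are hypothetical, and in practice the transport protocol would normally handle segmentation.

```python
from typing import Iterable, List, Tuple

Packet = Tuple[int, bytes]  # (sequence number, payload)


def packetize(audio: bytes, size: int = 4) -> List[Packet]:
    """Split audio bytes into numbered chunks of at most `size` bytes."""
    return [
        (seq, audio[start:start + size])
        for seq, start in enumerate(range(0, len(audio), size))
    ]


def reassemble(packets: Iterable[Packet]) -> bytes:
    """Restore the original byte order even if packets arrive out of order."""
    return b"".join(payload for _, payload in sorted(packets))


data = b"hello world"
packets = packetize(data, size=4)
restored = reassemble(reversed(packets))
```

The sequence numbers are what allow the server side to rebuild the audio file that Step 3 then passes to speech recognition.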

[0089] (Application Example 1)

[0090] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0091] In modern society, the elderly and those requiring care often experience psychological burdens due to loneliness and emotional fluctuations. To alleviate these burdens and improve well-being in daily life, there is a need for technology that can grasp emotional changes in real time and provide appropriate responses. However, conventional systems have the challenge of being unable to accurately analyze the emotional state of users and provide appropriate responses based on that analysis.

[0092] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0093] In this invention, the server includes an analysis means for analyzing the user's emotional state, a generation means for generating advice for the user based on the analysis means, and a presentation means for presenting the advice through various devices and further recommending music and information in accordance with changes in emotion. This makes it possible to accurately analyze the user's emotions, provide personalized advice and recommendations in real time, reduce the user's psychological burden, and improve their well-being.

[0094] "Analysis means for analyzing the emotional state of users" refers to a device that includes technology for converting audio data into text data and analyzing the user's emotions based on that text data.

[0095] A "generation means" is a device that has the function of generating appropriate advice and information for the user based on the analysis results.

[0096] A "presentation means" is a device that has the function of transmitting generated advice and information to the user through various devices, and also has the ability to recommend music and information in response to changes in emotions.

[0097] "Voice acquisition means" refers to a device that includes technology for acquiring a user's voice data in real time and transmitting it to a server for analysis.

[0098] "Processes adapted for emotional support in caregiving situations" refers to analytical and response processes designed to reduce psychological burden, particularly for the elderly and individuals requiring care.

[0099] "Historical data" refers to a collection of data that includes records of a user's past behavior and emotions, and is a source of information that is considered when creating advice and information.

[0100] "Methods for collecting feedback and improving the system" refers to the process of collecting user reactions and evaluations and using them to improve and optimize the entire system.

[0101] This invention is a system aimed at reducing the psychological burden on the elderly and individuals requiring care. The system includes analysis means, generation means, presentation means, voice acquisition means, history data recording means, and feedback collection means. The invention is specifically implemented through the roles of server, terminal, and user.

[0102] The server first receives the user's voice data transmitted from the terminal via a voice acquisition device. The hardware used at this stage is a smartphone or tablet, and software such as Google® Speech-to-Text API is used to convert the voice data into text data.

[0103] Next, the converted text data is analyzed using an emotion analysis model to analyze the user's emotional state. Here, a Python natural language processing library (e.g., Transformers) is utilized. For example, if the user says, "Nobody came to visit today, and I'm lonely," the server will perform an appropriate emotion analysis based on this information.

[0104] The generation mechanism generates appropriate advice and information for the user based on the analysis results. The generated advice is then sent back to the terminal via the server. At this stage, music or information tailored to the user may also be recommended.

[0105] The device uses its screen and speakers to present advice and music to the user through various means. It also collects user feedback and sends this information to a server. This continuous feedback allows for improvements to the entire system.

[0106] For example, if a user says, "I'm a little tired," the generative AI model might offer advice such as, "I'll play your favorite music to help you relax."

[0107] Example prompt: "Analyze user feedback to understand their emotions and generate optimal advice."

[0108] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0109] Step 1:

[0110] The user provides voice input. The terminal collects the user's voice using a voice acquisition device. This voice data is then input into subsequent processing.

[0111] Step 2:

[0112] The device sends the collected audio data to the server. The server uses the Google Speech-to-Text API to convert the audio data into text data. Audio data is input, and the corresponding text is output.

[0113] Step 3:

[0114] The server inputs the converted text data into an emotion analysis model. The analysis tool then performs emotion analysis on this text data. The input is text data, and the output is information indicating the user's emotional state. A Python natural language processing library is used.

[0115] Step 4:

[0116] The server uses a generative AI model to generate appropriate advice based on the analyzed emotional state. A prompt sentence (e.g., "Analyze the user's emotions and generate optimal advice") is formed based on the analysis results, and this is input to output the advice text.

[0117] Step 5:

[0118] The generated advice text is sent from the server to the terminal. The terminal communicates this advice to the user via a presentation device, either through a screen display or audio output. The terminal's output is the generated advice message.

[0119] Step 6:

[0120] Users provide feedback on the advice they receive. This feedback is sent from the terminal to the server and used to improve the system. The server uses the user's feedback data as input to update the system's algorithms and analysis models.
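One simple way to picture how feedback in Step 6 could influence later recommendations is a per-item score that rises or falls with each reaction. This is a deliberately simplified sketch: the `Recommender` class, the catalog entries, and the scoring rule are assumptions, whereas the application only states that feedback is used to update the system's algorithms and analysis models.

```python
from collections import defaultdict
from typing import Dict, List


class Recommender:
    """Hypothetical feedback-weighted music recommender keyed by emotional state."""

    def __init__(self, catalog: Dict[str, List[str]]) -> None:
        self.catalog = catalog
        self.scores: Dict[str, int] = defaultdict(int)

    def recommend(self, emotion: str) -> str:
        candidates = self.catalog.get(emotion, [])
        if not candidates:
            raise KeyError(f"no recommendations for emotion: {emotion}")
        # Highest score wins; ties go to the earliest catalog entry.
        return max(candidates, key=lambda track: self.scores[track])

    def record_feedback(self, track: str, liked: bool) -> None:
        self.scores[track] += 1 if liked else -1


recommender = Recommender({"lonely": ["calm piano", "old favorites"]})
before = recommender.recommend("lonely")
recommender.record_feedback(before, liked=False)
after = recommender.recommend("lonely")
```

After one negative reaction the sketch switches to the other candidate, which is the kind of adjustment continuous feedback is meant to enable.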

[0121] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0122] As an embodiment for carrying out the present invention, a system incorporating an emotion engine is designed to analyze the user's emotional state in depth and provide support tailored to individual needs. The embodiment is described in detail below.

[0123] The server functions as a central data processing unit, receiving audio data transmitted from terminals. This audio data is converted into text data by a speech recognition module within the server. The text data is then fed into the emotion engine, which analyzes the user's emotional state in more detail. The emotion engine uses specific algorithms and has its own methods for evaluating the user's emotions in real time. This evaluation result is passed to a generation mechanism within the server, which constructs advice tailored to the individual's emotional state.

[0124] The terminal acts as the interface between the server and the user. A microphone installed in the terminal constantly captures ambient sound and transmits this audio data to the server. Advice returned from the server is presented to the user through the terminal's display or speaker. This allows the user to receive emotion-based feedback in real time.

[0125] Users can utilize this system in their daily lives and receive support in various situations. For example, when a user is preparing a presentation for work, they can speak into the device to express their anxiety. The server receives this audio and, using its emotion engine, determines that the user is experiencing high levels of anxiety. Based on this, the server generates specific advice such as, "Take a deep breath and relax. You have successfully delivered excellent presentations in the past," and provides it to the user through the device. Through this process, users can gain a sense of security and achieve emotional stability.

[0126] Thus, the system of the present invention aims to enhance the user's resilience and improve their ability to cope with daily life stress by linking analysis means, generation means, presentation means, and an emotion engine.

[0127] The following describes the processing flow.

[0128] Step 1:

[0129] The device collects the user's voice using a microphone and performs noise cancellation processing to eliminate ambient noise. This results in the extraction of clear audio data.

[0130] Step 2:

[0131] The terminal packetizes the processed voice data and sends it to the server through a secure communication channel.

[0132] Step 3:

[0133] The server converts the audio data received from the terminal into text data using a speech recognition module. The text data is then prepared for further sentiment analysis.

[0134] Step 4:

[0135] The server utilizes an emotion engine to analyze the user's emotional state in detail from text data. During this process, it assigns positive, negative, or neutral emotion tags.

[0136] Step 5:

[0137] The server generates the most suitable advice for the user based on emotion tags obtained by the emotion engine. The algorithm creates customized advice while also considering the user's history.

[0138] Step 6:

[0139] The server encodes the generated advice as a data packet, encrypts it, and then sends it to the terminal.

[0140] Step 7:

[0141] The terminal decodes the advice sent from the server and presents it to the user in audio or text format. The user can understand the information by playing the audio through the speaker or displaying the text on the screen.

[0142] Step 8:

[0143] Users adjust their actions and feelings based on the advice provided. This user feedback is then sent to the server via the device, contributing to improvements in the system's accuracy.
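
The eight steps above can be sketched as a single request and response cycle. The following Python sketch is illustrative only and rests on several assumptions: the names `recognize_speech`, `tag_emotion`, and `Server` are hypothetical, the keyword lists and most advice strings are invented for the example, and speech recognition is stubbed by treating the audio bytes as UTF-8 text. The disclosure does not specify the emotion engine's algorithm, so a simple keyword count stands in for it.

```python
from dataclasses import dataclass, field

# Hypothetical keyword lists; the disclosure does not specify the emotion engine's algorithm.
NEGATIVE_WORDS = {"anxious", "nervous", "worried", "tired", "stressed"}
POSITIVE_WORDS = {"happy", "glad", "excited", "relaxed"}


def recognize_speech(audio: bytes) -> str:
    """Stand-in for the speech recognition module (Step 3)."""
    return audio.decode("utf-8")


def tag_emotion(text: str) -> str:
    """Assign a positive, negative, or neutral emotion tag (Step 4)."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    negative = len(words & NEGATIVE_WORDS)
    positive = len(words & POSITIVE_WORDS)
    if negative > positive:
        return "negative"
    if positive > negative:
        return "positive"
    return "neutral"


@dataclass
class Server:
    history: list = field(default_factory=list)   # past emotion tags of the user
    feedback: list = field(default_factory=list)  # feedback received in Step 8

    def generate_advice(self, tag: str) -> str:
        """Build advice from the tag while considering the user's history (Step 5)."""
        if tag == "negative":
            advice = "Take a deep breath and relax."
            if "negative" in self.history:
                advice += " You have come through moments like this before."
        elif tag == "positive":
            advice = "Glad to hear it. Keep going."
        else:
            advice = "Thank you for sharing. How are you feeling right now?"
        return advice

    def handle(self, audio: bytes) -> str:
        """Run Steps 3 to 5 for one utterance and record the tag in the history."""
        text = recognize_speech(audio)
        tag = tag_emotion(text)
        advice = self.generate_advice(tag)
        self.history.append(tag)
        return advice
```

In this sketch, a second negative utterance from the same user yields advice that refers to the earlier one, which is one simple way to reflect the user's history in the generated advice.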

[0144] (Example 2)

[0145] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0146] In today's world, it is increasingly important to quickly and accurately understand users' emotional states and provide real-time feedback tailored to their individual needs. However, conventional systems often lack the accuracy to analyze users' emotions and the quality of the advice they provide. Therefore, there is a need for technology that can analyze emotions with higher precision and provide accurate, personalized advice.

[0147] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0148] In this invention, the server includes information processing means for analyzing the user's emotional state, data generation means for generating advice for the user based on the information processing means, information output means for presenting the advice via a communication device, data acquisition means for acquiring voice data and supplying it to the information processing means, and information analysis means for evaluating the user's emotions in real time. This makes it possible to analyze the user's emotions in detail and provide appropriate feedback in real time.

[0149] An "information processing means" is a component that has the function of analyzing the emotional state of a user.

[0150] A "data generation means" is a component that has the function of generating advice for the user based on the analysis results obtained from the information processing means.

[0151] An "information output means" is a component that has the function of presenting advice generated via a communication device to the user.

[0152] A "data acquisition means" is a component that has the function of acquiring audio data and supplying it to an information processing means.

[0153] An "information analysis means" is a component that has the function of analyzing the user's voice data in real time and evaluating the user's emotions.

[0154] This invention aims to realize a system that analyzes the emotional state of users in detail and provides appropriate feedback tailored to each user. The system is configured as follows:

[0155] The server functions as the system's central information processing unit, receiving voice data transmitted from terminals. Using a speech recognition module, the voice data is converted into text data. The converted text data is then analyzed by an emotion engine to provide a detailed assessment of the user's emotional state. A specific algorithm is applied to this assessment, enabling real-time analysis of the user's emotions.

[0156] Based on the analysis results from the emotion engine, a generative AI model then generates advice tailored to the user's emotions. This advice draws on historical data and general psychological knowledge, and is designed to provide specific and helpful guidance.

[0157] The terminal functions as an information interface between the server and the user. The microphone built into the terminal acquires voice data and transmits it to the server. It also provides information to the user by displaying advice sent from the server on the display or playing it back as audio through the speaker.

[0158] Users can utilize this system in their daily lives. For example, if a user feels anxious about speaking in front of a large group, they can say to the device, "I get very nervous when speaking in front of a large group. How can I relax?" The server analyzes this voice, generates advice to alleviate anxiety, and provides the user with specific suggestions such as, "Take a deep breath and relax. You have successfully given great presentations in the past." In this way, users can gain a sense of security and emotional stability.

[0159] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0160] Step 1:

[0161] The device continuously captures ambient sound using a microphone. When a user speaks, it converts the audio data into a digital format and sends it to the server. The input in this step is the user's voice, and the output is the digital audio data sent to the server. Specifically, as soon as audio input is detected, the data is sampled and transferred to the server as data packets.

[0162] Step 2:

[0163] The server receives audio data transmitted from the terminal. It uses a speech recognition module to convert this data into text. The input is the audio data obtained from the terminal, and the output is the converted text data. A speech recognition algorithm is applied to this data processing, analyzing the features of the audio waveform and converting them into text.

[0164] Step 3:

[0165] The server passes the converted text data to the emotion engine. The emotion engine analyzes the emotional state from the text data using a specific algorithm. The input is the text data obtained in step 2, and the output is the analysis result indicating the user's emotional state. Specifically, it utilizes an emotion dictionary and natural language processing techniques to evaluate emotions based on keywords in the text.

[0166] Step 4:

[0167] The server inputs the analysis results into a generative AI model to produce advice tailored to the user's emotional state. This step involves computation that takes into account past data and general psychological knowledge. The input is the emotion analysis result, and the output is the generated advice message. Specifically, the AI model utilizes its trained knowledge to construct appropriate advice based on various scenarios.

[0168] Step 5:

[0169] The server sends the generated advice to the terminal. The terminal converts the received data into an appropriate format to return this advice to the user, displaying it on the screen or playing it through the speaker. Here, the input is the advice data from the server, and the output is the real-time advice presented to the user. Specifically, the received message is processed as text or audio and presented in a way that the user can immediately understand.
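
Step 3 above states that the emotion engine uses an emotion dictionary and evaluates emotions based on keywords in the text. A minimal Python sketch of such a dictionary lookup follows. The dictionary entries, the weights, and the function names `score_emotions` and `dominant_emotion` are assumptions made for illustration; the disclosure does not give the contents of the dictionary or the scoring rule.

```python
import re
from collections import Counter

# Hypothetical emotion dictionary: keyword -> (emotion label, weight).
EMOTION_DICTIONARY = {
    "nervous": ("anxiety", 1.0),
    "anxious": ("anxiety", 1.0),
    "worried": ("anxiety", 0.8),
    "sad": ("sadness", 1.0),
    "lonely": ("sadness", 0.8),
    "happy": ("joy", 1.0),
    "excited": ("joy", 0.8),
}


def score_emotions(text: str) -> dict:
    """Sum the dictionary weights per emotion label for keywords found in the text."""
    scores = Counter()
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in EMOTION_DICTIONARY:
            label, weight = EMOTION_DICTIONARY[token]
            scores[label] += weight
    return dict(scores)


def dominant_emotion(text: str) -> str:
    """Return the highest-scoring label, or "neutral" when no keyword matches."""
    scores = score_emotions(text)
    if not scores:
        return "neutral"
    return max(scores, key=scores.get)
```

For the utterance quoted in paragraph [0158], this sketch finds the keyword "nervous" and reports the label "anxiety". A practical emotion engine would likely also handle negation and context, which a plain keyword lookup does not.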

[0170] (Application Example 2)

[0171] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0172] Modern households require technological advancements to improve people's quality of life. In particular, there is a lack of systems that can adequately understand family members' emotional states and provide real-time support. As a result, people often do not receive sufficient emotional support in their daily lives, leading to a buildup of stress.

[0173] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0174] In this invention, the server includes an analysis means for analyzing the emotional state of the user, a generation means for generating advice for the user based on the analysis means, a presentation means for presenting the advice via various devices, a monitoring means for understanding the emotional state of people in the home environment, and a support means for providing life support based on the information obtained using the monitoring means. This enables people in the home to receive support appropriate to their emotional state, thereby improving their quality of life.

[0175] "Users" refers to individuals or households who use this system.

[0176] "Emotional state" refers to the psychological or emotional state that a user is experiencing at a particular point in time, and this is usually information that is analyzed from voice, facial expressions, etc.

[0177] "Analysis means" refers to a device that has the function of analyzing the emotional state of the user from the acquired voice and data.

[0178] A "generation means" is a device that has the function of generating optimal advice and support for the user based on emotional information obtained by an analysis means.

[0179] "Presentation means" refers to devices or methods for displaying or notifying users of generated advice in a form that is recognizable to them.

[0180] "Monitoring means" refers to a device that has the function of observing people's behavior and voices within the home and acquiring the necessary data.

[0181] "Support means" refers to a function that provides support to improve quality of life in real time using the analyzed emotional data and the monitored information.

[0182] In implementing this invention, the system mainly consists of a server and a robot terminal located in the home. The server functions as a central processing unit and is responsible for processing the voice data transmitted from the terminal. The terminal is equipped with a microphone and speaker to acquire the user's voice and transmit it to the server.

[0183] The server receives the audio data and converts it into text data using the Google Cloud Speech-to-Text API. The text data is then subjected to sentiment analysis using IBM Watson®'s Natural Language Understanding to analyze the user's emotional state in detail. Based on the analyzed sentiment data, a generation tool, executed by a Python program, uses AI to create appropriate feedback and advice. This generated advice is then delivered to the user through the device's speaker.

[0184] For example, when a user returns home tired at the end of a busy day, the robot might suggest, "Were you able to rest enough today? Shall I prepare a warm drink for you to relax?" In this way, users can receive real-time feedback tailored to their emotional state.

[0185] Examples of prompts for an AI generation model include: "Based on words obtained from the user's voice, please come up with specific health advice to help them relax."

[0186] This allows for timely emotional feedback within the family, improving the quality of daily life.
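
The example prompt in paragraph [0185] could be combined with the recognized text and the analyzed emotion before being passed to the generative AI model. The following Python sketch shows one way to assemble such a prompt. The function name `build_prompt` and the layout of the prompt are assumptions; only the instruction sentence is taken from the description above.

```python
# Instruction sentence quoted from the description; everything else is hypothetical.
INSTRUCTION = (
    "Based on words obtained from the user's voice, please come up with "
    "specific health advice to help them relax."
)


def build_prompt(transcript: str, emotion_label: str) -> str:
    """Combine the instruction with the transcript and the analyzed emotion."""
    # Collapse line breaks and repeated spaces so the transcript stays on one line.
    transcript = " ".join(transcript.split())
    return (
        f"{INSTRUCTION}\n"
        f"Analyzed emotion: {emotion_label}\n"
        f'User said: "{transcript}"'
    )
```

Keeping the instruction, the emotion label, and the transcript on separate lines makes it easier to inspect exactly what was sent to the model when reviewing user feedback later.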

[0187] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0188] Step 1:

[0189] The user speaks into the device. The device continuously acquires ambient sound using its built-in microphone and transmits this audio data to the server in real time. The input is the user's spoken voice, which is converted into audio data.

[0190] Step 2:

[0191] The server converts the received audio data into text data using the Google Cloud Speech-to-Text API. This step extracts words and phrases from the audio and converts them into text, making them available for use in the next step. The output is the audio converted to text.

[0192] Step 3:

[0193] The server feeds the converted text data to IBM Watson's Natural Language Understanding to perform sentiment analysis. This process extracts emotional components from the context and words of the input text, determining emotional states such as positive, negative, and neutral. The output includes emotional scores and states.

[0194] Step 4:

[0195] The server generates feedback and advice using a Python program based on the sentiment analysis results. The generative AI model selects appropriate responses using prompts and creates optimal advice from a list to provide to the user. The output is the text of the specific advice.

[0196] Step 5:

[0197] The server sends the generated advice back to the terminal. The terminal then conveys that advice to the user verbally through its speaker. The expected output is words or actions that represent natural feedback from the user.
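
Steps 3 and 4 above describe determining a positive, negative, or neutral state from an emotional score and then choosing advice from a list. A minimal Python sketch of that selection follows. It assumes a sentiment score in the range -1 to 1; the threshold of 0.25, the function names, and all list entries except the first negative one (quoted from paragraph [0184]) are hypothetical.

```python
# Hypothetical advice list keyed by emotional state.
ADVICE_LIST = {
    "negative": [
        "Were you able to rest enough today? Shall I prepare a warm drink for you to relax?",
        "How about taking a short break?",
    ],
    "neutral": ["Is there anything I can help you with?"],
    "positive": ["That sounds wonderful. Tell me more about it."],
}


def classify(score: float, threshold: float = 0.25) -> str:
    """Map a sentiment score (assumed range -1 to 1) to an emotional state."""
    if score <= -threshold:
        return "negative"
    if score >= threshold:
        return "positive"
    return "neutral"


def select_advice(score: float, turn: int = 0) -> str:
    """Pick an entry for the state, rotating by turn so advice is not repeated back to back."""
    candidates = ADVICE_LIST[classify(score)]
    return candidates[turn % len(candidates)]
```

In practice the selected entry could also be handed to the generative AI model as a starting point rather than being returned verbatim; the sketch only covers the list lookup.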

[0198] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0199] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input prompts containing instructions and inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0200] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0201] [Second Embodiment]

[0202] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0203] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0204] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0205] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0206] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0207] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0208] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0209] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0210] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0211] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0212] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0213] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0214] As an embodiment for carrying out the present invention, a system that performs specific program processing is designed. Its specific details are described below.

[0215] The server plays a central role in analyzing the user's emotional state. First, it receives the user's voice data transmitted from the terminal and converts it into text data using speech recognition technology. Based on this text data, the server uses an emotion analysis model to analyze the user's emotional state. It then generates appropriate advice for the user based on this emotional state. The generated advice and information are sent to the terminal that the user is using.

[0216] The terminal acts as an interface between the server and the user. It has a voice acquisition function that captures the sound around the user in real time and sends it to the server. The advice returned from the server is presented to the user as either audio or text. The terminal also contributes to the improvement of the overall system by collecting feedback from the user about the presented advice and sending it to the server.

[0217] Users utilize the device during their daily activities and receive advice and support from the server. For example, if a user is feeling down after making a mistake at work, they can speak into the device, and their voice is sent to the server for analysis, after which appropriate encouragement and advice are provided. Through this process, users can objectively re-examine their own emotions and are encouraged to take positive action.

[0218] Thus, this system aims to support the improvement of users' resilience and assist in stress management in daily life by coordinating the operation of its various components, such as analysis means, generation means, and presentation means.

[0219] The following describes the processing flow.

[0220] Step 1:

[0221] The device collects the user's speech through the microphone. The audio data is processed through a noise-canceling filter and temporarily stored as clear audio data.

[0222] Step 2:

[0223] The terminal prepares to send the stored audio data to the server. It packetizes the data according to the communication protocol and sends it to the server via a secure line.

[0224] Step 3:

[0225] The server processes the audio data received from the terminal through a speech recognition system and converts it into text data. The converted text data is then input into an emotion analysis model to determine the user's emotional state.

[0226] Step 4:

[0227] Based on the sentiment analysis results, the server generates personalized advice according to pre-configured rules and algorithms. The generated advice is customized considering the user's past behavioral history.

[0228] Step 5:

[0229] The server converts the generated advice into text or audio format and sends it to the terminal. The data is encrypted and processed to ensure user privacy.

[0230] Step 6:

[0231] The terminal receives advice from the server and presents it to the user. If the advice is presented verbally, it is read aloud through the speaker; if it is presented as text, it is displayed on the screen.

[0232] Step 7:

[0233] Users adjust their actions based on the advice provided. If necessary, they can send feedback via their device, and the results will be used for future support.
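
Steps 1 and 2 of the flow above (noise filtering and packetizing) can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the disclosure does not specify the noise-canceling filter or the packet format, so a simple amplitude gate and a six-byte header (sequence number and payload length) are used here. The secure line itself, for example a TLS connection, is outside the scope of the sketch.

```python
import struct

# Hypothetical packet header: sequence number (uint32) and payload length (uint16).
HEADER = struct.Struct("!IH")


def noise_gate(samples: list, threshold: int = 500) -> list:
    """Zero out samples whose magnitude is below the threshold (a crude noise filter)."""
    return [s if abs(s) >= threshold else 0 for s in samples]


def packetize(samples: list, samples_per_packet: int = 4) -> list:
    """Split 16-bit PCM samples into packets, each prefixed with the header."""
    packets = []
    for seq, start in enumerate(range(0, len(samples), samples_per_packet)):
        chunk = samples[start:start + samples_per_packet]
        payload = struct.pack(f"!{len(chunk)}h", *chunk)
        packets.append(HEADER.pack(seq, len(payload)) + payload)
    return packets


def reassemble(packets: list) -> list:
    """Server side: order packets by sequence number and recover the samples."""
    samples = []
    for packet in sorted(packets, key=lambda p: HEADER.unpack_from(p)[0]):
        _, length = HEADER.unpack_from(packet)
        payload = packet[HEADER.size:HEADER.size + length]
        samples.extend(struct.unpack(f"!{length // 2}h", payload))
    return samples
```

The sequence number lets the server restore the original order even if packets arrive out of order, which matters because speech recognition depends on the samples being contiguous.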

[0234] (Example 1)

[0235] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0236] In modern society, there is a need for systems that effectively provide advice to users regarding the emotional stress and challenges they face. However, current technology makes it difficult to accurately analyze a user's emotional state and provide personalized advice in real time. Furthermore, there are insufficient methods for continuously improving the system using user feedback. As a result, there is a need to develop a system that provides truly useful support for users.

[0237] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0238] In this invention, the server includes computational processing means for analyzing the user's emotional state, data generation means for generating advice for the user, and information presentation means for presenting the advice via various communication devices. This makes it possible to provide appropriate advice based on the user's emotional state in real time and to continuously improve the system's performance through user feedback.

[0239] "Users" refers to individuals or organizations that use the system to receive services or support.

[0240] "Emotional state" refers to the psychological or emotional state that a user experiences at a particular point in time.

[0241] "Computational processing means" refers to a computer system or software that handles the process of analyzing data.

[0242] "Data generation means" refers to technical means for generating new information or advice based on analyzed information.

[0243] "Information presentation means" refers to devices or interfaces used to display information generated by a system to users or to convey it to them by sound.

[0244] "Feedback" refers to the evaluations and opinions that users give to a system or the advice provided, and is used for improvement.

[0245] "Adjustment methods" refer to methods and processes for optimizing the system's performance and offerings based on collected feedback.

[0246] "Voice input means" refers to devices and technologies for acquiring a user's voice as digital information and supplying it to a system.

[0247] "History information" refers to information that records data about a user's past actions and system usage.

[0248] "Information recording means" refers to technical devices or systems that store data about users and utilize it for subsequent processing and analysis.

[0249] One embodiment of this invention is a system designed to facilitate communication between a server, a terminal, and a user.

[0250] The server primarily functions as a computational processing and data generation means. The server receives audio data from terminals via the internet and converts it into text data using speech recognition software. A common speech recognition API is used for this process. The converted text data is analyzed using a natural language processing model. For example, BERT or a similar model is used for sentiment analysis. Based on the analysis results, a generative AI model functions as a data generation means, generating advice for the user. The prompt for this generation process is set to "Specifically analyze the sentiment of this text and generate appropriate advice."

[0251] The terminal functions as both a voice input and information presentation device. A typical smart device is used, which includes a microphone for voice acquisition and a monitor or speaker for displaying user advice. The terminal acquires the user's spoken voice in real time and transmits it to the server. The received advice is presented to the user in text or audio format. Speech synthesis software may be used in this process.

[0252] Users utilize the system's functions through their devices in everyday situations. For example, if a user says to their device, "I've been feeling stressed lately," the server converts the audio into text and generates and returns advice on stress reduction. An example of such advice might be, "I recommend taking some time to relax. How about spending some time on a hobby?"

[0253] The ultimate goal of this system is to provide personalized advice tailored to the user's emotional state and improve their quality of life.

[0254] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0255] Step 1:

[0256] The device acquires audio from the user's surroundings in real time. This is done using a built-in microphone, and an audio acquisition application captures the data as a digital signal. The input is the user's spoken natural language, and the output is a digitized audio file.

[0257] Step 2:

[0258] The terminal transmits the acquired audio data to the server via the internet. The audio data is divided into packets and efficiently transferred over the network. The input is a digital audio file, and the output is the audio data that reaches the server.

[0259] Step 3:

[0260] The server converts the received audio data into text data using speech recognition software. Specifically, the speech recognition engine analyzes the audio waveform to generate an appropriate text representation. The input is the audio data sent to the server, and the output is the converted text data.

[0261] Step 4:

[0262] The server inputs the converted text data into a natural language processing model to analyze the user's emotional state. In this process, it infers emotions from sentence structure and word choices, and generates the results of the emotional analysis as output. The input is text data, and the output is data with the emotional state analyzed.

[0263] Step 5:

[0264] The server uses a generative AI model to generate advice based on the results of sentiment analysis. The prompt is set to "Provide appropriate advice based on the sentiment state of this text," and the model generates the best advice. The input is the sentiment analysis data, and the output is the text of the generated advice.

[0265] Step 6:

[0266] The server sends the generated advice to the terminal. It sends encoded data to ensure stable communication over the network. The input is the text of the generated advice, while the output is the advice text sent to the terminal.

[0267] Step 7:

[0268] The terminal presents the received advice to the user by converting it into speech using speech synthesis software or by displaying it as text on the screen. Information is delivered to the user through sight and sound. The input is the advice text from the server, and the output is the advice information conveyed to the user.

[0269] Step 8:

[0270] The user inputs feedback on the presented advice into the terminal. This can be done using voice or text input. The terminal sends this feedback to the server, contributing to system improvement. The input is the user's feedback, and the output is the feedback data sent to the server.
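
Step 8 above states that feedback is sent to the server and contributes to system improvement, without specifying how. One simple possibility is sketched below in Python: the server keeps a running average rating per piece of advice and prefers better-rated advice next time. The class `FeedbackStore`, the rating scale of 1 to 5, and the default score of 3.0 for unrated advice are all assumptions; the two advice strings are taken from paragraph [0252].

```python
from collections import defaultdict


class FeedbackStore:
    """Keeps a running average rating for each piece of advice (hypothetical design)."""

    def __init__(self) -> None:
        self._totals = defaultdict(float)
        self._counts = defaultdict(int)

    def record(self, advice: str, rating: int) -> None:
        """Store one rating (assumed scale: 1 = unhelpful, 5 = helpful)."""
        if not 1 <= rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        self._totals[advice] += rating
        self._counts[advice] += 1

    def average(self, advice: str, default: float = 3.0) -> float:
        """Return the mean rating, or the default when no rating exists yet."""
        count = self._counts.get(advice, 0)
        return self._totals[advice] / count if count else default

    def best(self, candidates: list) -> str:
        """Prefer the candidate with the highest average rating so far."""
        return max(candidates, key=self.average)


store = FeedbackStore()
relax = "I recommend taking some time to relax."
hobby = "How about spending some time on a hobby?"
store.record(relax, 2)
store.record(relax, 3)
store.record(hobby, 5)
preferred = store.best([relax, hobby])  # the hobby suggestion, rated higher so far
```

Giving unrated advice a neutral default keeps new suggestions eligible for selection instead of always losing to advice that already has ratings.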

[0271] (Application Example 1)

[0272] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0273] In modern society, the elderly and those requiring care often experience psychological burdens due to loneliness and emotional fluctuations. To alleviate these burdens and improve well-being in daily life, there is a need for technology that can grasp emotional changes in real time and provide appropriate responses. However, conventional systems have the challenge of being unable to accurately analyze the emotional state of users and provide appropriate responses based on that analysis.

[0274] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0275] In this invention, the server includes an analysis means for analyzing the user's emotional state, a generation means for generating advice for the user based on the analysis means, and a presentation means for presenting the advice through various devices and further recommending music and information in accordance with changes in emotion. This makes it possible to accurately analyze the user's emotions, provide personalized advice and recommendations in real time, reduce the user's psychological burden, and improve their well-being.

[0276] "Analysis means for analyzing the emotional state of users" refers to a device that includes technology for converting audio data into text data and analyzing the user's emotions based on that text data.

[0277] A "generation means" is a device that has the function of generating appropriate advice and information for the user based on the analysis results.

[0278] A "presentation means" is a device that has the function of transmitting generated advice and information to the user through various devices, and also has the ability to recommend music and information in response to changes in emotions.

[0279] "Voice acquisition means" refers to a device that includes technology for acquiring a user's voice data in real time and transmitting it to a server for analysis.

[0280] "Processes adapted for emotional support in caregiving situations" refers to analytical and response processes designed to reduce psychological burden, particularly for the elderly and individuals requiring care.

[0281] "Historical data" refers to a collection of data that includes records of a user's past behavior and emotions, and is a source of information that is considered when creating advice and information.

[0282] "Methods for collecting feedback and improving the system" refers to the process of collecting user reactions and evaluations and using them to improve and optimize the entire system.

[0283] This invention is a system aimed at reducing the psychological burden on the elderly and individuals requiring care. The system includes analysis means, generation means, presentation means, voice acquisition means, history data recording means, and feedback collection means. The invention is specifically implemented through the roles of server, terminal, and user.

[0284] The server first receives the user's voice data transmitted from the terminal through the voice acquisition means. The hardware used at this stage is a smartphone, a tablet, etc., and software such as the Google Speech-to-Text API is used to convert the voice data into text data.

[0285] Next, the analysis means feeds the converted text data into an emotion analysis model to determine the user's emotional state. Here, a natural language processing library in Python (e.g., Transformers) is utilized. For example, when the user says "I'm a bit lonely today because no one has come to visit," the server performs emotion analysis on this utterance.

[0286] The generation means generates appropriate advice and information for the user based on the analysis results. The generated advice is transmitted to the terminal via the server again. At this stage, corresponding music or information may also be recommended.

[0287] The terminal presents advice and music to the user through the presentation means using the terminal's screen and speaker. Also, the terminal collects feedback from the user and transmits that information to the server. Through this continuous feedback, the overall system is improved.
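
One way the feedback loop described above might be realized is to keep an average user rating per piece of advice and to prefer advice that has been rated highly. The following is a minimal sketch under stated assumptions: the 1-to-5 rating scale, the class name FeedbackStore, and the advice identifiers are hypothetical and do not appear in the disclosure.

```python
from __future__ import annotations

from collections import defaultdict


class FeedbackStore:
    """Hypothetical server-side store of user ratings per advice item."""

    def __init__(self) -> None:
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    def record(self, advice_id: str, rating: int) -> None:
        # Assumed rating scale: 1 (not helpful) to 5 (very helpful).
        if not 1 <= rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        self._sum[advice_id] += rating
        self._count[advice_id] += 1

    def average(self, advice_id: str, default: float = 3.0) -> float:
        # Advice that has never been rated falls back to a neutral default.
        n = self._count[advice_id]
        return self._sum[advice_id] / n if n else default

    def rank(self, advice_ids: list[str]) -> list[str]:
        # Highest average first; sorted() is stable, so ties keep input order.
        return sorted(advice_ids, key=self.average, reverse=True)
```

For instance, after two high ratings for a hypothetical "play_music" item and one low rating for "deep_breath", rank() would place "play_music" first, so the generation means could try that advice before the others.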

[0288] As a specific example, when the user says "A bit tired," the generative AI model provides advice such as "Let's play your favorite music to relax."

[0289] Example of a prompt sentence: "Analyze the emotion from the user's voice and generate the most appropriate advice."

[0290] The flow of the specific processing in Application Example 1 will be described using Figure 12.

[0291] Step 1:

[0292] The user provides voice input. The terminal collects the user's voice using a voice acquisition device. This voice data is then input into subsequent processing.

[0293] Step 2:

[0294] The terminal sends the collected audio data to the server. The server uses the Google Speech-to-Text API to convert the audio data into text data. The input is the audio data, and the output is the corresponding text.

[0295] Step 3:

[0296] The server inputs the converted text data into an emotion analysis model. The analysis means then performs emotion analysis on this text data. The input is text data, and the output is information indicating the user's emotional state. A Python natural language processing library is used.

[0297] Step 4:

[0298] The server uses a generative AI model to generate appropriate advice based on the analyzed emotional state. A prompt sentence (e.g., "Analyze the user's emotions and generate optimal advice") is formed based on the analysis results, and this prompt is input to the model, which outputs the advice text.

[0299] Step 5:

[0300] The generated advice text is sent from the server to the terminal. The terminal communicates this advice to the user via a presentation device, either through a screen display or audio output. The terminal's output is the generated advice message.

[0301] Step 6:

[0302] Users provide feedback on the advice they receive. This feedback is sent from the terminal to the server and used to improve the system. The server uses the user's feedback data as input to update the system's algorithms and analysis models.
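
Steps 2 to 6 above can be sketched as a small pipeline in which each stage is supplied as a function. This is only an illustration under stated assumptions: the class AdvicePipeline and the stand-in functions are hypothetical, and the stand-ins replace the speech recognition API, the emotion analysis model, and the generative AI model so that the sketch runs without any external service.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AdvicePipeline:
    """Hypothetical wiring of Steps 2-6; each stage is injected as a callable."""

    transcribe: Callable[[bytes], str]       # Step 2: audio -> text
    analyze_emotion: Callable[[str], str]    # Step 3: text -> emotion label
    generate: Callable[[str], str]           # Step 4: prompt -> advice text
    feedback_log: list = field(default_factory=list)

    def handle_audio(self, audio: bytes) -> str:
        text = self.transcribe(audio)
        emotion = self.analyze_emotion(text)
        prompt = (
            "Analyze the user's emotions and generate optimal advice.\n"
            f"Detected emotion: {emotion}\n"
            f"User said: {text}"
        )
        return self.generate(prompt)          # Step 5: sent back to the terminal

    def handle_feedback(self, advice: str, helpful: bool) -> None:
        # Step 6: keep the feedback for later model or rule updates.
        self.feedback_log.append({"advice": advice, "helpful": helpful})


# Stand-in stages (assumptions) used only to make the sketch executable.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_emotion(text: str) -> str:
    return "tired" if "tired" in text.lower() else "neutral"


def fake_generate(prompt: str) -> str:
    if "Detected emotion: tired" in prompt:
        return "Let's play your favorite music to relax."
    return "Is there anything you would like to talk about?"


pipeline = AdvicePipeline(fake_transcribe, fake_emotion, fake_generate)
advice = pipeline.handle_audio(b"A bit tired")
pipeline.handle_feedback(advice, helpful=True)
```

Injecting the stages keeps the flow independent of any particular speech recognition or generative AI service, so a real client could be substituted for each stand-in.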

[0303] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0304] In an embodiment for carrying out the present invention, a system incorporating an emotion engine is designed to analyze the user's emotional state in depth and provide support tailored to individual needs. The embodiment will be described in detail below.

[0305] The server functions as a central data processing unit, receiving audio data transmitted from terminals. This audio data is converted into text data by a speech recognition module within the server. The text data is then fed into the emotion engine, which analyzes the user's emotional state in more detail. The emotion engine uses specific algorithms and has its own methods for evaluating the user's emotions in real time. This evaluation result is passed to a generation mechanism within the server, which constructs advice tailored to the individual's emotional state.

[0306] The terminal acts as the interface between the server and the user. A microphone installed in the terminal constantly captures ambient sound and transmits this audio data to the server. Advice returned from the server is presented to the user through the terminal's display or speaker. This allows the user to receive emotion-based feedback in real time.

[0307] Users can utilize this system in their daily lives and obtain support in various scenarios. For example, when a user is preparing for a presentation at work and expresses their unease by speaking to the terminal, the server receives this voice, and the emotion engine determines that the user is in a state of strong unease. Based on this, the server generates specific advice such as "Take a deep breath and relax. You have also had successful presentations in the past." and provides it to the user through the terminal. Through such a process, the user can gain a sense of security and achieve emotional stability.

[0308] In this way, the system of the present invention aims to strengthen the resilience of users and improve the response to stress in daily life by coordinating the analysis means, generation means, presentation means, and in addition, the emotion engine.

[0309] The following will explain the processing flow.

[0310] Step 1:

[0311] The terminal collects the user's voice with a microphone and performs noise cancellation processing to eliminate ambient noise. As a result, clear voice data is extracted.

[0312] Step 2:

[0313] The terminal prepares to transmit the processed voice data to the server by packetizing the data and sending it through a secure communication channel to the server.

[0314] Step 3:

[0315] The server converts the voice data received from the terminal into text data using a voice recognition module. The text data is prepared for further emotion analysis.

[0316] Step 4:

[0317] The server utilizes an emotion engine to analyze the user's emotional state in detail from text data. During this process, it assigns positive, negative, or neutral emotion tags.

[0318] Step 5:

[0319] The server generates the most suitable advice for the user based on emotion tags obtained by the emotion engine. The algorithm creates customized advice while also considering the user's history.

[0320] Step 6:

[0321] The server encodes the generated advice as a data packet, encrypts it, and then sends it to the terminal.

[0322] Step 7:

[0323] The terminal decodes the advice sent from the server and presents it to the user in audio or text format. The user can understand the information by playing the audio through the speaker or displaying the text on the screen.

[0324] Step 8:

[0325] Users adjust their actions and feelings based on the advice provided. The user's feedback is then sent to the server via the terminal, contributing to improvements in the system's accuracy.
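
Step 5 above states that the advice is customized using the emotion tag and the user's history. One simple way this might be done is to avoid repeating advice the user received recently. The sketch below assumes a fixed candidate list per tag and a window of recent history; the table ADVICE_BY_TAG, the function choose_advice, and the window size are hypothetical and are not taken from the disclosure.

```python
from __future__ import annotations

# Hypothetical candidate advice per emotion tag assigned in Step 4.
ADVICE_BY_TAG = {
    "positive": ["Keep it up!", "Share the good news with someone close."],
    "negative": ["Take a deep breath and relax.", "How about a short walk?"],
    "neutral": ["Is there anything you would like to talk about?"],
}


def choose_advice(tag: str, history: list[str], window: int = 3) -> str:
    """Pick advice for an emotion tag, skipping what was given recently."""
    if tag not in ADVICE_BY_TAG:
        raise ValueError(f"unknown emotion tag: {tag}")
    # Only the last `window` entries of the history count as recent.
    recent = set(history[-window:]) if window > 0 else set()
    for advice in ADVICE_BY_TAG[tag]:
        if advice not in recent:
            return advice
    # Every candidate was given recently, so reuse the first one.
    return ADVICE_BY_TAG[tag][0]
```

A generative AI model could take the place of the fixed table; the point of the sketch is only that the history is consulted before the advice is chosen.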

[0326] (Example 2)

[0327] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0328] In today's world, it is increasingly important to quickly and accurately understand users' emotional states and provide real-time feedback tailored to their individual needs. However, conventional systems often lack the accuracy to analyze users' emotions and the quality of the advice they provide. Therefore, there is a need for technology that can analyze emotions with higher precision and provide accurate, personalized advice.

[0329] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0330] In this invention, the server includes information processing means for analyzing the user's emotional state, data generation means for generating advice for the user based on the information processing means, information output means for presenting the advice via a communication device, data acquisition means for acquiring voice data and supplying it to the information processing means, and information analysis means for evaluating the user's emotions in real time. This makes it possible to analyze the user's emotions in detail and provide appropriate feedback in real time.

[0331] An "information processing means" is a component that has the function of analyzing the emotional state of a user.

[0332] A "data generation means" is a component that has the function of generating advice for the user based on the analysis results obtained from the information processing means.

[0333] An "information output means" is a component that has the function of presenting advice generated via a communication device to the user.

[0334] A "data acquisition means" is a component that has the function of acquiring audio data and supplying it to an information processing means.

[0335] An "information analysis means" is a component that has the function of analyzing the user's voice data in real time and performing emotion evaluation.

[0336] This invention aims to realize a system that analyzes the emotional state of users in depth and provides appropriate feedback tailored to each user. The system is configured as follows:

[0337] The server functions as the system's central information processing unit, receiving voice data transmitted from terminals. Using a speech recognition module, the voice data is converted into text data. The converted text data is then analyzed by an emotion engine to provide a detailed assessment of the user's emotional state. A specific algorithm is applied to this assessment, enabling real-time analysis of the user's emotions.

[0338] The analysis results from the emotion engine are then passed to a generative AI model, which generates advice tailored to the user's emotions. This advice is based on historical data and general psychological knowledge, and is designed to provide specific and helpful guidance.

[0339] The terminal functions as an information interface between the server and the user. The microphone built into the terminal acquires voice data and transmits it to the server. It also provides information to the user by displaying advice sent from the server on the display or playing it back as audio through the speaker.

[0340] Users can utilize this system in their daily lives. For example, if a user feels anxious about speaking in front of a large group, they can say to the device, "I get very nervous when speaking in front of a large group. How can I relax?" The server analyzes this voice, generates advice to alleviate anxiety, and provides the user with specific suggestions such as, "Take a deep breath and relax. You have successfully given great presentations in the past." In this way, users can gain a sense of security and emotional stability.

[0341] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0342] Step 1:

[0343] The terminal continuously captures ambient sound using a microphone. When a user speaks, it converts the audio into a digital format and sends it to the server. The input in this step is the user's voice, and the output is the digital audio data sent to the server. Specifically, as soon as audio input is detected, the data is sampled and transferred to the server as data packets.

[0344] Step 2:

[0345] The server receives audio data transmitted from the terminal. It uses a speech recognition module to convert this data into text. The input is the audio data obtained from the terminal, and the output is the converted text data. A speech recognition algorithm is applied to this data processing, analyzing the features of the audio waveform and converting them into text.

[0346] Step 3:

[0347] The server passes the converted text data to the emotion engine. The emotion engine analyzes the emotional state from the text data using a specific algorithm. The input is the text data obtained in step 2, and the output is the analysis result indicating the user's emotional state. Specifically, it utilizes an emotion dictionary and natural language processing techniques to evaluate emotions based on keywords in the text.

[0348] Step 4:

[0349] The server inputs the analysis results into a generative AI model to produce advice tailored to the user's emotional state. This step involves data calculations that take into account past data and general psychological knowledge. The input is the emotion analysis result, and the output is the generated advice message. Specifically, the AI model utilizes its trained knowledge to construct appropriate advice based on various scenarios.

[0350] Step 5:

[0351] The server sends the generated advice to the terminal. The terminal converts the received data into an appropriate format to return this advice to the user, displaying it on the screen or playing it through the speaker. Here, the input is the advice data from the server, and the output is the real-time advice presented to the user. Specifically, the received message is processed as text or audio and presented in a way that the user can immediately understand.
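
Step 3 above mentions an emotion dictionary and keyword-based evaluation. As a rough illustration of that idea, the sketch below sums keyword weights per emotion. The dictionary entries, the weights, and the rule that an intensifier doubles a keyword immediately following it are all assumptions made for this example, not details from the disclosure.

```python
from __future__ import annotations

import re

# Hypothetical emotion dictionary: keyword -> (emotion, weight).
EMOTION_DICTIONARY = {
    "nervous": ("anxiety", 1.0),
    "anxious": ("anxiety", 1.0),
    "worried": ("anxiety", 0.8),
    "lonely": ("loneliness", 1.0),
    "tired": ("fatigue", 0.7),
    "happy": ("joy", 1.0),
}
INTENSIFIERS = {"very", "really", "extremely"}


def score_emotions(text: str) -> dict[str, float]:
    """Sum keyword weights per emotion.

    An intensifier doubles a keyword that immediately follows it.
    """
    scores: dict[str, float] = {}
    boost = 1.0
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in INTENSIFIERS:
            boost = 2.0
            continue
        if word in EMOTION_DICTIONARY:
            emotion, weight = EMOTION_DICTIONARY[word]
            scores[emotion] = scores.get(emotion, 0.0) + weight * boost
        boost = 1.0  # the boost applies only to the very next word
    return scores


def dominant_emotion(text: str) -> str:
    """Return the highest-scoring emotion, or 'neutral' if no keyword matched."""
    scores = score_emotions(text)
    return max(scores, key=scores.get) if scores else "neutral"
```

Applied to the utterance from the example above ("I get very nervous when speaking in front of a large group. How can I relax?"), this sketch scores only the anxiety category, which is the result Step 4 would then receive as input.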

[0352] (Application Example 2)

[0353] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0354] Modern households require technological advancements to improve people's quality of life. In particular, there is a lack of systems that can adequately understand family members' emotional states and provide real-time support. As a result, people often do not receive sufficient emotional support in their daily lives, leading to a buildup of stress.

[0355] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0356] In this invention, the server includes an analysis means for analyzing the emotional state of the user, a generation means for generating advice for the user based on the analysis means, a presentation means for presenting the advice via various devices, a monitoring means for understanding the emotional state of people in the home environment, and a support means for providing life support based on the information obtained using the monitoring means. This enables people in the home to receive support appropriate to their emotional state, thereby improving their quality of life.

[0357] "Users" refers to individuals or households who use this system.

[0358] "Emotional state" refers to the psychological or emotional state that a user is experiencing at a particular point in time, and this is usually information that is analyzed from voice, facial expressions, etc.

[0359] "Analysis means" refers to a device that has the function of analyzing the emotional state of the user from the acquired voice and data.

[0360] A "generation means" is a device that has the function of generating optimal advice and support for the user based on emotional information obtained by an analysis means.

[0361] "Presentation means" refers to devices or methods for displaying or notifying users of generated advice in a form that is recognizable to them.

[0362] "Monitoring means" refers to devices that have the function of observing people's behavior and voices within the home and acquiring necessary data.

[0363] "Support means" refers to functions that provide support to improve quality of life in real time using analyzed emotional data and monitored information.

[0364] In implementing this invention, the system mainly consists of a server and a robot terminal located in the home. The server functions as a central processing unit and is responsible for processing the voice data transmitted from the terminal. The terminal is equipped with a microphone and speaker to acquire the user's voice and transmit it to the server.

[0365] The server receives the audio data and converts it into text data using the Google Cloud Speech-to-Text API. The text data is then subjected to sentiment analysis using IBM Watson's Natural Language Understanding to analyze the user's emotional state in detail. Based on the analyzed sentiment data, the generation means, implemented as a Python program, uses a generative AI model to create appropriate feedback and advice. This generated advice is then delivered to the user through the terminal's speaker.

[0366] For example, when a user returns home tired at the end of a busy day, the robot might suggest, "Were you able to rest enough today? Shall I prepare a warm drink for you to relax?" In this way, users can receive real-time feedback tailored to their emotional state.

[0367] An example of a prompt for the generative AI model is: "Based on words obtained from the user's voice, please come up with specific health advice to help them relax."

[0368] This allows for timely emotional feedback within the family, improving the quality of daily life.

[0369] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0370] Step 1:

[0371] The user speaks to the terminal. The terminal continuously acquires ambient sound using its built-in microphone and transmits this audio data to the server in real time. The input is the user's spoken voice, which is converted into audio data.

[0372] Step 2:

[0373] The server converts the received audio data into text data using the Google Cloud Speech-to-Text API. This step extracts words and phrases from the audio and converts them into text, making them available for use in the next step. The output is the audio converted to text.

[0374] Step 3:

[0375] The server feeds the converted text data to IBM Watson's Natural Language Understanding to perform sentiment analysis. This process extracts emotional components from the context and words of the input text, determining emotional states such as positive, negative, and neutral. The output includes emotional scores and states.

[0376] Step 4:

[0377] The server generates feedback and advice using a Python program based on the sentiment analysis results. The generative AI model, driven by a prompt, selects appropriate responses from a list of candidates and composes the advice to provide to the user. The output is the text of the specific advice.

[0378] Step 5:

[0379] The server sends the generated advice back to the terminal. The terminal then conveys that advice to the user verbally through its speaker. The expected output is words or actions that give the user natural-sounding feedback.
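
Steps 3 and 4 above can be illustrated by mapping a sentiment score to a coarse emotional state and attaching it to the prompt. This is a minimal sketch, assuming the sentiment analysis yields a single score in the range -1.0 to 1.0; the threshold of 0.25 and the function names are hypothetical, and the response format of the actual analysis service is not reproduced here.

```python
def state_from_score(score: float, threshold: float = 0.25) -> str:
    """Map a sentiment score in [-1.0, 1.0] to a coarse emotional state."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be within [-1.0, 1.0]")
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


# Instruction text quoted from the example prompt in this application example.
BASE_PROMPT = (
    "Based on words obtained from the user's voice, please come up with "
    "specific health advice to help them relax."
)


def build_prompt(user_text: str, score: float) -> str:
    """Attach the transcript and the analyzed state to the base instruction."""
    state = state_from_score(score)
    return (
        f"{BASE_PROMPT}\n"
        f"Emotional state: {state} (score {score:+.2f})\n"
        f"User's words: {user_text}"
    )
```

Passing both the state and the raw score lets the generative AI model distinguish mild from strong emotion, while the threshold keeps small fluctuations around zero classified as neutral.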

[0380] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0381] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives, as input, prompts containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0382] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0383] [Third Embodiment]

[0384] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0385] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0386] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0387] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0388] The microphone 238 receives instructions and other input from the user 20 by picking up the voice uttered by the user 20. The microphone 238 captures the voice uttered by the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0389] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0390] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0391] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0392] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0393] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0394] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0395] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0396] As an embodiment for carrying out the present invention, a system that executes the specific processing program is designed. Its details are described below.

[0397] The server plays a central role in analyzing the user's emotional state. First, it receives the user's voice data transmitted from the terminal and converts it into text data using speech recognition technology. The server then inputs this text data into an emotion analysis model to analyze the user's emotional state and, based on that state, generates appropriate advice for the user. The generated advice and information are sent to the user's terminal.

[0398] The terminal acts as an interface between the server and the user. It has a voice acquisition function to capture the sound around the user in real time and send it to the server. The advice returned from the server is presented to the user as either audio or text. The terminal also contributes to the improvement of the overall system by collecting the user's feedback on the advice received and sending it to the server.

[0399] Users utilize the terminal during their daily activities and receive advice and support from the server. For example, if a user is feeling down after making a mistake at work, they can speak to the terminal, and their voice is sent to the server for analysis, after which appropriate encouragement and advice are provided. Through this process, users can objectively re-examine their own emotions and are encouraged to take positive action.

[0400] Thus, this system aims to support the improvement of users' resilience and assist in stress management in daily life by coordinating the operation of its various components, such as analysis means, generation means, and presentation means.

[0401] The following describes the processing flow.

[0402] Step 1:

[0403] The terminal collects the user's speech through the microphone. The audio data is passed through a noise-canceling filter and temporarily stored as clear audio data.

[0404] Step 2:

[0405] The terminal prepares to send the stored audio data to the server. It packetizes the data according to the communication protocol and sends it to the server via a secure line.

[0406] Step 3:

[0407] The server passes the audio data received from the terminal through a speech recognition system and converts it into text data. The text data is then input into an emotion analysis model to determine the user's emotional state.

[0408] Step 4:

[0409] Based on the sentiment analysis results, the server generates personalized advice according to pre-configured rules and algorithms. The generated advice is customized considering the user's past behavioral history.

[0410] Step 5:

[0411] The server converts the generated advice into text or audio format and sends it to the terminal. The data is encrypted and processed to ensure user privacy.

[0412] Step 6:

[0413] The terminal receives advice from the server and presents it to the user. If the advice is presented verbally, it is read aloud through the speaker; if it is presented as text, it is displayed on the screen.

[0414] Step 7:

[0415] Users adjust their actions based on the advice provided. If necessary, they can send feedback via the terminal, and this feedback is used to improve future support.
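
The packetization in Step 2 above can be sketched as follows. The packet layout (a sequence number, a payload length, and a CRC32 checksum), the payload size of 512 bytes, and the function names are hypothetical choices for illustration; the disclosure does not specify a communication protocol. The secure line itself (for example, an encrypted transport) is assumed to be provided separately and is not shown.

```python
from __future__ import annotations

import struct
import zlib

# Hypothetical header: sequence number, payload length, CRC32 (network order).
HEADER = struct.Struct("!IHI")


def packetize(audio: bytes, payload_size: int = 512) -> list[bytes]:
    """Split audio bytes into packets, each with an integrity-checked header."""
    packets = []
    for seq, start in enumerate(range(0, len(audio), payload_size)):
        payload = audio[start:start + payload_size]
        header = HEADER.pack(seq, len(payload), zlib.crc32(payload))
        packets.append(header + payload)
    return packets


def reassemble(packets: list[bytes]) -> bytes:
    """Rebuild the audio on the server, tolerating out-of-order arrival."""
    chunks = {}
    for packet in packets:
        seq, length, crc = HEADER.unpack(packet[:HEADER.size])
        payload = packet[HEADER.size:HEADER.size + length]
        if zlib.crc32(payload) != crc:
            raise ValueError(f"packet {seq} is corrupted")
        chunks[seq] = payload
    # Order by sequence number before joining the payloads.
    return b"".join(chunks[seq] for seq in sorted(chunks))
```

The sequence number lets the server restore the original order before speech recognition in Step 3, and the checksum lets it reject a damaged packet rather than transcribe corrupted audio.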

[0416] (Example 1)

[0417] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0418] In modern society, there is a need for systems that effectively provide advice to users regarding the emotional stress and challenges they face. However, current technology makes it difficult to accurately analyze a user's emotional state and provide personalized advice in real time. Furthermore, there are insufficient methods for continuously improving the system using user feedback. As a result, there is a need to develop a system that provides truly useful support for users.

[0419] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0420] In this invention, the server includes computational processing means for analyzing the user's emotional state, data generation means for generating advice for the user, and information presentation means for presenting the advice via various communication devices. This makes it possible to provide appropriate advice based on the user's emotional state in real time and to continuously improve the system's performance through user feedback.

[0421] "Users" refers to individuals or organizations that use the system to receive services or support.

[0422] "Emotional state" refers to the psychological or emotional state that a user experiences at a particular point in time.

[0423] "Computational processing means" refers to a computer system or software that handles the process of analyzing data.

[0424] "Data generation means" refers to technical means for generating new information or advice based on analyzed information.

[0425] "Information presentation means" refers to devices or interfaces used to present information generated by a system to users, either by displaying it or by conveying it through sound.

[0426] "Feedback" refers to the evaluations and opinions that users give to a system or the advice provided, and is used for improvement.

[0427] "Adjustment methods" refer to methods and processes for optimizing the system's performance and offerings based on collected feedback.

[0428] "Voice input means" refers to devices and technologies for acquiring a user's voice as digital information and supplying it to a system.

[0429] "History information" refers to information that records data about a user's past actions and system usage.

[0430] "Information recording means" refers to technical devices or systems that store data about users and utilize it for subsequent processing and analysis.

[0431] One embodiment of this invention is a system designed to facilitate communication between a server, a terminal, and a user.

[0432] The server primarily functions as the computational processing means and the data generation means. The server receives audio data from terminals via the internet and converts it into text data using speech recognition software. A common speech recognition API is used for this process. The converted text data is analyzed using a natural language processing model; for example, BERT or a similar model is used for sentiment analysis. Based on the analysis results, a generative AI model functions as the data generation means, generating advice for the user. The prompt for this generation process is set to "Specifically analyze the sentiment of this text and generate appropriate advice."

[0433] The terminal functions as both a voice input and information presentation device. A typical smart device is used, which includes a microphone for voice acquisition and a monitor or speaker for displaying user advice. The terminal acquires the user's spoken voice in real time and transmits it to the server. The received advice is presented to the user in text or audio format. Speech synthesis software may be used in this process.

[0434] Users utilize the system's functions through their devices in everyday situations. For example, if a user says to their device, "I've been feeling stressed lately," the server converts the audio into text and generates and returns advice on stress reduction. An example of such advice might be, "I recommend taking some time to relax. How about spending some time on a hobby?"

[0435] The ultimate goal of this system is to provide personalized advice tailored to the user's emotional state and improve their quality of life.

[0436] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0437] Step 1:

[0438] The device acquires audio from the user's surroundings in real time. This is done using a built-in microphone, and an audio acquisition application captures the data as a digital signal. The input is the user's spoken natural language, and the output is a digitized audio file.

[0439] Step 2:

[0440] The terminal transmits the acquired audio data to the server via the internet. The audio data is divided into packets and efficiently transferred over the network. The input is a digital audio file, and the output is the audio data that reaches the server.

[0441] Step 3:

[0442] The server converts the received audio data into text data using speech recognition software. Specifically, the speech recognition engine analyzes the audio waveform to generate an appropriate text representation. The input is the audio data sent to the server, and the output is the converted text data.

[0443] Step 4:

[0444] The server inputs the converted text data into a natural language processing model to analyze the user's emotional state. In this process, it infers emotions from sentence structure and word choices, and generates the results of the emotional analysis as output. The input is text data, and the output is data with the emotional state analyzed.

[0445] Step 5:

[0446] The server uses a generative AI model to generate advice based on the results of sentiment analysis. The prompt is set to "Provide appropriate advice based on the sentiment state of this text," and the model generates the best advice. The input is the sentiment analysis data, and the output is the text of the generated advice.

[0447] Step 6:

[0448] The server sends the generated advice to the terminal. It sends encoded data to ensure stable communication over the network. The input is the text of the generated advice, while the output is the advice text sent to the terminal.

[0449] Step 7:

[0450] The terminal presents the received advice to the user by converting it into speech using speech synthesis software or by displaying it as text on the screen. Information is delivered to the user through sight and sound. The input is the advice text from the server, and the output is the advice information conveyed to the user.

[0451] Step 8:

[0452] The user inputs feedback on the presented advice into the terminal. This can be done using voice or text input. The terminal sends this feedback to the server, contributing to system improvement. The input is the user's feedback, and the output is the feedback data sent to the server.
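
The server-side portion of the above flow (steps 3 to 5) may be sketched, purely as an illustration, as a chain of three replaceable functions. The function `handle_utterance` and the three `fake_*` stand-ins are hypothetical: they replace the speech recognition software, the natural language processing model, and the generative AI model with trivial placeholders so that the data flow can be followed end to end. Only the prompt wording is taken from step 5.

```python
from typing import Callable, Dict

# Each stage is passed in as a plain function so that any speech recognition
# engine, sentiment model, or generative model could be substituted.
Recognizer = Callable[[bytes], str]
Analyzer = Callable[[str], str]
Generator = Callable[[str], str]

PROMPT = "Provide appropriate advice based on the sentiment state of this text"


def handle_utterance(audio: bytes, recognize: Recognizer,
                     analyze: Analyzer, generate: Generator) -> Dict[str, str]:
    """Run steps 3 to 5 on the server side and return what step 6 would send."""
    text = recognize(audio)                 # step 3: audio -> text
    emotion = analyze(text)                 # step 4: text -> emotional state
    prompt = f"{PROMPT}\nSentiment: {emotion}\nText: {text}"
    advice = generate(prompt)               # step 5: prompt -> advice text
    return {"text": text, "emotion": emotion, "advice": advice}


# Toy stand-ins (assumptions for illustration only).
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")            # pretends the audio is already text


def fake_analyze(text: str) -> str:
    return "negative" if "stressed" in text.lower() else "neutral"


def fake_generate(prompt: str) -> str:
    if "Sentiment: negative" in prompt:
        return "I recommend taking some time to relax."
    return "Keep up your usual routine."


result = handle_utterance(b"I've been feeling stressed lately",
                          fake_recognize, fake_analyze, fake_generate)
print(result["advice"])
```

Passing the stages in as arguments keeps the flow independent of any particular speech recognition API or model, which matches the description's use of "a common speech recognition API" and "BERT or similar models" without fixing a specific product.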

[0453] (Application Example 1)

[0454] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0455] In modern society, the elderly and those requiring care often experience psychological burdens due to loneliness and emotional fluctuations. To alleviate these burdens and improve well-being in daily life, there is a need for technology that can grasp emotional changes in real time and provide appropriate responses. However, conventional systems have the challenge of being unable to accurately analyze the emotional state of users and provide appropriate responses based on that analysis.

[0456] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0457] In this invention, the server includes an analysis means for analyzing the user's emotional state, a generation means for generating advice for the user based on the analysis means, and a presentation means for presenting the advice through various devices and further recommending music and information in accordance with changes in emotion. This makes it possible to accurately analyze the user's emotions, provide personalized advice and recommendations in real time, reduce the user's psychological burden, and improve their well-being.

[0458] "Analysis means for analyzing the emotional state of users" refers to a device that includes technology for converting audio data into text data and analyzing the user's emotions based on that text data.

[0459] A "generation means" is a device that has the function of generating appropriate advice and information for the user based on the analysis results.

[0460] A "presentation means" is a device that has the function of transmitting generated advice and information to the user through various devices, and also has the ability to recommend music and information in response to changes in emotions.

[0461] "Voice acquisition means" refers to a device that includes technology for acquiring a user's voice data in real time and transmitting it to a server for analysis.

[0462] "Processes adapted for emotional support in caregiving situations" refers to analytical and response processes designed to reduce psychological burden, particularly for the elderly and individuals requiring care.

[0463] "Historical data" refers to a collection of data that includes records of a user's past behavior and emotions, and is a source of information that is considered when creating advice and information.

[0464] "Methods for collecting feedback and improving the system" refers to the process of collecting user reactions and evaluations and using them to improve and optimize the entire system.

[0465] This invention is a system aimed at reducing the psychological burden on the elderly and individuals requiring care. The system includes analysis means, generation means, presentation means, voice acquisition means, history data recording means, and feedback collection means. The invention is specifically implemented through the roles of server, terminal, and user.

[0466] The server first receives the user's voice data transmitted from the terminal via a voice acquisition device. The hardware used at this stage is a smartphone or tablet, and software such as the Google Speech-to-Text API is used to convert the voice data into text data.

[0467] Next, the converted text data is input to an emotion analysis model to determine the user's emotional state. Here, a Python natural language processing library (e.g., Transformers) is utilized. For example, if the user says, "Nobody came to visit today, and I'm lonely," the server performs emotion analysis on this utterance.

[0468] The generation mechanism generates appropriate advice and information for the user based on the analysis results. The generated advice is then sent back to the terminal via the server. At this stage, music or information tailored to the user may also be recommended.

[0469] The device uses its screen and speakers to present advice and music to the user through various means. It also collects user feedback and sends this information to a server. This continuous feedback allows for improvements to the entire system.

[0470] For example, if a user says, "I'm a little tired," the generative AI model might offer advice such as, "I'll play your favorite music to help you relax."

[0471] Example prompt: "Analyze user feedback to understand their emotions and generate optimal advice."
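
One possible way of forming such a prompt from the analysis results is sketched below. The function `build_prompt`, the line labels ("Detected emotion", "User said", "Recent history"), and the idea of appending history as bullet lines are assumptions made for this example; the disclosure specifies only that a prompt sentence is formed based on the analysis results and that historical data is considered.

```python
from typing import Sequence

INSTRUCTION = "Analyze the user's emotions and generate optimal advice."


def build_prompt(utterance: str, emotion: str, history: Sequence[str] = ()) -> str:
    """Combine the instruction, the analysis result, and optional history lines."""
    lines = [INSTRUCTION, f"Detected emotion: {emotion}", f"User said: {utterance}"]
    if history:
        lines.append("Recent history:")
        lines.extend(f"- {item}" for item in history)
    return "\n".join(lines)


prompt = build_prompt("I'm a little tired", "fatigue",
                      history=["Enjoyed listening to music yesterday"])
print(prompt)
```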

[0472] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0473] Step 1:

[0474] The user provides voice input. The terminal collects the user's voice using a voice acquisition device. This voice data is then input into subsequent processing.

[0475] Step 2:

[0476] The device sends the collected audio data to the server. The server uses the Google Speech-to-Text API to convert the audio data into text data. Audio data is input, and the corresponding text is output.

[0477] Step 3:

[0478] The server inputs the converted text data into an emotion analysis model. The analysis tool then performs emotion analysis on this text data. The input is text data, and the output is information indicating the user's emotional state. A Python natural language processing library is used.

[0479] Step 4:

[0480] The server uses a generative AI model to generate appropriate advice based on the analyzed emotional state. A prompt sentence (e.g., "Analyze the user's emotions and generate optimal advice") is formed based on the analysis results, and this is input to output the advice text.

[0481] Step 5:

[0482] The generated advice text is sent from the server to the terminal. The terminal communicates this advice to the user via a presentation device, either through a screen display or audio output. The terminal's output is the generated advice message.

[0483] Step 6:

[0484] Users provide feedback on the advice they receive. This feedback is sent from the terminal to the server and used to improve the system. The server uses the user's feedback data as input to update the system's algorithms and analysis models.
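
The disclosure does not specify how feedback updates the system. As one simple possibility, the sketch below keeps a running mean rating for each style of advice and prefers the best-rated style. The class `FeedbackTracker`, the notion of an "advice style", and the numeric ratings are all assumptions introduced for illustration; retraining an analysis model on the feedback would be a different and heavier approach.

```python
from collections import defaultdict
from typing import Dict, List


class FeedbackTracker:
    """Keeps a running mean rating per advice style (both are assumptions)."""

    def __init__(self) -> None:
        self._sum: Dict[str, float] = defaultdict(float)
        self._count: Dict[str, int] = defaultdict(int)

    def record(self, style: str, rating: float) -> None:
        """Add one user rating for the given advice style."""
        self._sum[style] += rating
        self._count[style] += 1

    def mean(self, style: str) -> float:
        """Mean rating for a style, or 0.0 if it has never been rated."""
        n = self._count.get(style, 0)
        return self._sum[style] / n if n else 0.0

    def best(self, styles: List[str]) -> str:
        """Return the style with the highest mean; ties go to the earliest."""
        return max(styles, key=self.mean)


tracker = FeedbackTracker()
tracker.record("music", 5)
tracker.record("music", 3)
tracker.record("breathing", 2)
print(tracker.best(["breathing", "music"]))
```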

[0485] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0486] In an embodiment for carrying out the present invention, a system incorporating an emotion engine is designed to analyze the user's emotional state in depth and provide support tailored to individual needs. The embodiment is described in detail below.

[0487] The server functions as a central data processing unit, receiving audio data transmitted from terminals. This audio data is converted into text data by a speech recognition module within the server. The text data is then fed into the emotion engine, which analyzes the user's emotional state in more detail. The emotion engine uses specific algorithms and has its own methods for evaluating the user's emotions in real time. This evaluation result is passed to a generation mechanism within the server, which constructs advice tailored to the individual's emotional state.

[0488] The terminal acts as the interface between the server and the user. A microphone installed in the terminal constantly captures ambient sound and transmits this audio data to the server. Advice returned from the server is presented to the user through the terminal's display or speaker. This allows the user to receive emotion-based feedback in real time.

[0489] Users can utilize this system in their daily lives and receive support in various situations. For example, when a user is preparing a presentation for work, they can speak into the device to express their anxiety. The server receives this audio and, using its emotion engine, analyzes that the user is experiencing high levels of anxiety. Based on this, the server generates specific advice such as, "Take a deep breath and relax. You have successfully delivered excellent presentations in the past," and provides it to the user through the device. Through this process, users can gain a sense of security and achieve emotional stability.

[0490] Thus, the system of the present invention aims to enhance the user's resilience and improve their ability to cope with daily life stress by linking analysis means, generation means, presentation means, and an emotion engine.

[0491] The following describes the processing flow.

[0492] Step 1:

[0493] The device collects the user's voice using a microphone and performs noise cancellation processing to eliminate ambient noise. This results in the extraction of clear audio data.

[0494] Step 2:

[0495] The terminal prepares to send the processed voice data to the server by packetizing the data and sending it through a secure communication channel.

[0496] Step 3:

[0497] The server converts the audio data received from the terminal into text data using a speech recognition module. The text data is then prepared for further sentiment analysis.

[0498] Step 4:

[0499] The server utilizes an emotion engine to analyze the user's emotional state in detail from text data. During this process, it assigns positive, negative, or neutral emotion tags.

[0500] Step 5:

[0501] The server generates the most suitable advice for the user based on emotion tags obtained by the emotion engine. The algorithm creates customized advice while also considering the user's history.

[0502] Step 6:

[0503] The server encodes the generated advice as a data packet, encrypts it, and then sends it to the terminal.

[0504] Step 7:

[0505] The terminal decodes the advice sent from the server and presents it to the user in audio or text format. The user can understand the information by playing the audio through the speaker or displaying the text on the screen.

[0506] Step 8:

[0507] Users adjust their actions and feelings based on the advice provided. This user feedback is then sent to the server via the device, contributing to improvements in the system's accuracy.
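
The packetizing in step 2 and the encoding and decoding in steps 6 and 7 may be illustrated as follows. The 6-byte header layout (sequence number and payload length), the JSON envelope with an `advice` key, and all function names are assumptions for this sketch, since the disclosure does not define a wire format. Encryption is deliberately omitted; in practice the secure channel would typically be provided by a transport such as TLS rather than by this code.

```python
import json
import struct
from typing import Iterable, List

HEADER = struct.Struct("!IH")  # sequence number (4 bytes), payload length (2 bytes)


def packetize(data: bytes, payload_size: int = 1024) -> List[bytes]:
    """Split data into packets, each prefixed with a sequence number and length."""
    if not 0 < payload_size <= 0xFFFF:
        raise ValueError("payload_size must be between 1 and 65535")
    packets = []
    for seq, start in enumerate(range(0, len(data), payload_size)):
        chunk = data[start:start + payload_size]
        packets.append(HEADER.pack(seq, len(chunk)) + chunk)
    return packets


def reassemble(packets: Iterable[bytes]) -> bytes:
    """Rebuild the original bytes even if packets arrive out of order."""
    parts = {}
    for packet in packets:
        seq, length = HEADER.unpack_from(packet)
        parts[seq] = packet[HEADER.size:HEADER.size + length]
    return b"".join(parts[seq] for seq in sorted(parts))


def encode_advice(advice: str) -> bytes:
    """Serialize the advice text as UTF-8 JSON for transmission (step 6)."""
    return json.dumps({"advice": advice}, ensure_ascii=False).encode("utf-8")


def decode_advice(payload: bytes) -> str:
    """Recover the advice text on the terminal side (step 7)."""
    return json.loads(payload.decode("utf-8"))["advice"]
```

For example, `reassemble(reversed(packetize(audio)))` returns the original `audio` bytes, because the sequence numbers restore the order.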

[0508] (Example 2)

[0509] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0510] In today's world, it is increasingly important to quickly and accurately understand users' emotional states and provide real-time feedback tailored to their individual needs. However, conventional systems often lack the accuracy to analyze users' emotions and the quality of the advice they provide. Therefore, there is a need for technology that can analyze emotions with higher precision and provide accurate, personalized advice.

[0511] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0512] In this invention, the server includes information processing means for analyzing the user's emotional state, data generation means for generating advice for the user based on the information processing means, information output means for presenting the advice via a communication device, data acquisition means for acquiring voice data and supplying it to the information processing means, and information analysis means for evaluating the user's emotions in real time. This makes it possible to analyze the user's emotions in detail and provide appropriate feedback in real time.

[0513] An "information processing means" is a component that has the function of analyzing the emotional state of a user.

[0514] A "data generation means" is a component that has the function of generating advice for the user based on the analysis results obtained from the information processing means.

[0515] An "information output means" is a component that has the function of presenting advice generated via a communication device to the user.

[0516] A "data acquisition means" is a component that has the function of acquiring audio data and supplying it to an information processing means.

[0517] An "information analysis means" is a component that has the function of analyzing the user's voice data in real time and evaluating the user's emotions.

[0518] This invention aims to realize a system that analyzes the emotional state of users in depth and provides appropriate feedback tailored to each user. The system is configured as follows:

[0519] The server functions as the system's central information processing unit, receiving voice data transmitted from terminals. Using a speech recognition module, the voice data is converted into text data. The converted text data is then analyzed by an emotion engine to provide a detailed assessment of the user's emotional state. A specific algorithm is applied to this assessment, enabling real-time analysis of the user's emotions.

[0520] A generative AI model then uses the analysis results from the emotion engine to generate advice tailored to the user's emotions. This advice draws on historical data and general psychological knowledge, and is designed to provide specific and helpful guidance.

[0521] The terminal functions as an information interface between the server and the user. The microphone built into the terminal acquires voice data and transmits it to the server. It also provides information to the user by displaying advice sent from the server on the display or playing it back as audio through the speaker.

[0522] Users can utilize this system in their daily lives. For example, if a user feels anxious about speaking in front of a large group, they can say to the device, "I get very nervous when speaking in front of a large group. How can I relax?" The server analyzes this voice, generates advice to alleviate anxiety, and provides the user with specific suggestions such as, "Take a deep breath and relax. You have successfully given great presentations in the past." In this way, users can gain a sense of security and emotional stability.

[0523] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0524] Step 1:

[0525] The device continuously captures ambient sound using a microphone. When a user speaks, it converts the audio data into a digital format and sends it to the server. The input in this step is the user's voice, and the output is the digital audio data sent to the server. Specifically, as soon as audio input is detected, the data is sampled and transferred to the server as data packets.

[0526] Step 2:

[0527] The server receives audio data transmitted from the terminal. It uses a speech recognition module to convert this data into text. The input is the audio data obtained from the terminal, and the output is the converted text data. A speech recognition algorithm is applied to this data processing, analyzing the features of the audio waveform and converting them into text.

[0528] Step 3:

[0529] The server passes the converted text data to the emotion engine. The emotion engine analyzes the emotional state from the text data using a specific algorithm. The input is the text data obtained in step 2, and the output is the analysis result indicating the user's emotional state. Specifically, it utilizes an emotion dictionary and natural language processing techniques to evaluate emotions based on keywords in the text.

[0530] Step 4:

[0531] The server inputs the analysis results into a generative AI model to produce advice tailored to the user's emotional state. This step involves data calculations that take into account past data and general psychological knowledge. The input is the emotion analysis result, and the output is the generated advice message. Specifically, the AI model utilizes its trained knowledge to construct appropriate advice based on various scenarios.

[0532] Step 5:

[0533] The server sends the generated advice to the terminal. The terminal converts the received data into an appropriate format to return this advice to the user, displaying it on the screen or playing it through the speaker. Here, the input is the advice data from the server, and the output is the real-time advice presented to the user. Specifically, the received message is processed as text or audio and presented in a way that the user can immediately understand.
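
The keyword-based evaluation using an emotion dictionary mentioned in step 3 may be sketched as follows. The dictionary entries, their scores, and the function name `evaluate_emotion` are assumptions for illustration only; the disclosure does not specify the contents of the dictionary or the scoring rule, and a practical emotion engine would need handling for negation, inflected forms, and context.

```python
import re
from typing import Dict, Tuple

# Tiny illustrative lexicon: word -> polarity score. The entries are assumptions.
EMOTION_DICTIONARY: Dict[str, int] = {
    "nervous": -1, "anxious": -1, "stressed": -1, "lonely": -1, "tired": -1,
    "happy": 1, "relaxed": 1, "great": 1, "glad": 1,
}


def evaluate_emotion(text: str) -> Tuple[str, int]:
    """Sum the scores of known keywords and map the total to a tag."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(EMOTION_DICTIONARY.get(word, 0) for word in words)
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score


print(evaluate_emotion("I get very nervous when speaking in front of a large group."))
```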

[0534] (Application Example 2)

[0535] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0536] Modern households require technological advancements to improve people's quality of life. In particular, there is a lack of systems that can adequately understand family members' emotional states and provide real-time support. As a result, people often do not receive sufficient emotional support in their daily lives, leading to a buildup of stress.

[0537] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0538] In this invention, the server includes an analysis means for analyzing the emotional state of the user, a generation means for generating advice for the user based on the analysis means, a presentation means for presenting the advice via various devices, a monitoring means for understanding the emotional state of people in the home environment, and a support means for providing life support based on the information obtained using the monitoring means. This enables people in the home to receive support appropriate to their emotional state, thereby improving their quality of life.

[0539] "Users" refers to individuals or households who use this system.

[0540] "Emotional state" refers to the psychological or emotional state that a user is experiencing at a particular point in time, and this is usually information that is analyzed from voice, facial expressions, etc.

[0541] "Analysis means" refers to a device that has the function of analyzing the emotional state of the user from the acquired voice and data.

[0542] A "generation means" is a device that has the function of generating optimal advice and support for the user based on emotional information obtained by an analysis means.

[0543] "Presentation means" refers to devices or methods for displaying or notifying users of generated advice in a form that is recognizable to them.

[0544] "Monitoring means" refers to devices that have the function of observing people's behavior and voices within the home and acquiring necessary data.

[0545] "Support means" refers to functions that provide real-time support to improve quality of life using analyzed emotional data and monitored information.

[0546] In implementing this invention, the system mainly consists of a server and a robot terminal located in the home. The server functions as a central processing unit and is responsible for processing the voice data transmitted from the terminal. The terminal is equipped with a microphone and speaker to acquire the user's voice and transmit it to the server.

[0547] The server receives the audio data and converts it into text data using the Google Cloud Speech-to-Text API. The text data is then subjected to sentiment analysis using IBM Watson's Natural Language Understanding to analyze the user's emotional state in detail. Based on the analyzed sentiment data, a generation tool, run by a Python program, uses AI to create appropriate feedback and advice. This generated advice is then delivered to the user through the device's speaker.

[0548] For example, when a user returns home tired at the end of a busy day, the robot might suggest, "Were you able to rest enough today? Shall I prepare a warm drink for you to relax?" In this way, users can receive real-time feedback tailored to their emotional state.

[0549] Examples of prompts for an AI generation model include: "Based on words obtained from the user's voice, please come up with specific health advice to help them relax."

[0550] This allows for timely emotional feedback within the family, improving the quality of daily life.

[0551] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0552] Step 1:

[0553] The user speaks into the device. The device continuously acquires ambient sound using its built-in microphone and transmits this audio data to the server in real time. The input is the user's spoken voice, which is converted into audio data.

[0554] Step 2:

[0555] The server converts the received audio data into text data using the Google Cloud Speech-to-Text API. This step extracts words and phrases from the audio and converts them into text, making them available for use in the next step. The output is the audio converted to text.

[0556] Step 3:

[0557] The server feeds the converted text data to IBM Watson's Natural Language Understanding to perform sentiment analysis. This process extracts emotional components from the context and words of the input text, determining emotional states such as positive, negative, and neutral. The output includes emotional scores and states.

[0558] Step 4:

[0559] The server generates feedback and advice using a Python program based on the sentiment analysis results. The generative AI model selects appropriate responses using prompts and creates optimal advice from a list to provide to the user. The output is the text of the specific advice.

[0560] Step 5:

[0561] The server sends the generated advice back to the terminal. The terminal then conveys that advice to the user verbally through its speaker. The expected output is words or actions that represent natural feedback from the user.
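
The selection of advice from a list in step 4 may be illustrated by the following sketch, which assumes that the sentiment analysis returns a label (positive, negative, or neutral) and a signed score. The table `ADVICE_LIST`, its intensity thresholds, and the function `select_advice` are hypothetical; the example phrases for the negative case are taken from the robot's suggestion described above, and the others are placeholders. No call to IBM Watson or to any other service is shown.

```python
from typing import Dict, List, Tuple

# label -> list of (minimum intensity, advice). Contents are illustrative assumptions.
ADVICE_LIST: Dict[str, List[Tuple[float, str]]] = {
    "negative": [
        (0.0, "Were you able to rest enough today?"),
        (0.6, "Shall I prepare a warm drink for you to relax?"),
    ],
    "neutral": [(0.0, "Let me know if there is anything I can help with.")],
    "positive": [(0.0, "That sounds good. Enjoy the rest of your day.")],
}


def select_advice(label: str, score: float) -> str:
    """Pick the advice whose intensity threshold is the highest one met by |score|."""
    candidates = ADVICE_LIST.get(label, ADVICE_LIST["neutral"])
    intensity = abs(score)
    # Every list contains a 0.0 threshold, so at least one candidate is eligible.
    eligible = [item for item in candidates if item[0] <= intensity]
    return max(eligible, key=lambda item: item[0])[1]


print(select_advice("negative", -0.8))
```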

[0562] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0563] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0564] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0565] [Fourth Embodiment]

[0566] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0567] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0568] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0569] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0570] The microphone 238 receives instructions and the like from the user 20 by picking up the voice uttered by the user 20. The microphone 238 captures the voice uttered by the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0571] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0572] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0573] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
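
As a rough illustration of how emotions might be mapped to the controlled object 443, the sketch below looks up LED and motor targets for a given emotion. The `Expression` fields, the emotion names, and every numeric value are assumptions made for this example; the disclosure states only that the eye LEDs and the arm, hand, and foot motors are controlled to express emotions and facial expressions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Expression:
    """Hypothetical actuator targets for one emotion."""
    led_color: str          # illumination state of the eye LEDs
    led_blink_hz: float     # 0.0 means steadily lit
    arm_angle_deg: int      # target angle for the arm motors


EXPRESSIONS: Dict[str, Expression] = {
    "joy": Expression("yellow", 0.0, 60),
    "sadness": Expression("blue", 0.5, -20),
    "neutral": Expression("white", 0.0, 0),
}


def expression_for(emotion: str) -> Expression:
    """Look up the actuator targets, falling back to the neutral expression."""
    return EXPRESSIONS.get(emotion, EXPRESSIONS["neutral"])


print(expression_for("joy"))
```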

[0574] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0575] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0576] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0577] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0578] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0579] As an embodiment for carrying out the present invention, a system that executes the specific processing program has been designed. Its details are described below.

[0580] The server plays a central role in analyzing the user's emotional state. First, it receives the user's voice data transmitted from the terminal and converts it into text data using speech recognition technology. Based on this text data, the server uses an emotion analysis model to analyze the user's emotional state. It then generates appropriate advice for the user based on this emotional state. The generated advice and information are sent to the terminal to which the user is connected.

[0581] The terminal acts as an interface between the server and the user. It has a voice acquisition function that captures the sound around the user in real time and sends it to the server. The advice returned from the server is presented to the user as either audio or text. The terminal also contributes to the improvement of the overall system by collecting the user's feedback on the presented advice and sending it to the server.

[0582] Users utilize the device during their daily activities and receive advice and support from the server. For example, if a user is feeling down after making a mistake at work, they can speak into the device, and their voice is sent to the server for analysis, after which appropriate encouragement and advice are provided. Through this process, users can objectively re-examine their own emotions and are encouraged to take positive action.

[0583] Thus, this system aims to support the improvement of users' resilience and assist in stress management in daily life by coordinating the operation of its various components, such as analysis means, generation means, and presentation means.

[0584] The following describes the processing flow.

[0585] Step 1:

[0586] The device collects the user's speech through the microphone. The audio data is processed through a noise-canceling filter and temporarily stored as clear audio data.

[0587] Step 2:

[0588] The terminal prepares to send the stored audio data to the server. It packets the data according to the communication protocol and sends it to the server via a secure line.

[0589] Step 3:

[0590] The server passes the audio data received from the terminal through a speech recognition system and converts it into text data. The converted text data is then input into an emotion analysis model to determine the user's emotional state.

[0591] Step 4:

[0592] Based on the sentiment analysis results, the server generates personalized advice according to pre-configured rules and algorithms. The generated advice is customized considering the user's past behavioral history.

[0593] Step 5:

[0594] The server converts the generated advice into text or audio format and sends it to the terminal. The data is encrypted and processed to ensure user privacy.

[0595] Step 6:

[0596] The terminal receives advice from the server and presents it to the user. If the advice is presented verbally, it is read aloud through the speaker; if it is presented as text, it is displayed on the screen.

[0597] Step 7:

[0598] Users adjust their actions based on the advice provided. If necessary, they can send feedback via their device, and the results will be used for future support.
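The server-side part of the flow above (Steps 3 and 4) can be sketched as follows. This is a minimal illustration under stated assumptions: the `recognize` and `analyze` callables stand in for the speech recognition system and the emotion analysis model, and the rule table, the emotion labels, and the history-based adjustment are hypothetical examples of "pre-configured rules", not the rules of this disclosure.

```python
# Hypothetical pre-configured rules: emotion label -> advice text.
ADVICE_RULES = {
    "negative": "Everyone makes mistakes. Try noting one thing to do differently next time.",
    "positive": "Glad to hear it. Keep up what is working for you.",
    "neutral": "Thanks for sharing. Is there anything you would like to talk through?",
}


def generate_advice(emotion: str, history: list) -> str:
    """Step 4: choose advice by rule, then adjust it using past emotional states."""
    advice = ADVICE_RULES.get(emotion, ADVICE_RULES["neutral"])
    # Assumed personalization: react when the two previous states were also negative.
    if emotion == "negative" and history[-2:] == ["negative", "negative"]:
        advice += " This has come up several times recently, so consider a short break."
    return advice


def handle_utterance(audio, recognize, analyze, history: list) -> str:
    """Run Steps 3 and 4 for one utterance and record the result in the history."""
    text = recognize(audio)                      # speech -> text
    emotion = analyze(text)                      # text -> emotional state
    advice = generate_advice(emotion, history)   # emotional state -> advice
    history.append(emotion)                      # kept for future personalization
    return advice
```

Passing the recognizer and analyzer in as arguments keeps the sketch independent of any particular speech recognition or emotion analysis product.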

[0599] (Example 1)

[0600] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0601] In modern society, there is a need for systems that effectively provide advice to users regarding the emotional stress and challenges they face. However, current technology makes it difficult to accurately analyze a user's emotional state and provide personalized advice in real time. Furthermore, there are insufficient methods for continuously improving the system using user feedback. As a result, there is a need to develop a system that provides truly useful support for users.

[0602] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0603] In this invention, the server includes computational processing means for analyzing the user's emotional state, data generation means for generating advice for the user, and information presentation means for presenting the advice via various communication devices. This makes it possible to provide appropriate advice based on the user's emotional state in real time and to continuously improve the system's performance through user feedback.

[0604] "Users" refers to individuals or organizations that use the system to receive services or support.

[0605] "Emotional state" refers to the psychological or emotional state that a user experiences at a particular point in time.

[0606] "Computational processing means" refers to a computer system or software that handles the process of analyzing data.

[0607] "Data generation means" refers to technical means for generating new information or advice based on analyzed information.

[0608] "Information presentation means" refers to devices or interfaces used to convey information generated by a system to users, either visually on a display or audibly via sound.

[0609] "Feedback" refers to the evaluations and opinions that users give to a system or the advice provided, and is used for improvement.

[0610] "Adjustment methods" refer to methods and processes for optimizing the system's performance and offerings based on collected feedback.

[0611] "Voice input means" refers to devices and technologies for acquiring a user's voice as digital information and supplying it to a system.

[0612] "History information" refers to information that records data about a user's past actions and system usage.

[0613] "Information recording means" refers to technical devices or systems that store data about users and utilize it for subsequent processing and analysis.

[0614] One embodiment of this invention is a system designed to facilitate communication between a server, a terminal, and a user.

[0615] The server primarily functions as a computational processing and data generation means. The server receives audio data from terminals via the internet and converts it into text data using speech recognition software. A common speech recognition API is used for this process. The converted text data is analyzed using a natural language processing model. This includes using BERT or similar models for sentiment analysis. Based on the analysis results, a generative AI model functions as a data generation means, generating advice for the user. The prompt for this generation process is set to "Specifically analyze the sentiment of this text and generate appropriate advice."
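One way the prompt for the generative AI model could be assembled from the sentiment analysis result is sketched below. The instruction sentence is the one quoted in the paragraph above; the three-line layout, the `build_prompt` name, and the confidence field are assumptions made for illustration.

```python
# Instruction sentence quoted from the description above.
BASE_INSTRUCTION = (
    "Specifically analyze the sentiment of this text and generate appropriate advice."
)


def build_prompt(user_text: str, sentiment_label: str, score: float) -> str:
    """Combine the instruction, the analysis result, and the recognized text."""
    return (
        f"{BASE_INSTRUCTION}\n"
        f"Sentiment analysis result: {sentiment_label} (confidence {score:.2f})\n"
        f"Text: {user_text}"
    )
```

Keeping the analysis result on its own line lets the generative model treat it as context rather than as part of the user's utterance.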

[0616] The terminal functions as both a voice input and information presentation device. A typical smart device is used, which includes a microphone for voice acquisition and a monitor or speaker for displaying user advice. The terminal acquires the user's spoken voice in real time and transmits it to the server. The received advice is presented to the user in text or audio format. Speech synthesis software may be used in this process.
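The choice between text and audio presentation on the terminal can be sketched as a small dispatcher. The `show` and `speak` callables are hypothetical stand-ins for the monitor output and for speech synthesis followed by speaker playback; the mode names are likewise assumptions.

```python
from typing import Callable


def present_advice(
    advice: str,
    mode: str,
    show: Callable[[str], None],
    speak: Callable[[str], None],
) -> None:
    """Route advice to the display or the speaker depending on the requested mode."""
    if mode == "text":
        show(advice)       # e.g. render on the monitor
    elif mode == "audio":
        speak(advice)      # e.g. synthesize speech and play it
    else:
        raise ValueError(f"unknown presentation mode: {mode!r}")
```

For a quick check, `print` can be passed as `show`, and any function that accepts a string can serve as `speak`.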

[0617] Users utilize the system's functions through their devices in everyday situations. For example, if a user says to their device, "I've been feeling stressed lately," the server converts the audio into text and generates and returns advice on stress reduction. An example of such advice might be, "I recommend taking some time to relax. How about spending some time on a hobby?"

[0618] The ultimate goal of this system is to provide personalized advice tailored to the user's emotional state and improve their quality of life.

[0619] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0620] Step 1:

[0621] The device acquires audio from the user's surroundings in real time. This is done using a built-in microphone, and an audio acquisition application captures the data as a digital signal. The input is the user's spoken natural language, and the output is a digitized audio file.

[0622] Step 2:

[0623] The terminal transmits the acquired audio data to the server via the internet. The audio data is divided into packets and efficiently transferred over the network. The input is a digital audio file, and the output is the audio data that reaches the server.

[0624] Step 3:

[0625] The server converts the received audio data into text data using speech recognition software. Specifically, the speech recognition engine analyzes the audio waveform to generate an appropriate text representation. The input is the audio data sent to the server, and the output is the converted text data.

[0626] Step 4:

[0627] The server inputs the converted text data into a natural language processing model to analyze the user's emotional state. In this process, it infers emotions from sentence structure and word choices, and generates the results of the emotional analysis as output. The input is text data, and the output is data with the emotional state analyzed.

[0628] Step 5:

[0629] The server uses a generative AI model to generate advice based on the results of sentiment analysis. The prompt is set to "Provide appropriate advice based on the sentiment state of this text," and the model generates the best advice. The input is the sentiment analysis data, and the output is the text of the generated advice.

[0630] Step 6:

[0631] The server sends the generated advice to the terminal. It sends encoded data to ensure stable communication over the network. The input is the text of the generated advice, while the output is the advice text sent to the terminal.

[0632] Step 7:

[0633] The terminal presents the received advice to the user by converting it into speech using speech synthesis software or by displaying it as text on the screen. Information is delivered to the user through sight and sound. The input is the advice text from the server, and the output is the advice information conveyed to the user.

[0634] Step 8:

[0635] The user inputs feedback on the presented advice into the terminal. This can be done using voice or text input. The terminal sends this feedback to the server, contributing to system improvement. The input is the user's feedback, and the output is the feedback data sent to the server.
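Step 2 of the flow above states that the audio data is divided into packets. A minimal sketch of such packetizing and of reassembly on the server side follows. The 8-byte header (sequence number and total count) and the default payload size are hypothetical; this disclosure does not specify a packet format.

```python
import struct

# Hypothetical header: sequence number and total packet count, network byte order.
HEADER = struct.Struct("!II")


def packetize(audio: bytes, payload_size: int = 1024) -> list:
    """Split audio bytes into packets, each prefixed with the header."""
    chunks = [audio[i:i + payload_size] for i in range(0, len(audio), payload_size)]
    chunks = chunks or [b""]  # an empty recording still yields one packet
    return [HEADER.pack(seq, len(chunks)) + chunk for seq, chunk in enumerate(chunks)]


def reassemble(packets: list) -> bytes:
    """Restore the audio bytes, tolerating out-of-order arrival."""
    if not packets:
        raise ValueError("no packets received")
    parsed = sorted(
        (HEADER.unpack(p[:HEADER.size]) + (p[HEADER.size:],) for p in packets),
        key=lambda item: item[0],
    )
    total = parsed[0][1]
    if [seq for seq, _, _ in parsed] != list(range(total)):
        raise ValueError("missing or duplicated packets")
    return b"".join(payload for _, _, payload in parsed)
```

In practice a transport such as TCP already provides ordering and retransmission, so explicit sequence numbers like these matter mainly for datagram-style links.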

[0636] (Application Example 1)

[0637] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0638] In modern society, the elderly and those requiring care often experience psychological burdens due to loneliness and emotional fluctuations. To alleviate these burdens and improve well-being in daily life, there is a need for technology that can grasp emotional changes in real time and provide appropriate responses. However, conventional systems have the challenge of being unable to accurately analyze the emotional state of users and provide appropriate responses based on that analysis.

[0639] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0640] In this invention, the server includes an analysis means for analyzing the user's emotional state, a generation means for generating advice for the user based on the analysis means, and a presentation means for presenting the advice through various devices and further recommending music and information in accordance with changes in emotion. This makes it possible to accurately analyze the user's emotions, provide personalized advice and recommendations in real time, reduce the user's psychological burden, and improve their well-being.

[0641] "Analysis means for analyzing the emotional state of users" refers to a device that includes technology for converting audio data into text data and analyzing the user's emotions based on that text data.

[0642] A "generation means" is a device that has the function of generating appropriate advice and information for the user based on the analysis results.

[0643] A "presentation means" is a device that has the function of transmitting generated advice and information to the user through various devices, and also has the ability to recommend music and information in response to changes in emotions.

[0644] "Voice acquisition means" refers to a device that includes technology for acquiring a user's voice data in real time and transmitting it to a server for analysis.

[0645] "Processes adapted for emotional support in caregiving situations" refers to analytical and response processes designed to reduce psychological burden, particularly for the elderly and individuals requiring care.

[0646] "Historical data" refers to a collection of data that includes records of a user's past behavior and emotions, and is a source of information that is considered when creating advice and information.

[0647] "Methods for collecting feedback and improving the system" refers to the process of collecting user reactions and evaluations and using them to improve and optimize the entire system.

[0648] This invention is a system aimed at reducing the psychological burden on the elderly and individuals requiring care. The system includes analysis means, generation means, presentation means, voice acquisition means, history data recording means, and feedback collection means. The invention is specifically implemented through the roles of server, terminal, and user.

[0649] The server first receives the user's voice data transmitted from the terminal via a voice acquisition device. The hardware used at this stage is a smartphone or tablet, and software such as the Google Speech-to-Text API is used to convert the voice data into text data.

[0650] Next, the converted text data is analyzed using an emotion analysis model to analyze the user's emotional state. Here, a Python natural language processing library (e.g., Transformers) is utilized. For example, if the user says, "Nobody came to visit today, and I'm lonely," the server will perform an appropriate emotion analysis based on this information.
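The emotion analysis step can be sketched as a thin wrapper around a text classifier. The wrapper assumes the classifier returns a list of `{"label", "score"}` dictionaries, which is the shape produced by a Transformers sentiment-analysis pipeline; the confidence threshold and the stand-in `fake_classifier` are assumptions added so that the sketch runs without downloading a model.

```python
from typing import Callable

# A classifier takes text and returns [{"label": str, "score": float}].
Classifier = Callable[[str], list]


def analyze_emotion(text: str, classifier: Classifier, threshold: float = 0.6) -> str:
    """Return a lowercase emotion label, or "neutral" when confidence is low."""
    result = classifier(text)[0]
    if result["score"] < threshold:
        return "neutral"
    return result["label"].lower()


def fake_classifier(text: str) -> list:
    """Stand-in for a real model; used only to make the sketch runnable."""
    lonely = "lonely" in text.lower()
    return [{
        "label": "NEGATIVE" if lonely else "POSITIVE",
        "score": 0.95 if lonely else 0.55,
    }]
```

With the Transformers library installed, an object created by `transformers.pipeline("sentiment-analysis")` could be passed as `classifier` in place of the stand-in.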

[0651] The generation mechanism generates appropriate advice and information for the user based on the analysis results. The generated advice is then sent back to the terminal via the server. At this stage, music or information tailored to the user may also be recommended.

[0652] The device uses its screen and speakers to present advice and music to the user through various means. It also collects user feedback and sends this information to a server. This continuous feedback allows for improvements to the entire system.

[0653] For example, if a user says, "I'm a little tired," the generative AI model might offer advice such as, "I'll play your favorite music to help you relax."

[0654] Example prompt: "Analyze user feedback to understand their emotions and generate optimal advice."

[0655] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0656] Step 1:

[0657] The user provides voice input. The terminal collects the user's voice using a voice acquisition device. This voice data is then input into subsequent processing.

[0658] Step 2:

[0659] The device sends the collected audio data to the server. The server uses the Google Speech-to-Text API to convert the audio data into text data. Audio data is input, and the corresponding text is output.

[0660] Step 3:

[0661] The server inputs the converted text data into an emotion analysis model. The analysis tool then performs emotion analysis on this text data. The input is text data, and the output is information indicating the user's emotional state. A Python natural language processing library is used.

[0662] Step 4:

[0663] The server uses a generative AI model to generate appropriate advice based on the analyzed emotional state. A prompt sentence (e.g., "Analyze the user's emotions and generate optimal advice") is formed based on the analysis results, and this is input to output the advice text.

[0664] Step 5:

[0665] The generated advice text is sent from the server to the terminal. The terminal communicates this advice to the user via a presentation device, either through a screen display or audio output. The terminal's output is the generated advice message.

[0666] Step 6:

[0667] Users provide feedback on the advice they receive. This feedback is sent from the terminal to the server and used to improve the system. The server uses the user's feedback data as input to update the system's algorithms and analysis models.
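Step 6 describes feedback being used to improve the system without detailing the mechanism. One simple possibility, shown purely as an assumption, is to store a rating per kind of advice and to prefer the kinds that users have rated highly. The 1-to-5 scale, the default of 3.0 for unrated advice, and the `FeedbackStore` class are all hypothetical.

```python
from collections import defaultdict


class FeedbackStore:
    """Hypothetical store of user ratings, keyed by advice identifier."""

    def __init__(self) -> None:
        self._ratings = defaultdict(list)

    def record(self, advice_id: str, rating: int) -> None:
        """Save one rating on an assumed 1 (poor) to 5 (helpful) scale."""
        if not 1 <= rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        self._ratings[advice_id].append(rating)

    def mean_rating(self, advice_id: str, default: float = 3.0) -> float:
        """Average rating for an advice type; unrated types get the default."""
        ratings = self._ratings.get(advice_id)
        return sum(ratings) / len(ratings) if ratings else default

    def best(self, candidates: list) -> str:
        """Pick the candidate with the highest mean rating (first one wins ties)."""
        return max(candidates, key=self.mean_rating)
```

Updating an analysis model itself, as the step also mentions, would require retraining and is outside the scope of this sketch.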

[0668] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0669] In an embodiment for carrying out the present invention, a system incorporating an emotion engine has been designed to analyze the user's emotional state in depth and provide support tailored to individual needs. The embodiment is described in detail below.

[0670] The server functions as a central data processing unit, receiving audio data transmitted from terminals. This audio data is converted into text data by a speech recognition module within the server. The text data is then fed into the emotion engine, which analyzes the user's emotional state in more detail. The emotion engine uses specific algorithms and has its own methods for evaluating the user's emotions in real time. This evaluation result is passed to a generation mechanism within the server, which constructs advice tailored to the individual's emotional state.

[0671] The terminal acts as the interface between the server and the user. A microphone installed in the terminal constantly captures ambient sound and transmits this audio data to the server. Advice returned from the server is presented to the user through the terminal's display or speaker. This allows the user to receive emotion-based feedback in real time.

[0672] Users can utilize this system in their daily lives and receive support in various situations. For example, when a user is preparing a presentation for work, they can speak into the device to express their anxiety. The server receives this audio and, using its emotion engine, determines that the user is experiencing a high level of anxiety. Based on this, the server generates specific advice such as, "Take a deep breath and relax. You have successfully delivered excellent presentations in the past," and provides it to the user through the device. Through this process, users can gain a sense of security and achieve emotional stability.

[0673] Thus, the system of the present invention aims to enhance the user's resilience and improve their ability to cope with daily life stress by linking analysis means, generation means, presentation means, and an emotion engine.

[0674] The following describes the processing flow.

[0675] Step 1:

[0676] The device collects the user's voice using a microphone and performs noise cancellation processing to eliminate ambient noise. This results in the extraction of clear audio data.

[0677] Step 2:

[0678] The terminal prepares to send the processed voice data to the server by packetizing the data and sending it through a secure communication channel.

[0679] Step 3:

[0680] The server converts the audio data received from the terminal into text data using a speech recognition module. The text data is then prepared for further sentiment analysis.

[0681] Step 4:

[0682] The server utilizes an emotion engine to analyze the user's emotional state in detail from text data. During this process, it assigns positive, negative, or neutral emotion tags.

[0683] Step 5:

[0684] The server generates the most suitable advice for the user based on emotion tags obtained by the emotion engine. The algorithm creates customized advice while also considering the user's history.

[0685] Step 6:

[0686] The server encodes the generated advice as a data packet, encrypts it, and then sends it to the terminal.

[0687] Step 7:

[0688] The terminal decodes the advice sent from the server and presents it to the user in audio or text format. The user can understand the information by playing the audio through the speaker or displaying the text on the screen.

[0689] Step 8:

[0690] Users adjust their actions and feelings based on the advice provided. This user feedback is then sent to the server via the device, contributing to improvements in the system's accuracy.
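Steps 6 and 7 of the flow above involve encoding the advice on the server and decoding it on the terminal. The sketch below shows one possible encoding using JSON and Base64; the message fields are hypothetical. Encryption is deliberately not shown: Base64 is an encoding, not encryption, and the sketch assumes that confidentiality is provided separately, for example by a TLS-protected connection.

```python
import base64
import json


def encode_advice(advice: str, fmt: str = "text") -> bytes:
    """Server side: wrap the advice in a message and encode it for transport."""
    message = {"type": "advice", "format": fmt, "body": advice}
    raw = json.dumps(message, ensure_ascii=False).encode("utf-8")
    return base64.b64encode(raw)


def decode_advice(packet: bytes) -> dict:
    """Terminal side: decode the message and check that it carries advice."""
    message = json.loads(base64.b64decode(packet).decode("utf-8"))
    if message.get("type") != "advice":
        raise ValueError("unexpected message type")
    return message
```

The `format` field lets the terminal decide whether to read the body aloud or display it, matching the two presentation paths in Step 7.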

[0691] (Example 2)

[0692] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0693] In today's world, it is increasingly important to quickly and accurately understand users' emotional states and provide real-time feedback tailored to their individual needs. However, conventional systems often lack the accuracy to analyze users' emotions and the quality of the advice they provide. Therefore, there is a need for technology that can analyze emotions with higher precision and provide accurate, personalized advice.

[0694] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0695] In this invention, the server includes information processing means for analyzing the user's emotional state, data generation means for generating advice for the user based on the information processing means, information output means for presenting the advice via a communication device, data acquisition means for acquiring voice data and supplying it to the information processing means, and information analysis means for evaluating the user's emotions in real time. This makes it possible to analyze the user's emotions in detail and provide appropriate feedback in real time.

[0696] An "information processing means" is a component that has the function of analyzing the emotional state of a user.

[0697] A "data generation means" is a component that has the function of generating advice for the user based on the analysis results obtained from the information processing means.

[0698] An "information output means" is a component that has the function of presenting advice generated via a communication device to the user.

[0699] A "data acquisition means" is a component that has the function of acquiring audio data and supplying it to an information processing means.

[0700] An "information analysis means" is a component that has the function of analyzing the user's voice data in real time and evaluating the user's emotions.

[0701] This invention aims to realize a system that analyzes the emotional state of users in depth and provides appropriate feedback tailored to each user. The system is configured as follows:

[0702] The server functions as the system's central information processing unit, receiving voice data transmitted from terminals. Using a speech recognition module, the voice data is converted into text data. The converted text data is then analyzed by an emotion engine to provide a detailed assessment of the user's emotional state. A specific algorithm is applied to this assessment, enabling real-time analysis of the user's emotions.

[0703] The analysis results from the emotion engine are then passed to a generative AI model, which generates advice tailored to the user's emotions. This advice draws on historical data and general psychology, and is designed to provide specific and helpful guidance.

[0704] The terminal functions as an information interface between the server and the user. The microphone built into the terminal acquires voice data and transmits it to the server. It also provides information to the user by displaying advice sent from the server on the display or playing it back as audio through the speaker.

[0705] Users can utilize this system in their daily lives. For example, if a user feels anxious about speaking in front of a large group, they can say to the device, "I get very nervous when speaking in front of a large group. How can I relax?" The server analyzes this voice, generates advice to alleviate anxiety, and provides the user with specific suggestions such as, "Take a deep breath and relax. You have successfully given great presentations in the past." In this way, users can gain a sense of security and emotional stability.

[0706] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0707] Step 1:

[0708] The device continuously captures ambient sound using a microphone. When a user speaks, it converts the audio data into a digital format and sends it to the server. The input in this step is the user's voice, and the output is the digital audio data sent to the server. Specifically, as soon as audio input is detected, the data is sampled and transferred to the server as data packets.

[0709] Step 2:

[0710] The server receives audio data transmitted from the terminal. It uses a speech recognition module to convert this data into text. The input is the audio data obtained from the terminal, and the output is the converted text data. A speech recognition algorithm is applied to this data processing, analyzing the features of the audio waveform and converting them into text.

[0711] Step 3:

[0712] The server passes the converted text data to the emotion engine. The emotion engine analyzes the emotional state from the text data using a specific algorithm. The input is the text data obtained in step 2, and the output is the analysis result indicating the user's emotional state. Specifically, it utilizes an emotion dictionary and natural language processing techniques to evaluate emotions based on keywords in the text.

[0713] Step 4:

[0714] The server inputs the analysis results into a generative AI model to produce advice tailored to the user's emotional state. This step involves data calculations that take into account past data and general psychological knowledge. The input is the emotion analysis result, and the output is the generated advice message. Specifically, the AI model utilizes its trained knowledge to construct appropriate advice based on various scenarios.

[0715] Step 5:

[0716] The server sends the generated advice to the terminal. The terminal converts the received data into an appropriate format to return this advice to the user, displaying it on the screen or playing it through the speaker. Here, the input is the advice data from the server, and the output is the real-time advice presented to the user. Specifically, the received message is processed as text or audio and presented in a way that the user can immediately understand.
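Step 3 of the flow above mentions evaluating emotions from keywords using an emotion dictionary. A minimal sketch of that idea follows. The dictionary entries, emotion categories, and weights are invented for illustration; a practical emotion engine would use a far larger lexicon and additional natural language processing.

```python
import re

# Hypothetical emotion dictionary: keyword -> (emotion category, weight).
EMOTION_DICTIONARY = {
    "nervous": ("anxiety", 1.0),
    "anxious": ("anxiety", 1.0),
    "worried": ("anxiety", 0.8),
    "happy": ("joy", 1.0),
    "glad": ("joy", 0.8),
    "lonely": ("sadness", 1.0),
    "tired": ("fatigue", 0.8),
}


def evaluate_emotion(text: str) -> dict:
    """Sum the dictionary weights of every keyword found in the text."""
    scores = {}
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in EMOTION_DICTIONARY:
            emotion, weight = EMOTION_DICTIONARY[word]
            scores[emotion] = scores.get(emotion, 0.0) + weight
    return scores


def dominant_emotion(text: str) -> str:
    """Return the highest-scoring emotion, or "neutral" if no keyword matched."""
    scores = evaluate_emotion(text)
    return max(scores, key=scores.get) if scores else "neutral"
```

A pure keyword lookup like this ignores negation ("not nervous"), which is one reason the step pairs the dictionary with other natural language processing techniques.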

[0717] (Application Example 2)

[0718] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0719] Modern households require technological advancements to improve people's quality of life. In particular, there is a lack of systems that can adequately understand family members' emotional states and provide real-time support. As a result, people often do not receive sufficient emotional support in their daily lives, leading to a buildup of stress.

[0720] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0721] In this invention, the server includes an analysis means for analyzing the emotional state of the user, a generation means for generating advice for the user based on the analysis means, a presentation means for presenting the advice via various devices, a monitoring means for understanding the emotional state of people in the home environment, and a support means for providing life support based on the information obtained using the monitoring means. This enables people in the home to receive support appropriate to their emotional state, thereby improving their quality of life.

[0722] "Users" refers to individuals or households who use this system.

[0723] "Emotional state" refers to the psychological or emotional state that a user is experiencing at a particular point in time, and this is usually information that is analyzed from voice, facial expressions, etc.

[0724] "Analysis means" refers to a device that has the function of analyzing the emotional state of the user from the acquired voice and data.

[0725] A "generation means" is a device that has the function of generating optimal advice and support for the user based on emotional information obtained by an analysis means.

[0726] "Presentation means" refers to devices or methods for displaying or notifying users of generated advice in a form that is recognizable to them.

[0727] "Monitoring means" refers to devices that have the function of observing people's behavior and voices within the home and acquiring necessary data.

[0728] "Support means" refers to functions that provide support to improve quality of life in real time using analyzed emotional data and monitored information.

[0729] In implementing this invention, the system mainly consists of a server and a robot terminal located in the home. The server functions as a central processing unit and is responsible for processing the voice data transmitted from the terminal. The terminal is equipped with a microphone and speaker to acquire the user's voice and transmit it to the server.

[0730] The server receives the audio data and converts it into text data using the Google Cloud Speech-to-Text API. The text data is then subjected to sentiment analysis using IBM Watson's Natural Language Understanding to analyze the user's emotional state in detail. Based on the analyzed sentiment data, a generation tool, run by a Python program, uses AI to create appropriate feedback and advice. This generated advice is then delivered to the user through the device's speaker.

[0731] For example, when a user returns home tired at the end of a busy day, the robot might suggest, "Were you able to rest enough today? Shall I prepare a warm drink for you to relax?" In this way, users can receive real-time feedback tailored to their emotional state.

[0732] Examples of prompts for an AI generation model include: "Based on words obtained from the user's voice, please come up with specific health advice to help them relax."

[0733] This allows for timely emotional feedback within the family, improving the quality of daily life.

[0734] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0735] Step 1:

[0736] The user speaks into the device. The device continuously acquires ambient sound using its built-in microphone and transmits this audio data to the server in real time. The input is the user's spoken voice, which is converted into audio data.

[0737] Step 2:

[0738] The server converts the received audio data into text data using the Google Cloud Speech-to-Text API. This step extracts words and phrases from the audio and converts them into text, making them available for use in the next step. The output is the audio converted to text.

[0739] Step 3:

[0740] The server feeds the converted text data to IBM Watson's Natural Language Understanding to perform sentiment analysis. This process extracts emotional components from the context and words of the input text, determining emotional states such as positive, negative, and neutral. The output includes emotional scores and states.

[0741] Step 4:

[0742] The server generates feedback and advice using a Python program based on the sentiment analysis results. Guided by the prompt, the generative AI model selects an appropriate response from a list of candidates and composes the advice to provide to the user. The output is the text of the specific advice.

[0743] Step 5:

[0744] The server sends the generated advice back to the terminal. The terminal then conveys that advice to the user verbally through its speaker. The expected output is words or actions that represent natural feedback to the user.
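The flow of Steps 1 to 5 can be sketched in Python as follows. This is only an illustration under stated assumptions: the speech-to-text, sentiment analysis, generation, and speaker stages are passed in as plain callables rather than as calls to any real service, the sentiment score is assumed to lie in the range -1 to 1, and the names run_pipeline, classify_sentiment, and PipelineResult, as well as the neutral band of 0.1, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineResult:
    text: str       # Step 2 output: speech converted to text
    sentiment: str  # Step 3 output: "positive", "negative", or "neutral"
    advice: str     # Step 4 output: advice text sent back to the terminal


def classify_sentiment(score: float, neutral_band: float = 0.1) -> str:
    """Map a score assumed to lie in [-1, 1] to a label (band width is hypothetical)."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    score_sentiment: Callable[[str], float],
    generate_advice: Callable[[str, str], str],
    speak: Callable[[str], None],
) -> PipelineResult:
    text = transcribe(audio)                                # Step 2: audio to text
    sentiment = classify_sentiment(score_sentiment(text))   # Step 3: sentiment analysis
    advice = generate_advice(text, sentiment)               # Step 4: advice generation
    speak(advice)                                           # Step 5: output via speaker
    return PipelineResult(text, sentiment, advice)


# Stand-in callables so the sketch runs without any external service.
spoken: list[str] = []
result = run_pipeline(
    audio=b"\x00\x01",  # Step 1: audio data acquired by the microphone
    transcribe=lambda audio: "I am so tired today",
    score_sentiment=lambda text: -0.6 if "tired" in text else 0.0,
    generate_advice=lambda text, sentiment: (
        "Shall I prepare a warm drink for you to relax?"
        if sentiment == "negative"
        else "I hope you have a pleasant evening."
    ),
    speak=spoken.append,
)
```

Passing each stage as a callable keeps the sketch independent of any particular speech recognition or sentiment analysis service; in an actual system the stand-ins would be replaced by clients for the services named in Steps 2 and 3.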

[0745] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0746] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions are input to the data generation model 58, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0747] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0748] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping. Specifically, the emotion identification model 59 may determine the user's emotion according to an emotion map (see Figure 9), which serves as the specific mapping. Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0749] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions located there. Farther from the center, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and also induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0750] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0751] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the closer an emotion lies to the outside of the Emotion Map 400, the more visible it becomes (that is, the more it is expressed in actions).
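The layout of the emotion map described above can be illustrated with a short Python sketch. The coordinate convention is an assumption made for illustration only: the center of the map is taken as the origin, x increases toward the right (situation side), y increases toward the top (pleasure side), and the distance from the center stands for how strongly an emotion is expressed in actions. The function name describe_position and the returned keys are hypothetical.

```python
import math


def describe_position(x: float, y: float) -> dict:
    """Describe a point on the emotion map under the assumed coordinate convention."""
    if x > 0:
        origin = "situation"   # right half: induced by situational judgment
    elif x < 0:
        origin = "reaction"    # left half: generated from reactions in the brain
    else:
        origin = "both"        # directly above or below the center
    if y > 0:
        valence = "pleasure"      # upper side of the map
    elif y < 0:
        valence = "displeasure"   # lower side of the map
    else:
        valence = "neutral"
    return {
        "origin": origin,
        "valence": valence,
        # Larger distance from the center: more visible, expressed in actions.
        "distance_from_center": math.hypot(x, y),
    }


upper_right = describe_position(3.0, 4.0)
lower_left = describe_position(-1.0, -2.0)
```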

[0752] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
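The relationship between a balance and the resulting pleasure or discomfort can be sketched as follows. This is a hedged illustration only: the disclosure does not give a formula, so the comparison of absolute deviations, the function name balance_valence, and the example battery values are all assumptions.

```python
def balance_valence(previous: float, current: float, ideal: float) -> str:
    """Return the feeling caused by a change in one balance (e.g. battery level).

    Moving closer to the ideal value yields "pleasure", moving away yields
    "discomfort". Comparing absolute deviations is an assumed rule.
    """
    before = abs(previous - ideal)
    after = abs(current - ideal)
    if after < before:
        return "pleasure"
    if after > before:
        return "discomfort"
    return "neutral"


# Hypothetical battery levels on a 0.0 to 1.0 scale with an ideal of 1.0.
charging = balance_valence(previous=0.4, current=0.7, ideal=1.0)
draining = balance_valence(previous=0.9, current=0.5, ideal=1.0)
```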

[0753] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0754] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
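As a minimal sketch of the last stage of this process, the following Python fragment takes emotion values such as those a pre-trained neural network might output and selects one emotion. The neural network itself is not shown. Selecting the emotion with the highest value is an assumption, since the disclosure does not state the selection rule, and the function name dominant_emotion and the numeric values are hypothetical.

```python
def dominant_emotion(emotion_values: dict[str, float]) -> str:
    """Pick the emotion with the highest value (an assumed selection rule)."""
    if not emotion_values:
        raise ValueError("emotion_values must not be empty")
    return max(emotion_values, key=emotion_values.get)


# Hypothetical output: emotions located close together on the map
# ("reassured", "calm", "confident") receive similar values.
values = {
    "reassured": 0.82,
    "calm": 0.78,
    "confident": 0.75,
    "anxious": 0.10,
}
user_emotion = dominant_emotion(values)
```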

[0755] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0756] In the above embodiment, an example was given in which the specific processing is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and the specific processing may be distributed across multiple computers, including the computer 22. For example, the data generation model 58 may be provided in a device external to the data processing device 12, and the external device may generate data according to the input data.

[0757] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0758] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0759] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0760] The following types of processors can be used as hardware resources to perform specific processing. One example is a CPU, which is a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0761] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0762] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0763] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0764] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements may be added, or elements may be replaced in the descriptions and illustrations presented above, as long as this does not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0765] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted as being incorporated by reference.

[0766] The following is further disclosed regarding the embodiments described above.

[0767] (Claim 1)

[0768] An analysis means for analyzing the emotional state of users,

[0769] A generation means that generates advice to the user based on the analysis means,

[0770] A presentation means for presenting the aforementioned advice through various devices,

[0771] A system that includes these means.

[0772] (Claim 2)

[0773] The system according to claim 1, further comprising a voice acquisition means for acquiring user voice information and supplying it to the analysis means.

[0774] (Claim 3)

[0775] The system according to claim 1, further comprising a recording means for recording user history information and for creating advice in the generation means taking the history information into consideration.

[0776] "Example 1"

[0777] (Claim 1)

[0778] An arithmetic processing means for analyzing the emotional state of the user,

[0779] A data generation means that generates advice to the user based on the arithmetic processing means,

[0780] An information presentation means that presents the aforementioned advice via various communication devices,

[0781] An input processing means for collecting feedback from users,

[0782] An adjustment means that supplies the aforementioned feedback to the data generation means and improves its performance,

[0783] A system that includes these means.

[0784] (Claim 2)

[0785] The system according to claim 1, further comprising a voice input means for acquiring user voice information and supplying it to the arithmetic processing means.

[0786] (Claim 3)

[0787] The system according to claim 1, further comprising information recording means for recording user history information and creating advice in the data generation means taking the history information into consideration.

[0788] "Application Example 1"

[0789] (Claim 1)

[0790] An analysis means for analyzing the emotional state of users,

[0791] A generation means that generates advice to the user based on the analysis means,

[0792] A presentation means that presents the aforementioned advice through various devices and further recommends music and information in response to emotional changes,

[0793] A system that includes these means.

[0794] (Claim 2)

[0795] The system according to claim 1, comprising: a voice acquisition means for acquiring user voice data and supplying it to the analysis means; and a process adapted for the purpose of providing emotional support in a caregiving setting.

[0796] (Claim 3)

[0797] The system according to claim 1, further comprising: recording means for recording user history data and creating advice and information recommendations in the generation means taking the history data into consideration; and means for continuously collecting feedback to improve the system.

[0798] "Example 2 of combining an emotion engine"

[0799] (Claim 1)

[0800] An information processing means for analyzing the emotional state of users,

[0801] A data generation means that generates advice to the user based on the information processing means,

[0802] An information output means that presents the aforementioned advice via a communication device,

[0803] A data acquisition means that acquires audio data and supplies it to the information processing means,

[0804] An information analysis means for evaluating user emotions in real time,

[0805] A system that includes these means.

[0806] (Claim 2)

[0807] The system according to claim 1, further comprising a data recording means for recording past user data and for creating advice in the data generation means taking the recorded data into consideration.

[0808] (Claim 3)

[0809] The system according to claim 1, further comprising information conversion means for receiving voice data from a user and converting it into text data.

[0810] "Application Example 2 of combining an emotion engine"

[0811] (Claim 1)

[0812] An analysis means for analyzing the emotional state of users,

[0813] A generation means that generates advice to the user based on the analysis means,

[0814] A presentation means for presenting the aforementioned advice through various devices,

[0815] A monitoring means for understanding people's emotional states within the home environment,

[0816] A support means that provides life support based on information obtained using the aforementioned monitoring means,

[0817] A system that includes these means.

[0818] (Claim 2)

[0819] The system according to claim 1, further comprising a voice acquisition means for acquiring user voice information and supplying it to the analysis means.

[0820] (Claim 3)

[0821] The system according to claim 1, further comprising a recording means for recording user history information and for creating advice in the generation means taking the history information into consideration.

[Explanation of symbols]

[0822]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. A system comprising: an analysis means for analyzing the emotional state of users; a generation means that generates advice to the user based on the analysis means; and a presentation means for presenting the aforementioned advice through various devices.

2. The system according to claim 1, further comprising a voice acquisition means for acquiring user voice information and supplying it to the analysis means.

3. The system according to claim 1, further comprising a recording means for recording user history information and creating advice in the generation means taking the history information into consideration.