system

A system utilizing real-time voice and biometric data analysis provides personalized emotional support by accurately monitoring and responding to users' emotional states, addressing the challenge of stress management and mental health in modern society.

JP2026101262A · Pending · Publication Date: 2026-06-22 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-12-10
Publication Date
2026-06-22

AI Technical Summary

Technical Problem

Modern society faces challenges in effectively managing stress and maintaining mental health due to increased psychological issues and insufficient coping mechanisms, with existing systems failing to accurately utilize voice and biometric data for personalized emotional support.

Method used

A system that acquires user voice and biometric data in real-time to estimate emotional states, generates personalized advice, and improves accuracy through user feedback integration using machine learning algorithms.

Benefits of technology

Enables timely and personalized emotional support by accurately monitoring and responding to users' emotional states, reducing stress and enhancing mental well-being.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure 2026101262000001_ABST

Abstract

A system is provided. [Solution] The system includes: means for acquiring a user's voice and biometric data; means for analyzing the acquired data in real time and estimating an emotional state; means for generating appropriate advice for the user based on the emotional state; means for understanding the emotional state in a caregiving setting and proposing individualized care and relaxation methods; means for notifying the user of the generated advice on the user's information processing device; and means for collecting user feedback and using it to improve the information processing technology.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, the method including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a character of the chatbot, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] In modern society, psychological problems caused by increased stress and inadequate coping are becoming serious. In particular, self-affirmation declines after failures or criticism at work, and it often becomes difficult to maintain resilience. Moreover, daily busyness and a sense of isolation leave many individuals unable to shift their emotions or reduce stress within a short time, which makes it difficult to maintain mental health.

Means for Solving the Problems

[0005] To address this problem, we provide a system that acquires a user's voice and biometric data and estimates the user's emotional state in real time from that data. The system generates advice based on the estimated emotional state and sends it to the user's device as a notification, supporting an effective change of mood. Furthermore, user feedback is collected and incorporated into the system's learning algorithm to personalize the advice and improve its accuracy.

[0006] A "user" refers to an individual who operates or uses the system to receive emotional support.

[0007] "Audio data" refers to auditory information, including the words and tone of voice spoken by the user.

[0008] "Biometric data" refers to physiological information obtained directly from the user's body, such as heart rate and facial expressions.

[0009] "Real-time" refers to a temporal process in which information is collected and the results are immediately analyzed and responded to.

[0010] "Emotional state" refers to the psychological or emotional state that a user is experiencing at a particular time.

[0011] "Advice" refers to suggestions or instructions generated by the system to improve the user's emotional state.

[0012] "Notification" refers to the communication process for conveying generated advice to the user.

[0013] "Feedback" refers to the reactions or opinions that users give to advice they receive.

[0014] A "machine learning algorithm" refers to a computational model used to analyze data and continuously improve a system.

[0015] "Personalization" refers to the customization process for providing services and content optimized for individual users.

Brief Explanation of Drawings

[0016]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map to which multiple emotions are mapped.
[Figure 10] An emotion map to which multiple emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2, in which an emotion engine is combined.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2, in which an emotion engine is combined.

Embodiments for Carrying Out the Invention

[0017] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0018] First, the terms used in the following description will be explained.

[0019] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also consist of a single type of arithmetic unit or a combination of multiple types. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).

[0020] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory that temporarily stores information and is used as working memory by the processor.

[0021] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tape.

[0022] In the following embodiments, a communication interface (I/F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0023] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" may mean A alone, B alone, or a combination of A and B. The same interpretation applies in this specification when three or more items are linked by "and/or."

[0024] [First Embodiment]

[0025] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0026] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0027] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0028] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0029] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0030] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0031] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0032] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0033] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0034] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0035] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0036] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0037] This invention is a system that utilizes the user's voice and biometric data to monitor their emotional state and analyze it in real time, thereby providing the user with appropriate advice. An embodiment of this system is shown below.

[0038] Data collection and analysis

[0039] While the user is using the device, it acquires voice and biometric data through its microphone and sensors. The acquired data is immediately transmitted to a server using a secure protocol. The server converts the received voice data into text using natural language processing algorithms and analyzes what the user is saying and the tone of their voice. At the same time, it analyzes heart rate and changes in facial expressions from the biometric data to estimate the user's emotional state.
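As an illustration of the acquisition step described above, the following minimal sketch shows one way a terminal might package a voice clip and biometric readings into a single record before transmission. The field names, the base64 encoding of the audio, and the `SensorSample` type are assumptions made for this example; the specification does not define a data format.

```python
import base64
import json
import time
from dataclasses import asdict, dataclass


@dataclass
class SensorSample:
    """One capture from the terminal (hypothetical record layout)."""
    user_id: str
    captured_at: float       # Unix timestamp in seconds
    voice_b64: str           # raw audio bytes, base64-encoded for JSON transport
    heart_rate_bpm: float
    facial_expression: str   # label from an on-device detector (assumed)


def build_sample(user_id, voice_bytes, heart_rate_bpm, facial_expression, now=None):
    """Bundle microphone and sensor readings into one record."""
    return SensorSample(
        user_id=user_id,
        captured_at=time.time() if now is None else now,
        voice_b64=base64.b64encode(voice_bytes).decode("ascii"),
        heart_rate_bpm=float(heart_rate_bpm),
        facial_expression=facial_expression,
    )


def to_json(sample):
    """Serialize the record so it can be sent to the server."""
    return json.dumps(asdict(sample), sort_keys=True)
```

Keeping voice and biometric readings in one timestamped record lets the server analyze both for the same moment, which is what the simultaneous analysis in this paragraph relies on.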

[0040] Estimation of emotional state and generation of advice

[0041] The server identifies the user's emotional state based on the analysis results. For example, if the analysis determines that the user is likely experiencing stress, the server uses this information and generative AI technology to generate specific advice for relaxation. This advice is customized to the individual user's preferences and past feedback.

[0042] Advice notification and feedback collection

[0043] The generated advice is communicated to the user via their device. The notification method is tailored to the user's device type and is designed to non-intrusively support their daily activities, such as through pop-up screens or voice guidance. The user's response is sent back to the server as feedback from the device and used to improve future advice generation. This enhances the overall accuracy and performance of the system.

[0044] These features allow users to receive support tailored to their individual circumstances, enabling them to effectively reduce stress and manage their emotions.

[0045] The following describes the processing flow.

[0046] Step 1:

[0047] The device collects voice data using a microphone while the user is using the device, and acquires biometric data such as heart rate and facial expressions using sensors. This data is temporarily stored within the device.

[0048] Step 2:

[0049] The device transmits acquired voice and biometric data to the server in real time. Secure communication protocols such as HTTPS are used for data transfer to protect user privacy.

[0050] Step 3:

[0051] The server converts the received audio data into text using natural language processing (NLP) algorithms and analyzes the user's vocabulary and tone of voice. This generates foundational data for evaluating the user's psychological state.

[0052] Step 4:

[0053] The server simultaneously analyzes biometric data, estimating stress levels and emotional changes from the user's heart rate and facial expressions. The analysis results are integrated with other emotional indicators to identify the user's overall emotional state.

[0054] Step 5:

[0055] The server generates advice tailored to the user's state based on an analysis of their emotional state. Using generative AI technology, the advice is personalized, taking into account the user's past feedback and preferences.

[0056] Step 6:

[0057] The server sends the generated advice to the terminal.

[0058] Step 7:

[0059] The device notifies the user of advice received from the server. These notifications appear as pop-up messages or voice guidance on the device, taking care not to interrupt the user's daily activities.

[0060] Step 8:

[0061] The device records the user's response to advice and subsequent actions as feedback. This information is then sent back to the server and used to generate future advice.

[0062] This processing flow allows the system to monitor the user's emotional state and provide appropriate support.
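The eight steps above can be sketched end to end as follows. This is a toy outline under stated assumptions, not the disclosed implementation: speech recognition is replaced by passing a transcript string directly, the keyword list stands in for natural language processing, the fixed templates stand in for generative AI, and the 20% heart-rate margin over a resting value is an arbitrary illustrative threshold. All names are hypothetical.

```python
from dataclasses import dataclass, field

# Placeholder lexicon standing in for real NLP (assumption).
NEGATIVE_WORDS = {"tired", "stressed", "anxious", "worried"}

# Fixed templates standing in for generative AI output (assumption).
ADVICE_TEMPLATES = {
    "stressed": "Try a slow breathing exercise for one minute.",
    "calm": "You seem steady. Keep up your current pace.",
}


@dataclass
class Server:
    feedback_log: list = field(default_factory=list)

    def speech_to_text(self, voice):
        # Stand-in: the "voice" argument is already a transcript string.
        return voice

    def text_suggests_stress(self, text):
        words = (w.strip(".,!?").lower() for w in text.split())
        return any(w in NEGATIVE_WORDS for w in words)

    def biometrics_suggest_stress(self, heart_rate_bpm, resting_bpm=65.0):
        # Illustrative rule: more than 20% above the resting value.
        return heart_rate_bpm > 1.2 * resting_bpm

    def handle(self, voice, heart_rate_bpm):
        text = self.speech_to_text(voice)                          # Step 3
        text_flag = self.text_suggests_stress(text)                # Step 3
        bio_flag = self.biometrics_suggest_stress(heart_rate_bpm)  # Step 4
        state = "stressed" if (text_flag or bio_flag) else "calm"  # Step 4
        return state, ADVICE_TEMPLATES[state]                      # Steps 5-6

    def record_feedback(self, advice, helpful):                    # Step 8
        self.feedback_log.append((advice, helpful))


def run_once(server, voice, heart_rate_bpm, helpful):
    """Terminal side: upload (Steps 1-2), notify (Step 7), feedback (Step 8)."""
    state, advice = server.handle(voice, heart_rate_bpm)  # direct call simulates upload
    shown = f"[popup] {advice}"
    server.record_feedback(advice, helpful)
    return state, shown
```

In a real deployment each stand-in would be replaced by the corresponding component named in the text; the sketch only shows how the steps hand data to one another.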

[0063] (Example 1)

[0064] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0065] In modern society, understanding the stress and other emotional states experienced by individual users in real time and providing timely, appropriate advice is a major challenge. However, conventional technologies have not been able to fully utilize voice and biometric data, making it difficult to provide personalized support to users. Furthermore, existing systems have low accuracy in information processing, resulting in insufficient advice that accurately reflects the user's state.

[0066] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0067] In this invention, the server includes means for acquiring the user's voice and vital data, means for analyzing the acquired data in real time and estimating the emotional state, and means for generating appropriate advice for the user based on the emotional state. This makes it possible to accurately grasp the user's emotional state and quickly provide personalized advice according to that state.

[0068] "Audio data" refers to a digital recording of a user's voice.

[0069] "Vital data" refers to biometric information such as the user's heart rate and skin electrical activity.

[0070] "Real-time" refers to a process where the time between data acquisition and analysis is extremely short, instantly reflecting the user's current state.

[0071] "Emotional state" refers to information that indicates the psychological state or mood that the user is experiencing.

[0072] "Advice" refers to specific suggestions and instructions provided to users to reduce stress and achieve emotional stability.

[0073] A "speech recognition algorithm" is a technology that analyzes speech data and converts it into text.

[0074] "Natural language processing" refers to technologies that enable machines to understand, analyze, and generate human speech and text.

[0075] "Transcription" refers to the process of converting audio information into written text.

[0076] "Feedback" refers to the reactions or opinions that users give to advice they receive.

[0077] "Machine learning techniques" are algorithms that allow computers to learn from large amounts of data and derive more accurate results.

[0078] This invention is a system that provides personalized advice to users by monitoring their emotional state using their voice and biometric data and analyzing it in real time.

[0079] Specifically, the device is equipped with a microphone to capture voice input and sensors to detect heart rate, skin electrical activity, and other parameters. The device continues to collect this data while the user goes about normal daily activities. The voice data is not processed on the device; it is sent immediately to the server using a secure protocol (e.g., TLS/SSL).
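A minimal sketch of the secure upload, assuming the data is posted as JSON over HTTPS using Python's standard library. The endpoint URL and payload keys are placeholders invented for this example, and the function only builds the request without sending it.

```python
import json
import urllib.request
from urllib.parse import urlparse


def build_upload_request(url, payload):
    """Build (but do not send) an HTTPS POST carrying one JSON payload."""
    # Refuse plain HTTP so sensor data is never sent without TLS.
    if urlparse(url).scheme != "https":
        raise ValueError("sensor data must be sent over an https:// URL")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint; replace with the real server address.
request = build_upload_request(
    "https://example.com/v1/samples",
    {"heart_rate_bpm": 72, "skin_conductance": 0.4},
)
```

Sending the request with `urllib.request.urlopen(request, timeout=5)` would verify the server certificate by default, so the encryption mentioned in the text comes from the TLS layer rather than from application code.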

[0080] The server first converts the audio data into text using a speech recognition algorithm (e.g., a commercial speech recognition API). This process includes noise reduction and volume adjustment, and the converted text is then analyzed using natural language processing (NLP) techniques. This makes it possible to understand the content of the user's speech and the nuances of their emotions.

[0081] Simultaneously, the server analyzes biometric data such as heart rate and facial expression changes to estimate the user's current emotional state. For example, an increase in heart rate or changes in facial expression can indicate that the user is experiencing stress.

[0082] Based on the analyzed data, the server understands the user's emotional state and uses a generative AI model (e.g., a general natural language processing model) to generate advice appropriate to the user's situation. The model generates this advice from a prompt text composed by the server. An example of a prompt text would be: "The user's voice tone is calm, and their heart rate is within the normal range. The user is in a meeting. Please suggest techniques to help them stay relaxed."
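The prompt text in this paragraph could be assembled from the analysis results roughly as follows. This is a sketch: the function names are hypothetical, and the 60 to 100 bpm "normal range" is an assumed default that the specification does not state.

```python
def describe_heart_rate(bpm, normal_range=(60, 100)):
    """Turn a numeric heart rate into the phrase used in the prompt.

    The default range is an assumption for illustration only.
    """
    low, high = normal_range
    if bpm < low:
        return "below the normal range"
    if bpm > high:
        return "above the normal range"
    return "within the normal range"


def build_prompt(voice_tone, heart_rate_bpm, context, goal):
    """Compose the prompt text passed to the generative AI model."""
    return (
        f"The user's voice tone is {voice_tone}, and their heart rate is "
        f"{describe_heart_rate(heart_rate_bpm)}. The user is {context}. "
        f"Please suggest techniques to help them {goal}."
    )


prompt = build_prompt("calm", 72, "in a meeting", "stay relaxed")
```

With these arguments the function reproduces the example prompt quoted above word for word; other tones, readings, and situations fill the same template.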

[0083] The generated advice is notified to the user via their device. On smartphones, the advice is displayed as a pop-up notification. On smart speakers, it is provided via voice. Users can follow this advice to manage stress and maintain focus.
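The device-dependent delivery described here might be expressed as a simple lookup. The device-type strings, the `Notification` type, and the pop-up fallback for unknown devices are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Notification:
    channel: str  # "popup" or "voice"
    body: str


# Mapping taken from the text: smartphones show pop-ups, smart speakers speak.
CHANNEL_BY_DEVICE = {
    "smartphone": "popup",
    "smart_speaker": "voice",
}


def make_notification(device_type, advice):
    """Choose the delivery channel from the device type."""
    # Falling back to a pop-up for unlisted devices is an assumption.
    channel = CHANNEL_BY_DEVICE.get(device_type, "popup")
    return Notification(channel=channel, body=advice)
```

Supporting an additional device, such as the smart glasses of the second embodiment, would only require adding an entry to the table.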

[0084] User feedback is sent back to the server via the device and used to improve the entire system based on the accumulated data. Machine learning techniques that take feedback into account enable continuous improvement in the quality of advice.
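One very simple way to let feedback influence later advice is to track an average rating per advice category and prefer the best-rated one. The sketch below uses a running average as a stand-in for the machine learning techniques mentioned above; the 1 to 5 rating scale, the neutral default of 3.0, and the category names are all assumptions.

```python
from collections import defaultdict


class FeedbackStats:
    """Running average of user ratings per advice category (illustrative)."""

    def __init__(self):
        self._total = defaultdict(float)
        self._count = defaultdict(int)

    def record(self, category, rating):
        """Store one rating on an assumed 1-5 scale."""
        if not 1 <= rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        self._total[category] += rating
        self._count[category] += 1

    def mean(self, category, default=3.0):
        """Average rating, or a neutral default for unrated categories."""
        count = self._count.get(category, 0)
        if count == 0:
            return default
        return self._total[category] / count

    def best(self, categories):
        """Return the category with the highest average rating."""
        return max(categories, key=self.mean)


stats = FeedbackStats()
stats.record("breathing", 5)
stats.record("breathing", 4)
stats.record("walk", 2)
preferred = stats.best(["breathing", "walk", "music"])
```

A production system would likely use a richer model, but the same loop applies: each response from the terminal updates stored statistics, and the next round of advice generation consults them.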

[0085] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0086] Step 1:

[0087] The device acquires the user's voice and vital data. In this process, the device's microphone is used to record the user's conversation and vocalizations in real time, while sensors are used to simultaneously record vital data such as heart rate and skin electrical activity. At this point, the inputs are voice and vital data, and the output is a digital dataset of these.

[0088] Step 2:

[0089] The terminal transmits collected voice and vital data to the server using a secure protocol. The data is encrypted before transmission and processed to prevent information leakage. The input is the voice and vital data acquired by the terminal, and the output is the dataset received by the server.

[0090] Step 3:

[0091] The server converts the received audio data into text using a speech recognition algorithm. The input is audio data, and the output is text data. This process includes audio preprocessing to remove audio noise and ensure accurate text conversion.

[0092] Step 4:

[0093] The server analyzes the transcribed speech using natural language processing techniques to understand the user's language patterns and emotional nuances. The input consists of text and intonation information generated by speech recognition, while the output is an analysis of the user's emotional state. The analysis process also includes an evaluation of emotional tone using a language model.

[0094] Step 5:

[0095] The server analyzes the vital data to supplement the text-based estimate of the user's emotional state. Inputs are vital data such as heart rate and skin electrical activity, and outputs are emotional assessments such as psychological stress and relaxation levels. The analysis uses pattern recognition algorithms to detect outliers and abnormal patterns.

[0096] Step 6:

[0097] The server uses a generative AI model to generate advice based on the analysis results. The input is the analyzed emotional state, and the prompt includes the instruction, "The user's heart rate is elevated, suggesting they may be stressed. Please generate advice to help them feel more at ease." The output is specific advice provided to the user.

[0098] Step 7:

[0099] The terminal notifies the user of the generated advice. The input is the advice generated by the server, and the output is the notification information delivered to the user (e.g., voice guidance or pop-up message). Specifically, an appropriate display method is used depending on the type of device.

[0100] Step 8:

[0101] Users act based on the advice provided and provide feedback as needed. This feedback is sent from the terminal to the server and used for future system improvements. The input is user feedback information, and the output is improvement data used to generate future advice.
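Step 5 mentions detecting outliers in the vital data without naming a method. As one possible illustration, the sketch below flags heart-rate readings whose z-score exceeds a threshold. The z-score rule and the threshold of 2.0 are assumptions chosen for this example, not the algorithm disclosed in the specification.

```python
from statistics import mean, pstdev


def find_outliers(samples, threshold=2.0):
    """Return indices of readings more than `threshold` standard deviations
    away from the mean of the window."""
    if len(samples) < 2:
        return []
    mu = mean(samples)
    sigma = pstdev(samples)
    if sigma == 0:
        return []  # all readings identical, nothing stands out
    return [i for i, x in enumerate(samples) if abs(x - mu) / sigma > threshold]


# Nine resting readings followed by one spike (made-up values).
heart_rates = [70, 72, 71, 69, 70, 71, 72, 70, 69, 120]
spikes = find_outliers(heart_rates)
```

Here only the final reading is flagged. A flagged reading would then feed the stress assessment produced in Step 5 rather than being treated as a conclusion on its own.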

[0102] (Application Example 1)

[0103] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0104] In care settings for the elderly and others requiring care, appropriate care and support tailored to individual needs are required. Currently, however, the burden on care staff is significant, making it difficult to respond immediately to the emotional state of every user. This application example aims to solve these problems, improve the quality of life of users, and reduce the burden on care staff.

[0105] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0106] In this invention, the server includes means for acquiring the user's voice and biometric data, means for analyzing the acquired data in real time and estimating the emotional state, means for generating appropriate advice for the user based on the emotional state, means for understanding the emotional state in a care setting and proposing individualized care and relaxation methods, means for notifying the user of the generated advice on the user's information processing device, and means for collecting user feedback and using it to improve information processing technology. This makes it possible to accurately monitor the emotional state of users in care settings and provide individually appropriate care.

[0107] "Audio data" refers to sound information collected using a receiving device, which is the content of what the user is saying.

[0108] "Biometric data" refers to information that indicates the user's physical condition, including physiological indicators such as heart rate and facial expressions.

[0109] "Emotional state" refers to a state that indicates the user's emotional condition, and includes psychological tendencies such as stress and relaxation.

[0110] "Advice" refers to specific suggestions or instructions provided based on the user's emotional state, with the aim of solving problems or improving their condition.

[0111] An "information processing device" is a device that processes digital data and provides information and instructions to users, and includes electronic devices such as smartphones and tablets.

[0112] A "server" is a central device that processes information and data on a network, and is a computer that is responsible for analyzing and processing data sent by users.

[0113] "Feedback" refers to a user's response to advice and suggestions, and is information used to improve the system.

[0114] "Information processing technology" refers to technologies used to efficiently analyze and process large amounts of data, and includes natural language processing and machine learning.

[0115] The system that realizes this invention consists of an information processing device and a server. The information processing device used by the user is equipped with sensors for acquiring voice and biometric data. This device immediately transmits the voice and biometric data collected using the sensors to the server.

[0116] The server converts the collected audio data into text using natural language processing algorithms. Specifically, it uses Amazon Transcribe (AWS®) to convert audio into text information, and then analyzes the content and tone of the audio using Amazon Comprehend. Simultaneously, it estimates the emotional state from biometric data via Amazon SageMaker. This allows for real-time analysis of the user's emotional state.

[0117] Based on the analysis results, the server uses AI technology to generate advice tailored to the user's emotional state. This generated advice is then customized to individual needs by Amazon Personalize and communicated to the information processing device via Amazon SNS. This notification is designed to be easily understood and delivered in a non-invasive manner.
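The text-analysis and notification part of this flow could be sketched as below, assuming client objects of the kind returned by `boto3.client("comprehend")` and `boto3.client("sns")` are passed in. The `detect_sentiment` and `publish` calls are boto3 operations; treating a NEGATIVE sentiment as a sign of anxiety, the fixed advice string, and the function name are assumptions for illustration. The Transcribe, SageMaker, and Personalize stages are omitted.

```python
def analyze_and_notify(text, comprehend, sns, topic_arn, language_code="en"):
    """Analyze transcribed speech and publish advice if it sounds negative.

    `comprehend` and `sns` are injected so that the function can be run
    against real boto3 clients or simple test doubles.
    """
    result = comprehend.detect_sentiment(Text=text, LanguageCode=language_code)
    sentiment = result["Sentiment"]  # POSITIVE, NEGATIVE, NEUTRAL, or MIXED

    # Assumption: NEGATIVE sentiment is used as a rough proxy for anxiety.
    if sentiment != "NEGATIVE":
        return None

    advice = "Shall we try taking a deep breath?"
    sns.publish(TopicArn=topic_arn, Message=advice)
    return advice
```

Passing the clients as parameters keeps credentials and region configuration out of the function and makes the logic testable without network access.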

[0118] As a concrete example, in a caregiving setting, if signs of anxiety are detected in the user, a voice message such as "Shall we try taking a deep breath?" is provided through the information processing device. Furthermore, an example of a prompt sentence to be input to the generative AI model would be, "Please suggest actions to take when the user's emotional state is identified as anxiety."

[0119] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0120] Step 1:

[0121] The device uses a microphone and biosensors to collect the user's voice and biometric data. This collected data serves as input. The voice data is digitized, and biometric data such as heart rate and facial expression parameters are obtained. This data is transmitted to the server in real time.

[0122] Step 2:

[0123] The server converts the received audio data into text using a natural language processing algorithm. Specifically, the server uses Amazon Transcribe from AWS to convert the audio data into text information. As a result, the content of the audio is output as string data.

[0124] Step 3:

[0125] The server analyzes the content and tone of the transcribed audio data. Amazon Comprehend is used to extract the user's emotions and intentions from the text data. This analysis generates patterns that indicate emotional tendencies.

[0126] Step 4:

[0127] Simultaneously, the server estimates the emotional state based on biometric data. Using Amazon SageMaker, it analyzes heart rate and facial expression data to evaluate the user's stress and relaxation levels. This process outputs an index indicating the emotional state.

[0128] Step 5:

[0129] The server utilizes an AI model based on the analysis results to generate optimal advice for the user. It suggests relaxation methods and care tailored to the user's emotional state, and the generated advice is output.

[0130] Step 6:

[0131] The generated advice is customized to the individual user's needs using Amazon Personalize. Advice is adaptively optimized by considering user history and feedback.

[0132] Step 7:

[0133] Advice is sent to the device via Amazon SNS, allowing the user to review the recommended actions. The final output is provided as text or audio instructions on the device.

[0134] Step 8:

[0135] The user responds to the advice they receive, and this feedback is sent back from the device to the server. The server uses this feedback to improve the accuracy of future advice generation.

[0136] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0137] This invention provides a system for precisely identifying a user's emotional state and providing appropriate support. By incorporating an emotion engine, it accurately recognizes emotions from the user's voice and biometric data, enabling a personalized approach.

[0138] Emotion data collection and the role of the emotion engine

[0139] The device utilizes microphones and various sensors to collect user voice and biometric data with high accuracy. The collected data is transmitted to a server in real time. The emotion engine on the server analyzes this data and uses a pre-trained emotion model to identify the user's emotional state. This model is trained to understand various emotion categories (e.g., joy, sadness, anger, surprise, etc.).
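How the emotion engine combines voice and biometric evidence is not specified. One common pattern, shown here purely as an assumption, is late fusion: each modality yields a score per emotion category, the scores are averaged with a weight, and the highest-scoring category is selected. The per-modality scores below are made-up values standing in for the output of a pre-trained model.

```python
# Categories listed in the text.
EMOTIONS = ("joy", "sadness", "anger", "surprise")


def fuse_scores(voice_scores, bio_scores, voice_weight=0.5):
    """Weighted average of per-category scores from the two modalities."""
    return {
        emotion: voice_weight * voice_scores.get(emotion, 0.0)
        + (1.0 - voice_weight) * bio_scores.get(emotion, 0.0)
        for emotion in EMOTIONS
    }


def identify_emotion(voice_scores, bio_scores, voice_weight=0.5):
    """Pick the category with the highest fused score."""
    fused = fuse_scores(voice_scores, bio_scores, voice_weight)
    return max(EMOTIONS, key=lambda emotion: fused[emotion])


# Illustrative scores only; a real engine would compute these from data.
voice = {"joy": 0.1, "sadness": 0.6, "anger": 0.2, "surprise": 0.1}
bio = {"joy": 0.2, "sadness": 0.3, "anger": 0.4, "surprise": 0.1}
label = identify_emotion(voice, bio)
```

With equal weights the voice evidence dominates and the result is "sadness"; lowering `voice_weight` shifts the decision toward the biometric scores.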

[0140] Enhanced emotion recognition and advice generation

[0141] The server uses generative AI technology to create personalized advice for the user based on the emotional state obtained by the emotion engine. For example, if the user is determined to be stressed, a specific action plan for relaxation will be presented. This plan is customized by taking into account feedback on the user's past activities and responses.

[0142] Feedback loops and system learning

[0143] The device continuously records the user's responses to advice. This feedback data is sent to a server and used in machine learning algorithms to help update the emotion engine model. Through this feedback loop, the system improves the accuracy of its emotion recognition, enabling the delivery of more personalized advice.

[0144] These configurations create an efficient and intelligent support system designed to assist users' psychological well-being and reduce daily stress.

[0145] The following describes the processing flow.

[0146] Step 1:

[0147] The device uses a microphone to acquire voice data while the user operates it, and sensors to collect biometric data such as heart rate and skin temperature. The data is temporarily stored on the device as it is acquired.

[0148] Step 2:

[0149] The device transfers the collected voice and biometric data to the server via a secure communication protocol (e.g., HTTPS). This process is encrypted to ensure user privacy.

[0150] Step 3:

[0151] The server converts the received audio data into text using natural language processing (NLP) technology and meticulously analyzes the content of the speech and the tone of voice. This analysis extracts patterns that reflect the user's emotions.

[0152] Step 4:

[0153] The server utilizes an emotion engine to identify the user's emotional state based on analyzed voice and biometric data. The engine uses a pre-trained emotion model to determine which category the emotion belongs to (e.g., joy, sadness, anger).

[0154] Step 5:

[0155] The server uses generative AI to generate appropriate advice for the user based on the analysis results from the emotion engine. This advice is personalized by taking into account past user data and feedback information.

[0156] Step 6:

[0157] The server then sends the generated advice to the terminal.

[0158] Step 7:

[0159] The device notifies the user of any advice it receives. These notifications appear as on-screen pop-up messages or voice messages, carefully designed to avoid interrupting the user's current activities.

[0160] Step 8:

[0161] The device records how the user responds to the advice. This response data is sent to the server as feedback.

[0162] Step 9:

[0163] The server analyzes the collected feedback and applies it to machine learning algorithms to update the emotion engine model. This continuous learning process improves the system's emotion recognition accuracy and the personalization of its advice.
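The feedback-driven update in Step 9 can be pictured with a deliberately small model. The sketch below is not the emotion engine of this disclosure; it swaps in a nearest-centroid classifier whose per-label mean vectors are updated one sample at a time. The two-element feature vector (a voice arousal score and a heart rate) and the label names are hypothetical.

```python
import math
from collections import defaultdict


class CentroidEmotionModel:
    """Toy stand-in for the emotion model: one running mean vector per label."""

    def __init__(self):
        self.sums = {}                  # label -> per-dimension feature sums
        self.counts = defaultdict(int)  # label -> number of samples seen

    def update(self, features, label):
        """Fold one feedback-confirmed (features, label) pair into the model."""
        if label not in self.sums:
            self.sums[label] = [0.0] * len(features)
        self.sums[label] = [s + f for s, f in zip(self.sums[label], features)]
        self.counts[label] += 1

    def predict(self, features):
        """Return the label whose centroid is closest in Euclidean distance."""
        if not self.sums:
            raise ValueError("model has not been trained yet")

        def distance(label):
            centroid = [s / self.counts[label] for s in self.sums[label]]
            return math.dist(centroid, features)

        return min(self.sums, key=distance)
```

Each call to `update` corresponds to one piece of feedback in which the user's response confirms or corrects the estimated emotion, so the model changes incrementally rather than being retrained from scratch.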

[0164] (Example 2)

[0165] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0166] While there is a need to accurately recognize users' emotional states and provide appropriate advice, conventional systems have insufficient accuracy in recognizing emotional states, resulting in low-quality personalized support. Furthermore, they lack effective means of utilizing user feedback to improve system accuracy. Therefore, there is a need for an effective system that supports users' psychological well-being and reduces daily stress.

[0167] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0168] In this invention, the server includes means for acquiring the user's voice and biometric data, means for analyzing the acquired data in real time and estimating the emotional state, and means for generating appropriate advice for the user based on the emotional state. This enables high-precision recognition of the user's emotional state and the provision of personalized advice tailored to individual circumstances.

[0169] A "user" is an individual who uses the system to gain recognition of their own emotional state and receive advice.

[0170] "Voice and biometric data" refers to voice recordings necessary to recognize the user's emotional state, as well as data showing physiological indicators such as heart rate, body temperature, and skin conductivity.

[0171] "Real-time analysis" refers to a process where data is processed immediately after acquisition, resulting in instant results.

[0172] "Emotional state" refers to the psychological and emotional state a user experiences at a given moment, and includes categories such as joy, sadness, anger, and surprise.

[0173] "Appropriate advice" refers to suggestions for specific actions to improve or support the user, tailored to their current emotional state.

[0174] "Feedback" refers to data that records users' reactions and results to advice provided by the system, and is used to improve the system's accuracy.

[0175] A "machine learning algorithm" is a computational method that automatically learns patterns and rules from data to improve the performance of a system.

[0176] A "generative AI model" is an artificial intelligence technology that generates information and actions for a specific purpose from a large amount of data.

[0177] A "prompt" is a text input given to a generative AI model to generate a specific output.

[0178] This system is designed to accurately identify the user's emotions and provide personalized advice. An embodiment of this system is shown below.

[0179] The device collects the user's voice and biometric data using a microphone and various sensors. Specifically, it records voice and monitors heart rate, body temperature, and skin electrical activity. This data is collected in real time and immediately transmitted to the server.

[0180] The server runs analysis software called an emotion engine. This emotion engine analyzes data sent by the user using an emotion model that has been previously trained using machine learning algorithms. This allows for the analysis of the linguistic and phonological features of the data, making it possible to estimate the user's emotional state with high accuracy. Emotional states are expressed in categories such as joy, sadness, anger, and surprise.

[0181] Based on these analysis results, the server uses a generative AI model to generate personalized advice for the user. The generative AI model generates advice based on prompts to suggest the most suitable action plan for the user's emotional state. For example, by inputting the prompt "Suggest ways to relax when the user is feeling stressed," the server can generate specific advice such as "Take three deep breaths" or "Take a short walk."
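One way to picture how an estimated emotional state becomes a prompt is sketched below. Only the "stressed" instruction is taken from the paragraph above; the other templates, the default instruction, and the layout of the prompt text are hypothetical. The sketch stops at building the prompt string and does not call any generative AI model.

```python
# Instruction templates keyed by emotion category.
# Only the "stressed" entry comes from the description; the rest are assumptions.
PROMPT_TEMPLATES = {
    "stressed": "Suggest ways to relax when the user is feeling stressed.",
    "sadness": "Suggest gentle activities that may lift the user's mood.",
    "anger": "Suggest ways for the user to calm down.",
}
DEFAULT_PROMPT = "Suggest a short activity that supports the user's well-being."


def build_prompt(emotion, past_feedback=None):
    """Compose a prompt from the estimated emotion and optional past feedback."""
    instruction = PROMPT_TEMPLATES.get(emotion, DEFAULT_PROMPT)
    lines = [f"Estimated emotional state: {emotion}.", instruction]
    if past_feedback:
        # Past feedback is appended so the model can personalize its answer
        lines.append("Past feedback from this user:")
        lines.extend(f"- {item}" for item in past_feedback)
    return "\n".join(lines)
```

Keeping the templates in a plain dictionary makes it straightforward to add categories as the emotion model learns to distinguish more states.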

[0182] The device notifies the user of advice transmitted from the server through a voice assistant or screen display. By following the advice provided, the user can reduce daily stress.

[0183] Furthermore, the device records the user's response to advice and sends that feedback to the server. Based on this feedback, the server continuously updates the emotion engine model. As a result, the system evolves with each use, providing more accurate and user-friendly support.

[0184] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0185] Step 1:

[0186] The user provides data through the device's voice input device and biosensors. The device records voice data and collects biometric data (heart rate, body temperature, skin conductivity, etc.). This input data is transmitted to the server in real time.

[0187] Step 2:

[0188] The server receives voice and biometric data transmitted from the terminal. This data is passed to the emotion engine as input, where its linguistic and phonological features are analyzed. This analysis generates an output that estimates the user's emotional state (e.g., joy, sadness).

[0189] Step 3:

[0190] The server inputs the estimated emotional state into a generative AI model and uses prompts to generate user-specific advice. Specifically, if the emotional state is "stressed," it outputs advice such as "Take three deep breaths" based on the prompt "Please suggest ways to relax."

[0191] Step 4:

[0192] The device receives advice sent from the server. The device notifies the user of this advice using a voice assistant or display. The user takes action according to the provided advice.

[0193] Step 5:

[0194] The user acts based on the advice and then provides feedback to the device. The device collects this feedback and sends it back to the server.

[0195] Step 6:

[0196] The server uses feedback data sent from the terminal to update the emotion engine model through machine learning. This update enables the model to produce more accurate emotion recognition and personalized advice in subsequent uses.

[0197] (Application Example 2)

[0198] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal."

[0199] Maintaining the mental health of the elderly is a major challenge in modern society. In particular, feelings of isolation and increased stress are factors that reduce the quality of life for the elderly. The present invention aims to provide an efficient support system to alleviate the anxiety and stress that the elderly experience in their daily lives and to provide them with psychological peace of mind.

[0200] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0201] In this invention, the server includes a device for acquiring the user's voice and biometric data, a device for analyzing the acquired data in real time and estimating the emotional state, and a device for generating personalized support based on the emotional state. This makes it possible to identify the user's emotional state in detail and propose appropriate relaxation or social interaction to support the psychological health of the elderly.

[0202] "Users" refer to anyone who uses the system directly or indirectly. This usually includes elderly people.

[0203] "Voice and biometric data" refers to voice information obtained from the user, as well as physiological indicators such as heart rate and skin electrical activity.

[0204] "Device" refers to hardware and software components for acquiring, analyzing, and notifying voice and biometric data.

[0205] "Analyzing in real time and estimating emotional state" refers to the process of immediately processing collected voice and biometric data to evaluate the user's emotional state.

[0206] "Generating personalized support" means creating advice and activity suggestions tailored to the individual needs and history of the user, based on their emotional state.

[0207] "Psychological health" refers to a state in which the user's mental state is stable and they are free from stress and anxiety.

[0208] "Offering relaxation or social interaction" means presenting users with options for activities that calm the mind or promote communication with others.

[0209] The system used to implement this application analyzes the emotional state of elderly individuals using voice and biometric data and provides appropriate support. The server utilizes a smartphone's microphone to collect voice data and a heart rate sensor to acquire biometric data. Voice data is processed using the "LibROSA" library, and the "Emotion-recognition-using-speech" model is used for emotion identification. Biometric data is organized and analyzed using "Pandas" and "SciKit-learn". Based on the identified emotional state, the server creates personalized support using generative AI technology, also taking the user's past history into account. The support is delivered to the user's smartphone as a notification that contains specific behavioral guidance.
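The feature preparation described above might look roughly like the following sketch. `librosa.load` and `librosa.feature.mfcc` are real LibROSA calls, but the choice of time-averaged MFCCs, the 16 kHz sample rate, and the heart-rate summary fields are assumptions. The heart-rate part uses the Python standard library in place of Pandas so that it runs without extra dependencies.

```python
import statistics


def extract_voice_features(wav_path, n_mfcc=13):
    """Return the time-averaged MFCC vector of one recording (requires librosa)."""
    import librosa  # imported lazily so the rest of the sketch runs without it

    samples, sample_rate = librosa.load(wav_path, sr=16000)  # resample to 16 kHz
    mfcc = librosa.feature.mfcc(y=samples, sr=sample_rate, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1).tolist()  # one mean value per coefficient


def summarize_heart_rate(bpm_samples):
    """Drop missing readings and reduce a heart-rate series to summary features."""
    valid = [float(b) for b in bpm_samples if b is not None]
    if not valid:
        raise ValueError("no valid heart-rate readings")
    return {
        "hr_mean": statistics.fmean(valid),
        "hr_min": min(valid),
        "hr_max": max(valid),
        "hr_std": statistics.pstdev(valid),  # population standard deviation
    }
```

The two outputs could then be concatenated into a single feature vector before being passed to an emotion model; how the "Emotion-recognition-using-speech" model expects its input is not stated here, so that step is left out.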

[0210] The server collects user feedback and uses it to improve the system. The feedback serves as training data for machine learning algorithms, so that the model can be updated more quickly and the assistance becomes even more personalized.

[0211] For example, if the system determines that a user is experiencing anxiety, the server notifies them of support such as, "Please do a 30-minute deep breathing exercise. We will also play your favorite music from a list." An example of a prompt would be, "Please suggest activities and exercises to alleviate the anxiety the user is feeling. Based on their past activity history, please also suggest music the user might like." In this way, this system is a concrete and effective implementation for supporting the psychological health of the elderly.

[0212] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0213] Step 1:

[0214] The device uses the smartphone's microphone to collect voice data and records the user's voice in real time. It also acquires biometric data (e.g., heart rate) using a heart rate sensor. In this step, the user's voice and biometric data are obtained as input, which are then sent to a subsequent analysis process.

[0215] Step 2:

[0216] The server processes the collected audio data using "LibROSA" to extract audio features. Biometric data is formatted using "Pandas". The input is the audio and biometric data collected in step 1, and the output is feature data in an analyzable format.

[0217] Step 3:

[0218] The server inputs speech feature data into the "Emotion-recognition-using-speech" model to estimate the user's emotional state. The input is the feature data extracted in step 2, and the output is categorical information of the emotional state (e.g., joy, anxiety, etc.).

[0219] Step 4:

[0220] The server uses a generative AI model to create personalized support based on the estimated emotional state. Prompts are used to formulate specific advice for the generative AI to alleviate the user's emotional state. The input is the emotional state result from step 3 and historical information, and the output is the support content based on that emotion.

[0221] Step 5:

[0222] The server notifies the user's device of the generated support content. The input is the support content created in step 4, and the output is a suggestion of specific actions to be taken on the user's smartphone.

[0223] Step 6:

[0224] Users respond to the support provided. Feedback based on this is collected and sent back to the server. This feedback information is used for the continuous improvement of the system. The input is user feedback, and the output is the accumulation of feedback data.

[0225] This process allows the system to gain a detailed understanding of the user's emotional state and provide more appropriate and personalized support.

[0226] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0227] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search, URL: https://openai.com/blog/chatgpt) and Gemini (registered trademark) (Internet search, URL: https://gemini.google.com/?hl=ja). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0228] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0229] [Second Embodiment]

[0230] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0231] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0232] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0233] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0234] The microphone 238 receives instructions and other input from the user 20 by picking up the voice of the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0235] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0236] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0237] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0238] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0239] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0240] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0241] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0242] This invention is a system that utilizes the user's voice and biometric data to monitor their emotional state and analyze it in real time, thereby providing the user with appropriate advice. An embodiment of this system is shown below.

[0243] Data collection and analysis

[0244] The device uses microphones and sensors to acquire voice and biometric data when the user uses the device. The acquired data is immediately transmitted to a server using a secure protocol. The server converts the received voice data into text using natural language processing algorithms and analyzes what the user is saying and the tone of their voice. At the same time, it analyzes heart rate and changes in facial expressions based on biometric data to estimate the user's emotional state.

[0245] Estimation of emotional state and generation of advice

[0246] The server identifies the user's emotional state based on the analysis results. For example, if the analysis determines that the user is likely experiencing stress, the server uses this information and generative AI technology to generate specific advice for relaxation. This advice is customized to the individual user's preferences and past feedback.

[0247] Advice notification and feedback collection

[0248] The generated advice is communicated to the user via their device. The notification method is tailored to the user's device type and is designed to non-intrusively support their daily activities, such as through pop-up screens or voice guidance. The user's response is sent back to the server as feedback from the device and used to improve future advice generation. This enhances the overall accuracy and performance of the system.

[0249] These features allow users to receive support tailored to their individual circumstances, enabling them to effectively reduce stress and manage their emotions.

[0250] The following describes the processing flow.

[0251] Step 1:

[0252] The device collects voice data using a microphone while the user is using the device, and acquires biometric data such as heart rate and facial expressions using sensors. This data is temporarily stored within the device.

[0253] Step 2:

[0254] The device transmits acquired voice and biometric data to the server in real time. Secure communication protocols such as HTTPS are used for data transfer to protect user privacy.

[0255] Step 3:

[0256] The server converts the received audio data into text using natural language processing (NLP) algorithms and analyzes the user's vocabulary and tone of voice. This generates foundational data for evaluating the user's psychological state.

[0257] Step 4:

[0258] The server simultaneously analyzes biometric data, estimating stress levels and emotional changes from the user's heart rate and facial expressions. The analysis results are integrated with other emotional indicators to identify the user's overall emotional state.

[0259] Step 5:

[0260] The server generates advice tailored to the user's state based on an analysis of their emotional state. Using generative AI technology, the advice is personalized, taking into account the user's past feedback and preferences.

[0261] Step 6:

[0262] The server sends the generated advice to the terminal.

[0263] Step 7:

[0264] The device notifies the user of advice received from the server. These notifications appear as pop-up messages or voice guidance on the device, taking care not to interrupt the user's daily activities.

[0265] Step 8:

[0266] The device records the user's response to advice and subsequent actions as feedback. This information is then sent back to the server and used to generate future advice.

[0267] This processing flow allows the system to monitor the user's emotional state and provide appropriate support.
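The integration of voice-based and biometric indicators in Step 4 is not specified in detail. The following sketch shows one simple possibility, a weighted average of two stress scores followed by a threshold. The score range of 0 to 1, the default weight, the threshold, and the two state labels are all assumptions made for illustration.

```python
def fuse_stress_scores(voice_score, biometric_score, voice_weight=0.5):
    """Combine two stress scores in [0, 1] into one weighted average."""
    for name, value in (("voice_score", voice_score),
                        ("biometric_score", biometric_score),
                        ("voice_weight", voice_weight)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return voice_weight * voice_score + (1.0 - voice_weight) * biometric_score


def label_state(stress, threshold=0.6):
    """Map a fused stress score to a coarse state label."""
    return "stressed" if stress >= threshold else "calm"
```

A weighted average is only one option; a trained classifier over the combined features would serve the same purpose where labeled data is available.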

[0268] (Example 1)

[0269] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0270] In modern society, understanding the stress and other emotional states experienced by individual users in real time and providing timely, appropriate advice is a major challenge. However, conventional technologies have not been able to fully utilize voice and biometric data, making it difficult to provide personalized support to users. Furthermore, existing systems have low accuracy in information processing, resulting in insufficient advice that accurately reflects the user's state.

[0271] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0272] In this invention, the server includes means for acquiring the user's voice and vital data, means for analyzing the acquired data in real time and estimating the emotional state, and means for generating appropriate advice for the user based on the emotional state. This makes it possible to accurately grasp the user's emotional state and quickly provide personalized advice according to that state.

[0273] "Audio data" refers to a digital recording of a user's voice.

[0274] "Vital data" refers to biometric information such as the user's heart rate and skin electrical activity.

[0275] "Real-time" refers to a process where the time between data acquisition and analysis is extremely short, instantly reflecting the user's current state.

[0276] "Emotional state" refers to information that indicates the psychological state or mood that the user is experiencing.

[0277] "Advice" refers to specific suggestions and instructions provided to users to reduce stress and achieve emotional stability.

[0278] A "speech recognition algorithm" is a technology that analyzes speech data and converts it into text.

[0279] "Natural language processing" refers to technologies that enable machines to understand, analyze, and generate human speech and text.

[0280] "Transcription" refers to the process of converting audio information into written text.

[0281] "Feedback" refers to the reactions or opinions that users give to advice they receive.

[0282] "Machine learning techniques" are algorithms that allow computers to learn from large amounts of data and derive more accurate results.

[0283] This invention is a system that monitors the emotional state using the user's voice and biometric data, performs real-time analysis, and provides personalized advice to the user.

[0284] Specifically, the terminal is equipped with a microphone for capturing voice input and sensors for sensing heart rate, skin electrical activity, etc. While the user is performing normal daily activities, the terminal continuously acquires these data. Rather than being processed on the terminal, the voice data is immediately transmitted to the server using a secure protocol (e.g., TLS/SSL).
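A minimal sketch of how a terminal might package one sample for transmission over HTTPS is shown below, using only the Python standard library. The endpoint URL, the field names, and the base64 encoding of the audio are hypothetical. The sketch only builds the request object; sending it would be a call such as `urllib.request.urlopen(request, timeout=10)`, which verifies the server certificate by default.

```python
import json
import urllib.request


def build_upload_request(endpoint, user_id, heart_rate_bpm, audio_b64):
    """Build a JSON POST request, refusing any endpoint that is not HTTPS."""
    if not endpoint.lower().startswith("https://"):
        raise ValueError("endpoint must use HTTPS")
    body = json.dumps({
        "user_id": user_id,                # hypothetical field names
        "heart_rate_bpm": heart_rate_bpm,
        "audio_b64": audio_b64,            # audio assumed to be base64-encoded
    }).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Rejecting plain `http://` endpoints at construction time means an unencrypted transfer cannot happen through a configuration mistake.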

[0285] First, the server converts the voice data into text using a voice recognition algorithm (e.g., a commercial voice recognition API). In this process, noise removal and volume adjustment are performed, and the text-converted data is analyzed by natural language processing (NLP) technology. This makes it possible to grasp the content and emotional nuances of the user's speech.
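The noise removal and volume adjustment mentioned above are not described further. As one simple illustration, the sketch below applies a noise gate (samples quieter than a threshold are zeroed) followed by peak normalization. The threshold, the target peak, and the representation of audio as floats between -1.0 and 1.0 are assumptions; a production system would more likely use spectral noise suppression.

```python
def preprocess_audio(samples, gate_threshold=0.02, target_peak=0.9):
    """Apply a simple noise gate, then scale so the loudest sample hits target_peak.

    `samples` is a sequence of floats in the range [-1.0, 1.0].
    """
    # Noise gate: treat very quiet samples as background noise and silence them
    gated = [s if abs(s) >= gate_threshold else 0.0 for s in samples]
    peak = max((abs(s) for s in gated), default=0.0)
    if peak == 0.0:
        return gated  # nothing but silence; no gain can be applied
    gain = target_peak / peak
    return [s * gain for s in gated]
```

Normalizing after gating keeps the gain from being driven by noise, so recordings made at different distances from the microphone reach the speech recognizer at a comparable level.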

[0286] At the same time, the server analyzes the heart rate and changes in facial expressions from the biometric data and estimates the user's current emotional state. For example, an increase in heart rate or a change in facial expression can indicate that the user is under stress.
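The heart-rate part of this estimate can be illustrated with a z-score against the user's own baseline. The sketch below is an assumption about how "an increase in heart rate" might be made concrete; the threshold of two standard deviations and the use of a per-user baseline window are not taken from this disclosure, and facial expressions are not covered.

```python
import statistics


def is_heart_rate_elevated(baseline_bpm, current_bpm, z_threshold=2.0):
    """Return True if current_bpm is unusually high relative to the baseline.

    "Unusually high" means more than z_threshold sample standard deviations
    above the mean of the baseline readings.
    """
    if len(baseline_bpm) < 2:
        raise ValueError("need at least two baseline readings")
    mean = statistics.fmean(baseline_bpm)
    spread = statistics.stdev(baseline_bpm)
    if spread == 0:
        # A perfectly flat baseline: any value above it counts as elevated
        return current_bpm > mean
    return (current_bpm - mean) / spread > z_threshold
```

Comparing against the user's own baseline rather than a fixed limit accounts for the fact that resting heart rates differ considerably between individuals.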

[0287] Having determined the emotional state from the analyzed data, the server uses a generative AI model (e.g., a general natural language processing model) to generate advice suitable for the user's situation. This advice is generated from a prompt text composed for that situation. An example of the prompt text is "The tone of the user's voice is calm, and the heart rate is within the normal range. The user is in a meeting. Please propose techniques to maintain relaxation."

[0288] The generated advice is delivered to the user via the terminal. On a smartphone, the advice appears as a pop-up notification; on a smart speaker, it is provided by voice. The user can follow this advice to manage stress and maintain concentration.
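The choice of notification form by device type can be sketched as a small dispatch function. The device type strings, channel names, and payload fields below are hypothetical; only the pairing of smartphone with pop-up and smart speaker with voice comes from the paragraph above.

```python
def format_notification(device_type, advice):
    """Return a (channel, payload) pair suited to the receiving device."""
    if device_type == "smartphone":
        # Smartphones show the advice as a pop-up
        return ("popup", {"title": "Advice", "body": advice})
    if device_type == "smart_speaker":
        # Smart speakers read the advice aloud
        return ("voice", {"speech": advice})
    raise ValueError(f"unsupported device type: {device_type}")
```

Raising an error for unknown device types, rather than silently falling back to one channel, makes a missing device mapping visible during integration.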

[0289] User feedback is sent back to the server via the device and used to improve the entire system based on the accumulated data. Machine learning techniques that take feedback into account enable continuous improvement in the quality of advice.

[0290] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0291] Step 1:

[0292] The device acquires the user's voice and vital data. In this process, the device's microphone is used to record the user's conversation and vocalizations in real time, while sensors are used to simultaneously record vital data such as heart rate and skin electrical activity. At this point, the inputs are voice and vital data, and the output is a digital dataset of these.

[0293] Step 2:

[0294] The terminal transmits collected voice and vital data to the server using a secure protocol. The data is encrypted before transmission and processed to prevent information leakage. The input is the voice and vital data acquired by the terminal, and the output is the dataset received by the server.

[0295] Step 3:

[0296] The server converts the received audio data into text using a speech recognition algorithm. The input is audio data, and the output is text data. This process includes audio preprocessing to remove audio noise and ensure accurate text conversion.

[0297] Step 4:

[0298] The server analyzes the transcribed speech using natural language processing techniques to understand the user's language patterns and emotional nuances. The input consists of text and intonation information generated by speech recognition, while the output is an analysis of the user's emotional state. The analysis process also includes an evaluation of emotional tone using a language model.

[0299] Step 5:

[0300] The server analyzes biometric data and supplements it with data on the user's emotional state. Inputs are vital data such as heart rate and skin electrical activity, and outputs are emotional assessments such as psychological stress and relaxation levels. The analysis uses pattern recognition algorithms to detect outliers and abnormal patterns.

[0301] Step 6:

[0302] The server uses a generative AI model to generate advice based on the analysis results. The input is the analyzed emotional state, and the prompt includes the instruction, "The user's heart rate is elevated, suggesting they may be stressed. Please generate advice to help them feel more at ease." The output is specific advice provided to the user.

[0303] Step 7:

[0304] The terminal notifies the user of the generated advice. The input is the advice generated by the server, and the output is the notification information delivered to the user (e.g., voice guidance or pop-up message). Specifically, an appropriate display method is used depending on the type of device.

[0305] Step 8:

[0306] Users act based on the advice provided and provide feedback as needed. This feedback is sent from the terminal to the server and used for future system improvements. The input is user feedback information, and the output is improvement data used to generate future advice.
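As a minimal sketch of Steps 5 and 6, the following Python flags heart-rate samples that deviate strongly from the rest of a window and, when an elevated sample is found, returns the prompt quoted in Step 6. The z-score rule, the threshold of 2.0, and the function names are illustrative assumptions; the description only states that pattern recognition algorithms detect outliers and abnormal patterns.

```python
from statistics import mean, stdev

# Prompt text quoted from Step 6 of the description.
STRESS_PROMPT = (
    "The user's heart rate is elevated, suggesting they may be stressed. "
    "Please generate advice to help them feel more at ease."
)


def detect_outliers(samples, threshold=2.0):
    """Return indices of samples whose z-score magnitude exceeds threshold.

    A simple stand-in for the unspecified pattern recognition algorithm.
    """
    if len(samples) < 2:
        return []
    mu = mean(samples)
    sigma = stdev(samples)
    if sigma == 0:
        return []
    return [i for i, x in enumerate(samples) if abs(x - mu) / sigma > threshold]


def build_stress_prompt(heart_rates, threshold=2.0):
    """Return STRESS_PROMPT if any outlier lies above the window mean, else None."""
    if not heart_rates:
        return None
    mu = mean(heart_rates)
    elevated = [
        i for i in detect_outliers(heart_rates, threshold) if heart_rates[i] > mu
    ]
    return STRESS_PROMPT if elevated else None
```

A window such as `[70, 72, 71, 69, 70, 73, 71, 120]` flags only the last sample, so the prompt is produced; a flat window produces no prompt.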

[0307] (Application Example 1)

[0308] Next, Application Example 1 will be described. In the following description, the data processing device 12 is referred to as a "server", and the smart glasses 214 are referred to as a "terminal".

[0309] In the field of care for the elderly and those requiring care, appropriate care and support according to individual needs are required. However, at present, the burden on care staff is large, and it is difficult to respond immediately to the emotional states of all users. The purpose is to solve such problems, improve the quality of life of users, and reduce the burden on care staff.

[0310] The specific processing by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0311] In this invention, the server includes means for acquiring the user's voice and biometric data, means for analyzing the acquired data in real time and estimating the emotional state, means for generating appropriate advice for the user based on the emotional state, means for grasping the emotional state at the care site and proposing individual care and relaxation methods, means for notifying the generated advice on the user's information processing device, and means for collecting the user's feedback and using it for improving information processing technology. As a result, it becomes possible to accurately monitor the emotional state of the user at the care site and provide appropriate care individually.

[0312] "Voice data" is sound information, captured by a sound-receiving device, of what the user says.

[0313] "Biometric data" is information indicating the state of the user's body, and includes physiological indicators such as heart rate and facial expression.

[0314] "Emotional state" is the state indicating the emotional situation of the user, and refers to psychological tendencies such as stress and relaxation.

[0315] "Advice" refers to specific suggestions or instructions provided based on the user's emotional state, with the aim of solving problems or improving their condition.

[0316] An "information processing device" is a device that processes digital data and provides information and instructions to users, and includes electronic devices such as smartphones and tablets.

[0317] A "server" is a central device that processes information and data on a network, and is a computer that is responsible for analyzing and processing data sent by users.

[0318] "Feedback" refers to a user's response to advice and suggestions, and is information used to improve the system.

[0319] "Information processing technology" refers to technologies used to efficiently analyze and process large amounts of data, and includes natural language processing and machine learning.

[0320] The system that realizes this invention consists of an information processing device and a server. The information processing device used by the user is equipped with sensors for acquiring voice and biometric data. This device immediately transmits the voice and biometric data collected using the sensors to the server.

[0321] The server converts the collected audio data into text using natural language processing algorithms. Specifically, it uses Amazon Transcribe from AWS to convert audio into text information, and then analyzes the content and tone of the audio using Amazon Comprehend. Simultaneously, biometric data is used to estimate emotional states via Amazon SageMaker. This allows for real-time analysis of the user's emotional state.

[0322] Based on the analysis results, the server uses AI technology to generate advice tailored to the user's emotional state. This generated advice is then customized to individual needs by Amazon Personalize and communicated to the information processing device via Amazon SNS. This notification is designed to be easily understood and delivered in a non-invasive manner.

[0323] As a concrete example, in a caregiving setting, if signs of anxiety are detected in the user, a voice prompt such as "Shall we try taking a deep breath?" is provided through the information processing device. Furthermore, an example of a prompt to be input to the generative AI model would be, "Please suggest actions to take when the user's emotional state is identified as anxiety."

[0324] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0325] Step 1:

[0326] The device uses a microphone and biosensors to collect the user's voice and biometric data. This collected data serves as input. The voice data is digitized, and biometric data such as heart rate and facial expression parameters are obtained. This data is transmitted to the server in real time.

[0327] Step 2:

[0328] The server converts the received audio data into text using a natural language processing algorithm. Specifically, the server uses Amazon Transcribe from AWS to convert the audio data into text information. As a result, the content of the audio is output as string data.

[0329] Step 3:

[0330] The server analyzes the content and tone of the transcribed audio data. Amazon Comprehend is used to extract the user's emotions and intentions from the text data. This analysis generates patterns that indicate emotional tendencies.

[0331] Step 4:

[0332] Simultaneously, the server estimates the emotional state based on biometric data. Using Amazon SageMaker, it analyzes heart rate and facial expression data to evaluate the user's stress and relaxation levels. This process outputs an index indicating the emotional state.

[0333] Step 5:

[0334] The server utilizes an AI model based on the analysis results to generate optimal advice for the user. It suggests relaxation methods and care tailored to the user's emotional state, and the generated advice is output.

[0335] Step 6:

[0336] The generated advice is customized to the individual user's needs using Amazon Personalize. Advice is adaptively optimized by considering user history and feedback.

[0337] Step 7:

[0338] Advice is sent to the device via Amazon SNS, allowing the user to review the recommended actions. The final output is provided as text or audio instructions on the device.

[0339] Step 8:

[0340] The user responds to the advice they receive, and this feedback is sent back from the device to the server. The server uses this feedback to improve the accuracy of future advice generation.
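The order of Steps 2 to 7 can be sketched as a pipeline whose stages are injected as plain callables. This is a sketch under stated assumptions, not the described implementation: the `Services` container, `run_pipeline`, and the stub logic in `make_stub_services` are hypothetical, and no AWS API is called. In a deployment, each callable would wrap the service named in the corresponding step.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Services:
    """Injected stages; each would wrap the service named in the description."""
    transcribe: Callable[[bytes], str]                   # Amazon Transcribe
    analyze_text: Callable[[str], str]                   # Amazon Comprehend
    estimate_stress: Callable[[Sequence[float]], float]  # model on Amazon SageMaker
    generate_advice: Callable[[str, float], str]         # generative AI model
    personalize: Callable[[str, str], str]               # Amazon Personalize
    notify: Callable[[str, str], None]                   # Amazon SNS


def run_pipeline(svc: Services, user_id: str, audio: bytes,
                 heart_rates: Sequence[float]) -> str:
    """Run the server-side steps in the order given in the flow."""
    text = svc.transcribe(audio)                      # Step 2
    sentiment = svc.analyze_text(text)                # Step 3
    stress = svc.estimate_stress(heart_rates)         # Step 4
    advice = svc.generate_advice(sentiment, stress)   # Step 5
    advice = svc.personalize(user_id, advice)         # Step 6
    svc.notify(user_id, advice)                       # Step 7
    return advice


def make_stub_services(outbox: List[Tuple[str, str]]) -> Services:
    """Local stand-ins (hypothetical logic) so the flow runs without AWS."""
    return Services(
        transcribe=lambda audio: audio.decode("utf-8"),
        analyze_text=lambda text: (
            "NEGATIVE" if "worried" in text.lower() else "NEUTRAL"
        ),
        estimate_stress=lambda hr: min(1.0, max(0.0, (max(hr) - 60.0) / 60.0)),
        generate_advice=lambda sentiment, stress: (
            "Shall we try taking a deep breath?"
            if sentiment == "NEGATIVE" or stress > 0.5
            else "You seem to be doing well."
        ),
        personalize=lambda user_id, advice: advice,
        notify=lambda user_id, advice: outbox.append((user_id, advice)),
    )
```

Injecting the stages keeps the step order testable in isolation and lets any one service be swapped without touching the others.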

[0341] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0342] This invention provides a system for precisely identifying a user's emotional state and providing appropriate support. By incorporating an emotion engine, it accurately recognizes emotions from the user's voice and biometric data, enabling a personalized approach.

[0343] Emotional data collection and the role of the emotion engine

[0344] The device utilizes microphones and various sensors to collect user voice and biometric data with high accuracy. The collected data is transmitted to a server in real time. The emotion engine on the server analyzes this data and uses a pre-trained emotion model to identify the user's emotional state. This model is trained to understand various emotion categories (e.g., joy, sadness, anger, surprise, etc.).

[0345] Enhanced emotion recognition and advice generation

[0346] The server uses generative AI technology to create personalized advice for the user based on the emotional state obtained by the emotion engine. For example, if the user is determined to be stressed, a specific action plan for relaxation will be presented. This plan is customized by taking into account feedback on the user's past activities and responses.

[0347] Feedback loops and system learning

[0348] The device continuously records the user's responses to advice. This feedback data is sent to a server and used in machine learning algorithms to help update the emotion engine model. Through this feedback loop, the system improves the accuracy of its emotion recognition, enabling the delivery of more personalized advice.
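One way to picture this feedback loop is a store that counts how often each piece of advice was reported as helpful for a given emotion and prefers the advice with the best record. This is a deliberately simple stand-in: it does not retrain an emotion model as the description envisions, and the class name, method names, and the Laplace smoothing are assumptions made for illustration.

```python
from collections import defaultdict


class FeedbackStore:
    """Tracks per-(emotion, advice) helpfulness counts from user feedback."""

    def __init__(self):
        # key: (emotion, advice) -> [helpful_count, total_count]
        self._counts = defaultdict(lambda: [0, 0])

    def record(self, emotion, advice, helpful):
        """Store one feedback event sent back from the device."""
        entry = self._counts[(emotion, advice)]
        if helpful:
            entry[0] += 1
        entry[1] += 1

    def score(self, emotion, advice):
        """Laplace-smoothed helpfulness estimate; 0.5 when nothing is known."""
        helpful, total = self._counts.get((emotion, advice), (0, 0))
        return (helpful + 1) / (total + 2)

    def best(self, emotion, candidates):
        """Pick the candidate advice with the highest estimated helpfulness."""
        return max(candidates, key=lambda advice: self.score(emotion, advice))
```

The smoothing keeps advice with little feedback from being ranked at an extreme after a single response.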

[0349] These configurations create an efficient and intelligent support system designed to assist users' psychological well-being and reduce daily stress.

[0350] The following describes the processing flow.

[0351] Step 1:

[0352] The device uses a microphone to acquire voice data while the user operates it, and sensors to collect biometric data such as heart rate and skin temperature. This data is temporarily stored on the device as it is acquired.

[0353] Step 2:

[0354] The device transfers the collected voice and biometric data to the server via a secure communication protocol (e.g., HTTPS). The transmission is encrypted to protect user privacy.

[0355] Step 3:

[0356] The server converts the received audio data into text using natural language processing (NLP) technology and meticulously analyzes the content of the speech and the tone of voice. This analysis extracts patterns that reflect the user's emotions.

[0357] Step 4:

[0358] The server utilizes an emotion engine to identify the user's emotional state based on analyzed voice and biometric data. The engine uses a pre-trained emotion model to determine which category the emotion belongs to (e.g., joy, sadness, anger).

[0359] Step 5:

[0360] The server uses generative AI to generate appropriate advice for the user based on the analysis results from the emotion engine. This advice is personalized by taking into account past user data and feedback information.

[0361] Step 6:

[0362] The server then sends the generated advice to the terminal.

[0363] Step 7:

[0364] The device notifies the user of any advice it receives. These notifications appear as on-screen pop-up messages or voice messages, carefully designed to avoid interrupting the user's current activities.

[0365] Step 8:

[0366] The device records how the user responds to the advice. This response data is sent to the server as feedback.

[0367] Step 9:

[0368] The server analyzes the collected feedback and applies it to machine learning algorithms to update the emotion engine model. This continuous learning process improves the system's emotion recognition accuracy and the personalization of its advice.
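Step 2 of this flow, the transfer over HTTPS, can be sketched with the Python standard library. The payload fields, the endpoint URL used in the usage note, and the function name are hypothetical; the description only requires a secure protocol such as HTTPS. The sketch builds the request without sending it and rejects any URL that is not HTTPS.

```python
import base64
import json
import urllib.request


def build_upload_request(url, user_id, audio, biometrics):
    """Build (but do not send) a POST request carrying voice and biometric data.

    Encryption in transit is provided by TLS when the request is later sent
    to an https:// URL, so plain http:// URLs are refused here.
    """
    if not url.lower().startswith("https://"):
        raise ValueError("refusing to send user data over a non-HTTPS URL")
    body = json.dumps({
        "user_id": user_id,
        # Raw audio bytes are base64-encoded so they fit in a JSON document.
        "audio_b64": base64.b64encode(audio).decode("ascii"),
        "biometrics": biometrics,
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A caller would pass the result to `urllib.request.urlopen`, for example with a placeholder endpoint such as `https://server.example/upload`.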

[0369] (Example 2)

[0370] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0371] While there is a need to accurately recognize users' emotional states and provide appropriate advice, conventional systems have insufficient accuracy in recognizing emotional states, resulting in low-quality personalized support. Furthermore, they lack effective means of utilizing user feedback to improve system accuracy. Therefore, there is a need for an effective system that supports users' psychological well-being and reduces daily stress.

[0372] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0373] In this invention, the server includes means for acquiring the user's voice and biometric data, means for analyzing the acquired data in real time and estimating the emotional state, and means for generating appropriate advice for the user based on the emotional state. This enables high-precision recognition of the user's emotional state and the provision of personalized advice tailored to individual circumstances.

[0374] A "user" is an individual who uses the system to gain recognition of their own emotional state and receive advice.

[0375] "Voice and biometric data" refers to voice recordings necessary to recognize the user's emotional state, as well as data showing physiological indicators such as heart rate, body temperature, and skin conductivity.

[0376] "Real-time analysis" refers to a process where data is processed immediately after acquisition, resulting in instant results.

[0377] "Emotional state" refers to the psychological and emotional state a user experiences at a given moment, and includes categories such as joy, sadness, anger, and surprise.

[0378] "Appropriate advice" refers to suggestions for specific actions to improve or support the user, tailored to their current emotional state.

[0379] "Feedback" refers to data that records users' reactions to the advice provided by the system and the outcomes of following it, and is used to improve the system's accuracy.

[0380] A "machine learning algorithm" is a computational method that automatically learns patterns and rules from data to improve the performance of a system.

[0381] A "generative AI model" is an artificial intelligence technology that generates information and actions for a specific purpose from a large amount of data.

[0382] A "prompt" is a text input given to a generative AI model to generate a specific output.

[0383] This system is designed to accurately identify the user's emotions and provide personalized advice. An embodiment of this system is shown below.

[0384] The device collects the user's voice and biometric data using a microphone and various sensors. Specifically, it records voice and monitors heart rate, body temperature, and skin electrical activity. This data is collected in real time and immediately transmitted to the server.

[0385] The server runs analysis software called an emotion engine. This emotion engine analyzes data sent by the user using an emotion model that has been previously trained using machine learning algorithms. This allows for the analysis of the linguistic and phonological features of the data, making it possible to estimate the user's emotional state with high accuracy. Emotional states are expressed in categories such as joy, sadness, anger, and surprise.
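The categorical output of such an emotion engine can be illustrated with a nearest-centroid classifier. This is only a toy stand-in for a trained emotion model: the two-dimensional feature space (a normalized speech feature and a normalized heart rate), the centroid values, and the function name are all invented for illustration.

```python
import math

# Hypothetical centroids: (normalized pitch variability, normalized heart rate).
# A real emotion model would learn its parameters from training data.
CENTROIDS = {
    "joy": (0.8, 0.6),
    "sadness": (0.2, 0.3),
    "anger": (0.9, 0.9),
    "surprise": (0.6, 0.8),
}


def classify_emotion(features, centroids=CENTROIDS):
    """Return the emotion label whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))
```

For instance, a feature vector close to `(0.2, 0.3)` is labeled "sadness", and one close to `(0.9, 0.9)` is labeled "anger".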

[0386] Based on these analysis results, the server uses a generative AI model to generate personalized advice for the user. The generative AI model generates advice based on prompts to suggest the most suitable action plan for the user's emotional state. For example, by inputting the prompt "Suggest ways to relax when the user is feeling stressed," the server can generate specific advice such as "Take three deep breaths" or "Take a short walk."
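Prompt construction of this kind can be sketched as a lookup with a fallback template. Only the "stress" prompt is taken from the description; the fallback template, the optional history clause, and the function name are assumptions.

```python
# Only the "stress" entry is quoted from the description.
PROMPTS = {
    "stress": "Suggest ways to relax when the user is feeling stressed.",
}

# Hypothetical fallback for emotions without a dedicated prompt.
DEFAULT_TEMPLATE = (
    "Suggest a supportive action for a user whose emotional state is {emotion}."
)


def prompt_for_emotion(emotion, history=()):
    """Return the prompt for an estimated emotion, optionally citing feedback."""
    prompt = PROMPTS.get(emotion, DEFAULT_TEMPLATE.format(emotion=emotion))
    if history:
        prompt += (
            " Take the user's past feedback into account: "
            + "; ".join(history)
            + "."
        )
    return prompt
```

The returned string would then be passed to whichever generative AI model the server uses.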

[0387] The device notifies the user of advice transmitted from the server through a voice assistant or screen display. By following the advice provided, the user can reduce daily stress.

[0388] Furthermore, the device records the user's response to advice and sends that feedback to the server. Based on this feedback, the server continuously updates the emotion engine model. As a result, the system evolves with each use, providing more accurate and user-friendly support.

[0389] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0390] Step 1:

[0391] The user provides data through the device's voice input device and biosensors. From this input, the device records voice data and collects biometric data (heart rate, body temperature, skin conductivity, etc.). This input data is transmitted to the server in real time.

[0392] Step 2:

[0393] The server receives voice and biometric data transmitted from the terminal. This data is passed to the emotion engine as input, where its linguistic and phonological features are analyzed. This analysis generates an output that estimates the user's emotional state (e.g., joy, sadness).

[0394] Step 3:

[0395] The server inputs the estimated emotional state into a generative AI model and uses prompts to generate user-specific advice. Specifically, if the emotional state is "stressed," it will output advice such as "Take three deep breaths" based on the prompt "Please suggest ways to relax."

[0396] Step 4:

[0397] The device receives advice sent from the server. The device notifies the user of this advice using a voice assistant or display. The user takes action according to the provided advice.

[0398] Step 5:

[0399] The user acts based on the advice and then provides feedback to the device. The device collects this feedback and sends it back to the server.

[0400] Step 6:

[0401] The server uses feedback data sent from the terminal to update the emotion engine model through machine learning. This update enables the model to produce more accurate emotion recognition and personalized advice in subsequent uses.

[0402] (Application Example 2)

[0403] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0404] Maintaining the mental health of the elderly is a major challenge in modern society. In particular, feelings of isolation and increased stress are factors that reduce the quality of life for the elderly. The present invention aims to provide an efficient support system to alleviate the anxiety and stress that the elderly experience in their daily lives and to provide them with psychological peace of mind.

[0405] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0406] In this invention, the server includes a device for acquiring the user's voice and biometric data, a device for analyzing the acquired data in real time and estimating the emotional state, and a device for generating personalized support based on the emotional state. This makes it possible to identify the user's emotional state in detail and propose appropriate relaxation or social interaction to support the psychological health of the elderly.

[0407] "Users" refers to anyone who uses the system directly or indirectly. This typically includes elderly people.

[0408] "Voice and biometric data" refers to voice information obtained from the user, as well as physiological indicators such as heart rate and skin electrical activity.

[0409] "Device" refers to the hardware and software components for acquiring and analyzing voice and biometric data and for issuing notifications.

[0410] "Analyzing in real time and estimating emotional state" refers to the process of immediately processing collected voice and biometric data to evaluate the user's emotional state.

[0411] "Generating personalized support" means creating advice and activity suggestions tailored to the individual needs and history of the user, based on their emotional state.

[0412] "Psychological health" refers to a state in which the user's mental state is stable and they are free from stress and anxiety.

[0413] "Offering relaxation or social interaction" means presenting users with options for activities that calm the mind or promote communication with others.

[0414] The system used to implement this application analyzes the emotional state of elderly individuals using voice and biometric data and provides appropriate support. The server utilizes a smartphone's microphone to collect voice data and a heart rate sensor to acquire biometric data. Voice data is processed using the "LibROSA" library, and the "Emotion-recognition-using-speech" model is used for emotion identification. Biometric data is organized and analyzed using "Pandas" and "scikit-learn". Based on the identified emotional state, the server creates personalized support using generative AI technology, also taking the user's past history into account. This support is sent as a notification to the user's smartphone, and specific behavioral guidance is provided.

[0415] The server collects user feedback and uses it to improve the system. The feedback is used by machine learning algorithms to update the model, allowing for even more personalized assistance.

[0416] For example, if the system determines that a user is experiencing anxiety, the server will notify them with support such as, "Please do a 30-minute deep breathing exercise. We will also play your favorite music from a list." An example prompt would be, "Please suggest activities and exercises to alleviate the anxiety the user is feeling. Based on their past activity history, please also suggest music the user might like." In this way, this system is a concrete and effective implementation for supporting the psychological health of the elderly.

[0417] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0418] Step 1:

[0419] The device uses the smartphone's microphone to collect voice data and records the user's voice in real time. It also acquires biometric data (e.g., heart rate) using a heart rate sensor. In this step, the user's voice and biometric data are obtained as input, which are then sent to a subsequent analysis process.

[0420] Step 2:

[0421] The server processes the collected audio data using "LibROSA" to extract audio features. Biometric data is formatted using "Pandas". The input is the audio and biometric data collected in step 1, and the output is feature data in an analyzable format.

[0422] Step 3:

[0423] The server inputs speech feature data into the "Emotion-recognition-using-speech" model to estimate the user's emotional state. The input is the feature data extracted in step 2, and the output is categorical information of the emotional state (e.g., joy, anxiety, etc.).

[0424] Step 4:

[0425] The server uses a generative AI model to create personalized support based on the estimated emotional state. Prompts are used to formulate specific advice for the generative AI to alleviate the user's emotional state. The input is the emotional state result from step 3 and historical information, and the output is the support content based on that emotion.

[0426] Step 5:

[0427] The server notifies the user's device of the generated support content. The input is the support content created in step 4, and the output is a suggestion of specific actions to be taken on the user's smartphone.

[0428] Step 6:

[0429] Users respond to the support provided. Feedback based on this is collected and sent back to the server. This feedback information is used for the continuous improvement of the system. The input is user feedback, and the output is the accumulation of feedback data.

[0430] This process allows the system to gain a detailed understanding of the user's emotional state and provide more appropriate and personalized support.
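The feature extraction in Step 2 can be illustrated without any third-party dependency by computing two common per-frame audio features, root-mean-square energy and zero-crossing rate. This is a pure-Python stand-in, not the LibROSA pipeline itself: librosa offers comparable features (`librosa.feature.rms` and `librosa.feature.zero_crossing_rate`), but its framing and normalization differ from this sketch, and the frame length used here is an arbitrary assumption.

```python
import math


def frame_features(signal, frame_length=4):
    """Return a list of (rms, zero_crossing_rate) per non-overlapping frame.

    rms is the root-mean-square amplitude of the frame; zero_crossing_rate is
    the fraction of adjacent sample pairs whose signs differ. Trailing samples
    that do not fill a whole frame are ignored.
    """
    if frame_length < 2:
        raise ValueError("frame_length must be at least 2")
    features = []
    for start in range(0, len(signal) - frame_length + 1, frame_length):
        frame = signal[start:start + frame_length]
        rms = math.sqrt(sum(x * x for x in frame) / frame_length)
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        features.append((rms, crossings / (frame_length - 1)))
    return features
```

The resulting feature rows are the kind of analyzable input that Step 3 would pass to an emotion recognition model.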

[0431] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0432] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0433] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0434] [Third Embodiment]

[0435] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0436] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0437] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0438] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0439] The microphone 238 receives instructions from the user 20 by picking up the voice uttered by the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0440] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0441] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0442] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0443] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0444] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0445] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0446] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0447] This invention is a system that utilizes the user's voice and biometric data to monitor their emotional state and analyze it in real time, thereby providing the user with appropriate advice. An embodiment of this system is shown below.

[0448] Data collection and analysis

[0449] The device uses microphones and sensors to acquire voice and biometric data when the user uses the device. The acquired data is immediately transmitted to a server using a secure protocol. The server converts the received voice data into text using natural language processing algorithms and analyzes what the user is saying and the tone of their voice. At the same time, it analyzes heart rate and changes in facial expressions based on biometric data to estimate the user's emotional state.

[0450] Estimation of emotional state and generation of advice

[0451] The server identifies the user's emotional state based on the analysis results. For example, if the analysis determines that the user is likely experiencing stress, the server uses this information and generative AI technology to generate specific advice for relaxation. This advice is customized to the individual user's preferences and past feedback.

[0452] Advice notification and feedback collection

[0453] The generated advice is communicated to the user via their device. The notification method is tailored to the user's device type and is designed to non-intrusively support their daily activities, such as through pop-up screens or voice guidance. The user's response is sent back to the server as feedback from the device and used to improve future advice generation. This enhances the overall accuracy and performance of the system.

[0454] These features allow users to receive support tailored to their individual circumstances, enabling them to effectively reduce stress and manage their emotions.

[0455] The following describes the processing flow.

[0456] Step 1:

[0457] The device collects voice data using a microphone while the user is using the device, and acquires biometric data such as heart rate and facial expressions using sensors. This data is temporarily stored within the device.

[0458] Step 2:

[0459] The device transmits acquired voice and biometric data to the server in real time. Secure communication protocols such as HTTPS are used for data transfer to protect user privacy.

[0460] Step 3:

[0461] The server converts the received audio data into text using natural language processing (NLP) algorithms and analyzes the user's vocabulary and tone of voice. This generates foundational data for evaluating the user's psychological state.

[0462] Step 4:

[0463] The server simultaneously analyzes biometric data, estimating stress levels and emotional changes from the user's heart rate and facial expressions. The analysis results are integrated with other emotional indicators to identify the user's overall emotional state.

[0464] Step 5:

[0465] The server generates advice tailored to the user's state based on an analysis of their emotional state. Using generative AI technology, the advice is personalized, taking into account the user's past feedback and preferences.

[0466] Step 6:

[0467] The server sends the generated advice to the terminal.

[0468] Step 7:

[0469] The device notifies the user of advice received from the server. These notifications appear as pop-up messages or voice guidance on the device, taking care not to interrupt the user's daily activities.

[0470] Step 8:

[0471] The device records the user's response to advice and subsequent actions as feedback. This information is then sent back to the server and used to generate future advice.

[0472] This processing flow allows the system to monitor the user's emotional state and provide appropriate support.
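
As a rough illustration of Step 2 of the flow above, the following Python sketch builds an HTTPS POST request carrying one sample of voice and biometric data. The endpoint URL, the JSON field names, and the function name build_upload_request are assumptions made for this illustration and do not appear in the specification. The request is only constructed here, not sent.

```python
import json
import urllib.request
from urllib.parse import urlparse


def build_upload_request(url: str, user_id: str, heart_rate_bpm: float,
                         audio_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) an HTTPS POST carrying one sample of sensor data."""
    # Refuse plain HTTP so that the data is only ever sent over TLS.
    if urlparse(url).scheme != "https":
        raise ValueError("sensor data must be sent over HTTPS")
    payload = {
        "user_id": user_id,                # hypothetical field name
        "heart_rate_bpm": heart_rate_bpm,  # hypothetical field name
        # Audio is hex-encoded only to keep the JSON body text-safe.
        "audio_hex": audio_bytes.hex(),
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint; urllib.request.urlopen(request) would perform the transfer.
request = build_upload_request(
    "https://server.example/upload", "user-1", 72.0, b"\x00\x01")
```

Rejecting non-HTTPS URLs before any data is serialized is one simple way to keep the privacy requirement of Step 2 from depending on caller discipline.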

[0473] (Example 1)

[0474] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0475] In modern society, understanding the stress and other emotional states experienced by individual users in real time and providing timely, appropriate advice is a major challenge. However, conventional technologies have not been able to fully utilize voice and biometric data, making it difficult to provide personalized support to users. Furthermore, existing systems have low accuracy in information processing, so the advice they give does not adequately reflect the user's state.

[0476] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0477] In this invention, the server includes means for acquiring the user's voice and vital data, means for analyzing the acquired data in real time and estimating the emotional state, and means for generating appropriate advice for the user based on the emotional state. This makes it possible to accurately grasp the user's emotional state and quickly provide personalized advice according to that state.

[0478] "Audio data" refers to a digital recording of a user's voice.

[0479] "Vital data" refers to biometric information such as the user's heart rate and skin electrical activity.

[0480] "Real-time" refers to a process where the time between data acquisition and analysis is extremely short, instantly reflecting the user's current state.

[0481] "Emotional state" refers to information that indicates the psychological state or mood that the user is experiencing.

[0482] "Advice" refers to specific suggestions and instructions provided to users to reduce stress and achieve emotional stability.

[0483] A "speech recognition algorithm" is a technology that analyzes speech data and converts it into text.

[0484] "Natural language processing" refers to technologies that enable machines to understand, analyze, and generate human speech and text.

[0485] "Transcription" refers to the process of converting audio information into written text.

[0486] "Feedback" refers to the reactions or opinions that users give to advice they receive.

[0487] "Machine learning techniques" are algorithms that allow computers to learn from large amounts of data and derive more accurate results.

[0488] This invention is a system that provides personalized advice to users by monitoring their emotional state using their voice and biometric data and analyzing it in real time.

[0489] Specifically, the device is equipped with a microphone to capture voice input and sensors to detect heart rate, skin electrical activity, and other parameters. The device continues to collect this data while the user performs their normal daily activities. The voice data is not processed on-site but is immediately sent to the server using a secure protocol (e.g., TLS / SSL).

[0490] The server first converts the audio data into text using a speech recognition algorithm (e.g., a commercial speech recognition API). This process includes noise reduction and volume adjustment, and the converted text is then analyzed using natural language processing (NLP) techniques. This makes it possible to understand the content of the user's speech and the nuances of their emotions.

[0491] Simultaneously, the server analyzes biometric data such as heart rate and facial expression changes to estimate the user's current emotional state. For example, an increase in heart rate or changes in facial expression can indicate that the user is experiencing stress.

[0492] Based on the analyzed data, the server understands the user's emotional state and uses a generative AI model (e.g., a general natural language processing model) to generate advice appropriate to the user's situation. This advice is generated based on the generated prompt text. An example of a prompt text would be: "The user's voice tone is calm, and their heart rate is within the normal range. The user is in a meeting. Please suggest techniques to help them stay relaxed."
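
The way such a prompt text might be assembled from the analysis results can be sketched as follows. This is a minimal illustration, not the method of the specification: the function name build_prompt, its parameters, and the 60 to 100 bpm "normal range" default are assumptions introduced here.

```python
def build_prompt(voice_tone: str, heart_rate_bpm: int, context: str,
                 normal_range: tuple[int, int] = (60, 100)) -> str:
    """Turn analysis results into a prompt text for a generative AI model."""
    low, high = normal_range  # assumed thresholds, for illustration only
    if heart_rate_bpm > high:
        heart_rate_phrase = "is elevated"
    elif heart_rate_bpm < low:
        heart_rate_phrase = "is below the normal range"
    else:
        heart_rate_phrase = "is within the normal range"
    return (
        f"The user's voice tone is {voice_tone}, and their heart rate "
        f"{heart_rate_phrase}. The user is {context}. "
        "Please suggest techniques to help them stay relaxed."
    )


prompt = build_prompt("calm", 72, "in a meeting")
```

With these inputs the function reproduces the example prompt text quoted above.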

[0493] The user is notified of the generated advice via their device. On smartphones, the advice is displayed as a pop-up notification. On smart speakers, it is delivered by voice. Users can follow this advice to manage stress and maintain focus.
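
The choice of notification method by device type can be sketched as a simple lookup. The device type strings, the Notification class, and the fallback to a pop-up for unknown devices are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Notification:
    channel: str  # "popup" or "voice"
    text: str


# Mapping taken from the description: smartphones show a pop-up,
# smart speakers speak the advice. The key names are hypothetical.
CHANNEL_BY_DEVICE = {"smartphone": "popup", "smart_speaker": "voice"}


def make_notification(device_type: str, advice: str) -> Notification:
    """Pick the delivery channel that suits the user's device."""
    # Unknown device types fall back to a pop-up (an assumption of this sketch).
    channel = CHANNEL_BY_DEVICE.get(device_type, "popup")
    return Notification(channel=channel, text=advice)


on_phone = make_notification("smartphone", "Take three deep breaths")
on_speaker = make_notification("smart_speaker", "Take three deep breaths")
```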

[0494] User feedback is sent back to the server via the device and used to improve the entire system based on the accumulated data. Machine learning techniques that take feedback into account enable continuous improvement in the quality of advice.

[0495] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0496] Step 1:

[0497] The device acquires the user's voice and vital data. In this process, the device's microphone is used to record the user's conversation and vocalizations in real time, while sensors are used to simultaneously record vital data such as heart rate and skin electrical activity. At this point, the inputs are voice and vital data, and the output is a digital dataset of these.

[0498] Step 2:

[0499] The terminal transmits collected voice and vital data to the server using a secure protocol. The data is encrypted before transmission and processed to prevent information leakage. The input is the voice and vital data acquired by the terminal, and the output is the dataset received by the server.

[0500] Step 3:

[0501] The server converts the received audio data into text using a speech recognition algorithm. The input is audio data, and the output is text data. This process includes audio preprocessing to remove audio noise and ensure accurate text conversion.

[0502] Step 4:

[0503] The server analyzes the transcribed speech using natural language processing techniques to understand the user's language patterns and emotional nuances. The input consists of text and intonation information generated by speech recognition, while the output is an analysis of the user's emotional state. The analysis process also includes an evaluation of emotional tone using a language model.

[0504] Step 5:

[0505] The server analyzes the biometric data to supplement the estimate of the user's emotional state. Inputs are vital data such as heart rate and skin electrical activity, and outputs are emotional assessments such as psychological stress and relaxation levels. The analysis uses pattern recognition algorithms to detect outliers and abnormal patterns.

[0506] Step 6:

[0507] The server uses a generative AI model to generate advice based on the analysis results. The input is the analyzed emotional state, and the prompt includes the instruction, "The user's heart rate is elevated, suggesting they may be stressed. Please generate advice to help them feel more at ease." The output is specific advice provided to the user.

[0508] Step 7:

[0509] The terminal notifies the user of the generated advice. The input is the advice generated by the server, and the output is the notification information delivered to the user (e.g., voice guidance or pop-up message). Specifically, an appropriate display method is used depending on the type of device.

[0510] Step 8:

[0511] Users act based on the advice provided and provide feedback as needed. This feedback is sent from the terminal to the server and used for future system improvements. The input is user feedback information, and the output is improvement data used to generate future advice.
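
Step 5 above mentions pattern recognition algorithms that detect outliers in vital data without naming one. As one possible reading, the sketch below flags heart rate samples that lie far from the mean using a z-score. The z-score method, the threshold of 2.0, and the sample values are assumptions chosen for illustration and are not taken from the specification.

```python
from statistics import mean, stdev


def find_outliers(samples: list[float], threshold: float = 2.0) -> list[int]:
    """Return indices of samples more than `threshold` standard deviations from the mean."""
    if len(samples) < 2:
        return []  # a standard deviation needs at least two samples
    mu = mean(samples)
    sigma = stdev(samples)
    if sigma == 0:
        return []  # all samples identical, nothing stands out
    return [i for i, x in enumerate(samples) if abs(x - mu) / sigma > threshold]


# Illustrative heart rate readings in beats per minute; the last one spikes.
heart_rate = [70, 72, 71, 69, 73, 70, 71, 120]
outlier_indices = find_outliers(heart_rate)  # [7]
```

A production system would more likely compare each new reading against a per-user baseline, since a single large spike also inflates the standard deviation it is measured against.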

[0512] (Application Example 1)

[0513] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0514] In care settings for the elderly and those requiring care, appropriate care and support tailored to individual needs are required. However, currently, the burden on care staff is significant, making it difficult to respond immediately to the emotional state of all users. This project aims to solve these problems, improve the quality of life for users, and reduce the burden on care staff.

[0515] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0516] In this invention, the server includes means for acquiring the user's voice and biometric data, means for analyzing the acquired data in real time and estimating the emotional state, means for generating appropriate advice for the user based on the emotional state, means for understanding the emotional state in a care setting and proposing individualized care and relaxation methods, means for notifying the user of the generated advice on the user's information processing device, and means for collecting user feedback and using it to improve information processing technology. This makes it possible to accurately monitor the emotional state of users in care settings and provide individually appropriate care.

[0517] "Audio data" refers to sound information collected using a receiving device, which is the content of what the user is saying.

[0518] "Biometric data" refers to information that indicates the user's physical condition, including physiological indicators such as heart rate and facial expressions.

[0519] "Emotional state" refers to a state that indicates the user's emotional condition, and includes psychological tendencies such as stress and relaxation.

[0520] "Advice" refers to specific suggestions or instructions provided based on the user's emotional state, with the aim of solving problems or improving their condition.

[0521] An "information processing device" is a device that processes digital data and provides information and instructions to users, and includes electronic devices such as smartphones and tablets.

[0522] A "server" is a central device that processes information and data on a network, and is a computer that is responsible for analyzing and processing data sent by users.

[0523] "Feedback" refers to a user's response to advice and suggestions, and is information used to improve the system.

[0524] "Information processing technology" refers to technologies used to efficiently analyze and process large amounts of data, and includes natural language processing and machine learning.

[0525] The system that realizes this invention consists of an information processing device and a server. The information processing device used by the user is equipped with sensors for acquiring voice and biometric data. This device immediately transmits the voice and biometric data collected using the sensors to the server.

[0526] The server converts the collected audio data into text using natural language processing algorithms. Specifically, it uses Amazon Transcribe from AWS to convert audio into text information, and then analyzes the content and tone of the audio using Amazon Comprehend. Simultaneously, biometric data is used to estimate emotional states via Amazon SageMaker. This allows for real-time analysis of the user's emotional state.

[0527] Based on the analysis results, the server uses AI technology to generate advice tailored to the user's emotional state. This generated advice is then customized to individual needs by Amazon Personalize and communicated to the information processing device via Amazon SNS. This notification is designed to be easily understood and delivered in a non-invasive manner.

[0528] As a concrete example, in a caregiving setting, if signs of anxiety are detected in the user, a voice prompt such as "Shall we try taking a deep breath?" is provided through the information processing device. Furthermore, an example of a prompt sentence to be input to the generative AI model would be, "Please suggest actions to take when the user's emotional state is identified as anxiety."

[0529] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0530] Step 1:

[0531] The device uses a microphone and biosensors to collect the user's voice and biometric data. This collected data serves as input. The voice data is digitized, and biometric data such as heart rate and facial expression parameters are obtained. This data is transmitted to the server in real time.

[0532] Step 2:

[0533] The server converts the received audio data into text using a natural language processing algorithm. Specifically, the server uses Amazon Transcribe from AWS to convert the audio data into text information. As a result, the content of the audio is output as string data.

[0534] Step 3:

[0535] The server analyzes the content and tone of the transcribed audio data. Amazon Comprehend is used to extract the user's emotions and intentions from the text data. This analysis generates patterns that indicate emotional tendencies.

[0536] Step 4:

[0537] Simultaneously, the server estimates the emotional state based on biometric data. Using Amazon SageMaker, it analyzes heart rate and facial expression data to evaluate the user's stress and relaxation levels. This process outputs an index indicating the emotional state.

[0538] Step 5:

[0539] The server utilizes an AI model based on the analysis results to generate optimal advice for the user. It suggests relaxation methods and care tailored to the user's emotional state, and the generated advice is output.

[0540] Step 6:

[0541] The generated advice is customized to the individual user's needs using Amazon Personalize. Advice is adaptively optimized by considering user history and feedback.

[0542] Step 7:

[0543] Advice is sent to the device via Amazon SNS, allowing the user to review the recommended actions. The final output is provided as text or audio instructions on the device.

[0544] Step 8:

[0545] The user responds to the advice they receive, and this feedback is sent back from the device to the server. The server uses this feedback to improve the accuracy of future advice generation.
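
Steps 3 and 7 above can be sketched with the AWS SDK for Python (boto3), whose Comprehend client provides detect_sentiment and whose SNS client provides publish. In this sketch the clients are passed in as arguments, so that stand-in objects can be used where real AWS credentials are unavailable. The function name, the topic ARN, the language code "en", and the lookup table that stands in for the generative AI and Amazon Personalize steps are all assumptions of this illustration. Transcription and the SageMaker analysis are omitted.

```python
def analyze_and_notify(text, comprehend, sns, topic_arn, advice_by_sentiment):
    """Classify transcribed speech and publish matching advice, if any."""
    # Amazon Comprehend returns "POSITIVE", "NEGATIVE", "NEUTRAL" or "MIXED".
    response = comprehend.detect_sentiment(Text=text, LanguageCode="en")
    sentiment = response["Sentiment"]
    # Hypothetical lookup standing in for advice generation and personalization.
    advice = advice_by_sentiment.get(sentiment)
    if advice is not None:
        sns.publish(TopicArn=topic_arn, Message=advice)
    return sentiment, advice


class FakeComprehend:
    """Stand-in for boto3.client("comprehend"); always reports NEGATIVE."""

    def detect_sentiment(self, Text, LanguageCode):
        return {"Sentiment": "NEGATIVE"}


class FakeSNS:
    """Stand-in for boto3.client("sns"); records messages instead of sending."""

    def __init__(self):
        self.sent = []

    def publish(self, TopicArn, Message):
        self.sent.append((TopicArn, Message))
        return {"MessageId": "fake"}


sns = FakeSNS()
sentiment, advice = analyze_and_notify(
    "I feel uneasy today.",
    FakeComprehend(),
    sns,
    "arn:aws:sns:region:account-id:hypothetical-topic",
    {"NEGATIVE": "Shall we try taking a deep breath?"},
)
```

Replacing the two stand-in objects with real boto3 clients would route the same calls to the AWS services named in the steps.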

[0546] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0547] This invention provides a system for precisely identifying a user's emotional state and providing appropriate support. By incorporating an emotion engine, it accurately recognizes emotions from the user's voice and biometric data, enabling a personalized approach.

[0548] Emotional data collection and the role of the emotion engine

[0549] The device utilizes microphones and various sensors to collect user voice and biometric data with high accuracy. The collected data is transmitted to a server in real time. The emotion engine on the server analyzes this data and uses a pre-trained emotion model to identify the user's emotional state. This model is trained to understand various emotion categories (e.g., joy, sadness, anger, surprise, etc.).

[0550] Enhanced emotion recognition and advice generation

[0551] The server uses generative AI technology to create personalized advice for the user based on the emotional state obtained by the emotion engine. For example, if the user is determined to be stressed, a specific action plan for relaxation will be presented. This plan is customized by taking into account feedback on the user's past activities and responses.

[0552] Feedback loops and system learning

[0553] The device continuously records the user's responses to advice. This feedback data is sent to a server and used in machine learning algorithms to help update the emotion engine model. Through this feedback loop, the system improves the accuracy of its emotion recognition, enabling the delivery of more personalized advice.

[0554] These configurations create an efficient and intelligent support system designed to assist users' psychological well-being and reduce daily stress.

[0555] The following describes the processing flow.

[0556] Step 1:

[0557] The device uses a microphone to acquire voice data while the user operates it, and sensors to collect biometric data such as heart rate and skin temperature. This data is acquired in real time and temporarily stored on the device.

[0558] Step 2:

[0559] The device transfers the collected voice and biometric data to the server via a secure communication protocol (e.g., HTTPS). This process is encrypted to ensure user privacy.

[0560] Step 3:

[0561] The server converts the received audio data into text using natural language processing (NLP) technology and meticulously analyzes the content of the speech and the tone of voice. This analysis extracts patterns that reflect the user's emotions.

[0562] Step 4:

[0563] The server utilizes an emotion engine to identify the user's emotional state based on analyzed voice and biometric data. The engine uses a pre-trained emotion model to determine which category the emotion belongs to (e.g., joy, sadness, anger).

[0564] Step 5:

[0565] The server uses generative AI to generate appropriate advice for the user based on the analysis results from the emotion engine. This advice is personalized by taking into account past user data and feedback information.

[0566] Step 6:

[0567] The server then sends the generated advice to the terminal.

[0568] Step 7:

[0569] The device notifies the user of any advice it receives. These notifications appear as on-screen pop-up messages or voice messages, carefully designed to avoid interrupting the user's current activities.

[0570] Step 8:

[0571] The device records how the user responds to the advice. This response data is sent to the server as feedback.

[0572] Step 9:

[0573] The server analyzes the collected feedback and applies it to machine learning algorithms to update the emotion engine model. This continuous learning process improves the system's emotion recognition accuracy and the personalization of its advice.
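
The feedback loop of Steps 8 and 9 can be illustrated with a much simpler mechanism than retraining an emotion model: keeping a running average of user ratings per piece of advice and preferring the best-rated one. This is a swapped-in technique for illustration only. The class name, the 0.0 to 1.0 rating scale, and the tie-breaking rule are assumptions.

```python
from collections import defaultdict


class AdviceSelector:
    """Prefer advice that users rated as helpful for a given emotion."""

    def __init__(self, candidates):
        self.candidates = candidates      # emotion -> list of advice strings
        self.totals = defaultdict(float)  # (emotion, advice) -> sum of ratings
        self.counts = defaultdict(int)    # (emotion, advice) -> number of ratings

    def record_feedback(self, emotion, advice, rating):
        """Store one rating, e.g. 1.0 for helpful and 0.0 for not helpful."""
        key = (emotion, advice)
        self.totals[key] += rating
        self.counts[key] += 1

    def average(self, emotion, advice):
        key = (emotion, advice)
        if self.counts[key] == 0:
            return 0.0  # unrated advice starts at the bottom
        return self.totals[key] / self.counts[key]

    def choose(self, emotion):
        # max() keeps the first candidate on ties, so list order is the default.
        return max(self.candidates[emotion],
                   key=lambda advice: self.average(emotion, advice))


selector = AdviceSelector(
    {"stress": ["Take three deep breaths", "Take a short walk"]})
first_choice = selector.choose("stress")
selector.record_feedback("stress", "Take three deep breaths", 0.0)
selector.record_feedback("stress", "Take a short walk", 1.0)
later_choice = selector.choose("stress")
```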

[0574] (Example 2)

[0575] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0576] While there is a need to accurately recognize users' emotional states and provide appropriate advice, conventional systems have insufficient accuracy in recognizing emotional states, resulting in low-quality personalized support. Furthermore, they lack effective means of utilizing user feedback to improve system accuracy. Therefore, there is a need for an effective system that supports users' psychological well-being and reduces daily stress.

[0577] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0578] In this invention, the server includes means for acquiring the user's voice and biometric data, means for analyzing the acquired data in real time and estimating the emotional state, and means for generating appropriate advice for the user based on the emotional state. This enables high-precision recognition of the user's emotional state and the provision of personalized advice tailored to individual circumstances.

[0579] A "user" is an individual who uses the system to have their emotional state recognized and to receive advice.

[0580] "Voice and biometric data" refers to voice recordings necessary to recognize the user's emotional state, as well as data showing physiological indicators such as heart rate, body temperature, and skin conductivity.

[0581] "Real-time analysis" refers to a process where data is processed immediately after acquisition, resulting in instant results.

[0582] "Emotional state" refers to the psychological and emotional state a user experiences at a given moment, and includes categories such as joy, sadness, anger, and surprise.

[0583] "Appropriate advice" refers to suggestions for specific actions to improve or support the user, tailored to their current emotional state.

[0584] "Feedback" refers to data that records users' reactions to advice provided by the system and the outcomes of that advice, and is used to improve the system's accuracy.

[0585] A "machine learning algorithm" is a computational method that automatically learns patterns and rules from data to improve the performance of a system.

[0586] A "generative AI model" is an artificial intelligence technology that generates information and actions for a specific purpose from a large amount of data.

[0587] A "prompt" is a text input given to a generative AI model to generate a specific output.

[0588] This system is designed to accurately identify the user's emotions and provide personalized advice. An embodiment of this system is shown below.

[0589] The device collects the user's voice and biometric data using a microphone and various sensors. Specifically, it records voice and monitors heart rate, body temperature, and skin electrical activity. This data is collected in real time and immediately transmitted to the server.

[0590] The server runs analysis software called an emotion engine. This emotion engine analyzes data sent by the user using an emotion model that has been previously trained using machine learning algorithms. This allows for the analysis of the linguistic and phonological features of the data, making it possible to estimate the user's emotional state with high accuracy. Emotional states are expressed in categories such as joy, sadness, anger, and surprise.
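
The idea of a pre-trained emotion model that maps features to categories can be illustrated with a toy nearest-centroid classifier. The specification does not state which learning algorithm is used, so nearest-centroid is a stand-in. The two normalized features (heart rate and voice pitch, each scaled to 0..1) and the training examples are invented purely for illustration.

```python
from math import dist
from statistics import mean


def train_centroids(examples):
    """Average the feature vectors of each emotion label into one centroid."""
    grouped = {}
    for features, label in examples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def estimate_emotion(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Toy training data: (normalized heart rate, normalized voice pitch), label.
examples = [
    ((0.6, 0.8), "joy"), ((0.5, 0.7), "joy"),
    ((0.3, 0.2), "sadness"), ((0.4, 0.3), "sadness"),
    ((0.9, 0.9), "anger"), ((0.8, 0.8), "anger"),
]
centroids = train_centroids(examples)
low_arousal = estimate_emotion(centroids, (0.3, 0.3))
high_arousal = estimate_emotion(centroids, (0.9, 0.8))
```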

[0591] Based on these analysis results, the server uses a generative AI model to generate personalized advice for the user. The generative AI model generates advice based on prompts to suggest the most suitable action plan for the user's emotional state. For example, by inputting the prompt "Suggest ways to relax when the user is feeling stressed," the server can generate specific advice such as "Take three deep breaths" or "Take a short walk."

[0592] The device notifies the user of advice transmitted from the server through a voice assistant or screen display. By following the advice provided, the user can reduce daily stress.

[0593] Furthermore, the device records the user's response to advice and sends that feedback to the server. Based on this feedback, the server continuously updates the emotion engine model. As a result, the system evolves with each use, providing more accurate and user-friendly support.

[0594] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0595] Step 1:

[0596] The user provides data through the device's voice input device and biosensors. From this input, the device records voice data and collects biometric data (heart rate, body temperature, skin conductivity, etc.). This input data is transmitted to the server in real time.

[0597] Step 2:

[0598] The server receives voice and biometric data transmitted from the terminal. This data is passed to the emotion engine as input, where its linguistic and phonological features are analyzed. This analysis generates an output that estimates the user's emotional state (e.g., joy, sadness).

[0599] Step 3:

[0600] The server inputs the estimated emotional state into a generative AI model and uses prompts to generate user-specific advice. Specifically, if the emotional state is "stressed," it will output advice such as "Take three deep breaths" based on the prompt "Please suggest ways to relax."

[0601] Step 4:

[0602] The device receives advice sent from the server. The device notifies the user of this advice using a voice assistant or display. The user takes action according to the provided advice.

[0603] Step 5:

[0604] The user acts based on the advice and then provides feedback to the device. The device collects this feedback and sends it back to the server.

[0605] Step 6:

[0606] The server uses feedback data sent from the terminal to update the emotion engine model through machine learning. This update enables the model to produce more accurate emotion recognition and personalized advice in subsequent uses.

[0607] (Application Example 2)

[0608] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0609] Maintaining the mental health of the elderly is a major challenge in modern society. In particular, feelings of isolation and increased stress are factors that reduce the quality of life for the elderly. The present invention aims to provide an efficient support system to alleviate the anxiety and stress that the elderly experience in their daily lives and to provide them with psychological peace of mind.

[0610] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0611] In this invention, the server includes a device for acquiring the user's voice and biometric data, a device for analyzing the acquired data in real time and estimating the emotional state, and a device for generating personalized support based on the emotional state. This makes it possible to identify the user's emotional state in detail and propose appropriate relaxation or social interaction to support the psychological health of the elderly.

[0612] "Users" refer to anyone who uses the system directly or indirectly. This usually includes elderly people.

[0613] "Voice and biometric data" refers to voice information obtained from the user, as well as physiological indicators such as heart rate and skin electrical activity.

[0614] "Device" refers to hardware and software components for acquiring, analyzing, and notifying voice and biometric data.

[0615] "Analyzing in real time and estimating emotional state" refers to the process of immediately processing collected voice and biometric data to evaluate the user's emotional state.

[0616] "Generating personalized support" means creating advice and activity suggestions tailored to the individual needs and history of the user, based on their emotional state.

[0617] "Psychological health" refers to a state in which the user's mental state is stable and they are free from stress and anxiety.

[0618] "Offering relaxation or social interaction" means presenting users with options for activities that calm the mind or promote communication with others.

[0619] The system used to implement this application analyzes the emotional state of elderly individuals using voice and biometric data and provides appropriate support. The server utilizes a smartphone's microphone to collect voice data and a heart rate sensor to acquire biometric data. Voice data is processed using the "LibROSA" library, and the "Emotion-recognition-using-speech" model is used for emotion identification. Biometric data is organized and analyzed using "Pandas" and "scikit-learn". Based on the identified emotional state, the server creates personalized support using generative AI technology, taking the user's past history into account. The user's smartphone is notified of this support, and specific behavioral guidance is provided.

[0620] The server collects user feedback and uses it to improve the system. Machine learning algorithms use this feedback to update the model more quickly, allowing for even more personalized assistance.

[0621] For example, if the system determines that a user is experiencing anxiety, the server will notify them with support such as, "Please do a 30-minute deep breathing exercise. We will also play your favorite music from a list." An example of a prompt message would be, "Please suggest activities and exercises to alleviate the anxiety the user is feeling. Based on their past activity history, please also suggest music the user might like." In this way, this system is a concrete and effective implementation for supporting the psychological health of the elderly.

[0622] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0623] Step 1:

[0624] The device uses the smartphone's microphone to collect voice data and records the user's voice in real time. It also acquires biometric data (e.g., heart rate) using a heart rate sensor. In this step, the user's voice and biometric data are obtained as input, which are then sent to a subsequent analysis process.

[0625] Step 2:

[0626] The server processes the collected audio data using "LibROSA" to extract audio features. Biometric data is formatted using "Pandas". The input is the audio and biometric data collected in step 1, and the output is feature data in an analyzable format.

[0627] Step 3:

[0628] The server inputs speech feature data into the "Emotion-recognition-using-speech" model to estimate the user's emotional state. The input is the feature data extracted in step 2, and the output is categorical information of the emotional state (e.g., joy, anxiety, etc.).

[0629] Step 4:

[0630] The server uses a generative AI model to create personalized support based on the estimated emotional state. Prompts are used to formulate specific advice for the generative AI to alleviate the user's emotional state. The input is the emotional state result from step 3 and historical information, and the output is the support content based on that emotion.

[0631] Step 5:

[0632] The server notifies the user's device of the generated support content. The input is the support content created in step 4, and the output is a suggestion of specific actions to be taken on the user's smartphone.

[0633] Step 6:

[0634] Users respond to the support provided. Feedback based on this is collected and sent back to the server. This feedback information is used for the continuous improvement of the system. The input is user feedback, and the output is the accumulation of feedback data.

[0635] This process allows the system to gain a detailed understanding of the user's emotional state and provide more appropriate and personalized support.
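
The feature extraction of Step 2 is attributed to "LibROSA" and "Pandas" in the text. To stay self-contained, the sketch below does not call those libraries. Instead it computes two basic audio features (root-mean-square energy and zero-crossing rate) and a mean heart rate using only the Python standard library. The function names, the feature selection, and the sample values are assumptions for illustration.

```python
import math


def rms_energy(samples):
    """Root-mean-square amplitude, a rough measure of loudness."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(samples) - 1)


def extract_features(audio_samples, heart_rates):
    """Combine audio and biometric data into one analyzable record."""
    return {
        "rms": rms_energy(audio_samples),
        "zcr": zero_crossing_rate(audio_samples),
        "mean_heart_rate": sum(heart_rates) / len(heart_rates),
    }


# Tiny illustrative inputs: an alternating waveform and two heart rate readings.
features = extract_features([1.0, -1.0, 1.0, -1.0], [70, 80])
```

A record like this could then be passed to an emotion model as in Step 3. Richer features, such as those a dedicated audio library provides, would normally be used in practice.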

[0636] The specific processing unit 290 transmits the result of the specific processing to the headset-type terminal 314. In the headset-type terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0637] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0638] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0639] [Fourth Embodiment]

[0640] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0641] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0642] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0643] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0644] The microphone 238 receives instructions from the user 20 by picking up the user's voice. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0645] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0646] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0647] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0648] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0649] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0650] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0651] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0652] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0653] This invention is a system that utilizes the user's voice and biometric data to monitor their emotional state and analyze it in real time, thereby providing the user with appropriate advice. An embodiment of this system is shown below.

[0654] Data collection and analysis

[0655] The device uses microphones and sensors to acquire voice and biometric data when the user uses the device. The acquired data is immediately transmitted to a server using a secure protocol. The server converts the received voice data into text using natural language processing algorithms and analyzes what the user is saying and the tone of their voice. At the same time, it analyzes heart rate and changes in facial expressions based on biometric data to estimate the user's emotional state.

[0656] Estimation of emotional state and generation of advice

[0657] The server identifies the user's emotional state based on the analysis results. For example, if the analysis determines that the user is likely experiencing stress, the server uses this information and generative AI technology to generate specific advice for relaxation. This advice is customized to the individual user's preferences and past feedback.

[0658] Advice notification and feedback collection

[0659] The generated advice is communicated to the user via their device. The notification method is tailored to the user's device type and is designed to non-intrusively support their daily activities, such as through pop-up screens or voice guidance. The user's response is sent back to the server as feedback from the device and used to improve future advice generation. This enhances the overall accuracy and performance of the system.

[0660] These features allow users to receive support tailored to their individual circumstances, enabling them to effectively reduce stress and manage their emotions.

[0661] The following describes the processing flow.

[0662] Step 1:

[0663] The device collects voice data using a microphone while the user is using the device, and acquires biometric data such as heart rate and facial expressions using sensors. This data is temporarily stored within the device.

[0664] Step 2:

[0665] The device transmits acquired voice and biometric data to the server in real time. Secure communication protocols such as HTTPS are used for data transfer to protect user privacy.

[0666] Step 3:

[0667] The server converts the received audio data into text using natural language processing (NLP) algorithms and analyzes the user's vocabulary and tone of voice. This generates foundational data for evaluating the user's psychological state.

[0668] Step 4:

[0669] The server simultaneously analyzes biometric data, estimating stress levels and emotional changes from the user's heart rate and facial expressions. The analysis results are integrated with other emotional indicators to identify the user's overall emotional state.

[0670] Step 5:

[0671] The server generates advice tailored to the user's state based on an analysis of their emotional state. Using generative AI technology, the advice is personalized, taking into account the user's past feedback and preferences.

[0672] Step 6:

[0673] The server sends the generated advice to the terminal.

[0674] Step 7:

[0675] The device notifies the user of advice received from the server. These notifications appear as pop-up messages or voice guidance on the device, taking care not to interrupt the user's daily activities.

[0676] Step 8:

[0677] The device records the user's response to advice and subsequent actions as feedback. This information is then sent back to the server and used to generate future advice.

[0678] This processing flow allows the system to monitor the user's emotional state and provide appropriate support.
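
The eight steps above can be sketched as a single server-side flow. The following Python sketch is illustrative only: the class and method names, the keyword list, the 100 bpm threshold, and the canned advice strings are all assumptions that do not appear in this disclosure, and the speech-to-text and generative AI stages are replaced by trivial stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """Data the terminal sends to the server (Steps 1-2)."""
    transcript: str        # stand-in for the text obtained from voice data in Step 3
    heart_rate_bpm: float  # one biometric reading


@dataclass
class Server:
    feedback_log: list = field(default_factory=list)

    def estimate_emotion(self, sample: Sample) -> str:
        """Steps 3-4: combine a text cue and a biometric cue (assumed rule)."""
        stress_words = {"tired", "worried", "stressed"}  # assumed keyword list
        words = {w.strip(".,!?").lower() for w in sample.transcript.split()}
        text_cue = bool(words & stress_words)
        bio_cue = sample.heart_rate_bpm > 100.0  # assumed threshold
        return "stressed" if (text_cue or bio_cue) else "calm"

    def generate_advice(self, emotion: str) -> str:
        """Step 5: placeholder for the generative AI call."""
        canned = {
            "stressed": "Try taking a few slow, deep breaths.",
            "calm": "You seem to be doing well. Keep it up.",
        }
        return canned[emotion]

    def record_feedback(self, advice: str, helpful: bool) -> None:
        """Step 8: accumulate feedback for later use."""
        self.feedback_log.append((advice, helpful))


server = Server()
sample = Sample(transcript="I am a bit worried about today.", heart_rate_bpm=88.0)
emotion = server.estimate_emotion(sample)     # Steps 3-4
advice = server.generate_advice(emotion)      # Step 5; Steps 6-7 would deliver it
server.record_feedback(advice, helpful=True)  # Step 8
```

In a real deployment each method would wrap the corresponding model or service; the sketch only shows how the inputs and outputs of the steps connect.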

[0679] (Example 1)

[0680] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0681] In modern society, understanding the stress and other emotional states experienced by individual users in real time and providing timely, appropriate advice is a major challenge. However, conventional technologies have not been able to fully utilize voice and biometric data, making it difficult to provide personalized support to users. Furthermore, existing systems have low accuracy in information processing, resulting in insufficient advice that accurately reflects the user's state.

[0682] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0683] In this invention, the server includes means for acquiring the user's voice and vital data, means for analyzing the acquired data in real time and estimating the emotional state, and means for generating appropriate advice for the user based on the emotional state. This makes it possible to accurately grasp the user's emotional state and quickly provide personalized advice according to that state.

[0684] "Audio data" refers to a digital recording of a user's voice.

[0685] "Vital data" refers to biometric information such as the user's heart rate and skin electrical activity.

[0686] "Real-time" refers to a process where the time between data acquisition and analysis is extremely short, instantly reflecting the user's current state.

[0687] "Emotional state" refers to information that indicates the psychological state or mood that the user is experiencing.

[0688] "Advice" refers to specific suggestions and instructions provided to users to reduce stress and achieve emotional stability.

[0689] A "speech recognition algorithm" is a technology that analyzes speech data and converts it into text.

[0690] "Natural language processing" refers to technologies that enable machines to understand, analyze, and generate human speech and text.

[0691] "Transcription" refers to the process of converting audio information into written text.

[0692] "Feedback" refers to the reactions or opinions that users give to advice they receive.

[0693] "Machine learning techniques" are algorithms that allow computers to learn from large amounts of data and derive more accurate results.

[0694] This invention is a system that provides personalized advice to users by monitoring their emotional state using their voice and biometric data and analyzing it in real time.

[0695] Specifically, the device is equipped with a microphone to capture voice input and sensors to detect heart rate, skin electrical activity, and other parameters. The device continues to collect this data while the user performs their normal daily activities. The voice data is not processed on-site but is immediately sent to the server using a secure protocol (e.g., TLS/SSL).
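
As a rough illustration of the secure transfer described above, the following Python sketch packages one reading as JSON and prepares an HTTPS POST request using only the standard library. The function name, the payload field names, and the URL are hypothetical and are not taken from this disclosure; the sketch builds the request but does not send it.

```python
import json
import urllib.request


def build_upload_request(url: str, heart_rate_bpm: float,
                         skin_activity: float, audio_b64: str) -> urllib.request.Request:
    """Prepare a POST request carrying voice and vital data (field names assumed)."""
    if not url.lower().startswith("https://"):
        # Refuse to send voice or vital data over an unencrypted channel.
        raise ValueError("an https:// URL is required")
    payload = {
        "heart_rate_bpm": heart_rate_bpm,
        "skin_activity": skin_activity,
        "audio_base64": audio_b64,  # raw audio, base64-encoded by the caller
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


# Hypothetical endpoint; the ".invalid" domain never resolves.
request = build_upload_request("https://server.invalid/upload", 72.0, 1.5, "AAAA")
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual transfer, with TLS handled by the HTTPS layer.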

[0696] The server first converts the audio data into text using a speech recognition algorithm (e.g., a commercial speech recognition API). This process includes noise reduction and volume adjustment, and the converted text is then analyzed using natural language processing (NLP) techniques. This makes it possible to understand the content of the user's speech and the nuances of their emotions.

[0697] Simultaneously, the server analyzes biometric data such as heart rate and facial expression changes to estimate the user's current emotional state. For example, an increase in heart rate or changes in facial expression can indicate that the user is experiencing stress.
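
One simple way to turn "an increase in heart rate" into a number is to compare the current reading with the user's own baseline. The Python sketch below is an assumption-laden illustration: the function names and the 20% threshold are hypothetical, and facial expression analysis is not covered.

```python
from statistics import mean


def stress_score(baseline_bpm: list[float], current_bpm: float) -> float:
    """Relative increase of the current heart rate over the user's baseline mean."""
    base = mean(baseline_bpm)
    return max(0.0, (current_bpm - base) / base)


def is_possibly_stressed(baseline_bpm: list[float], current_bpm: float,
                         threshold: float = 0.2) -> bool:
    """Flag possible stress when the increase reaches the assumed threshold."""
    return stress_score(baseline_bpm, current_bpm) >= threshold
```

For example, with a baseline of 60 bpm, a reading of 75 bpm is a 25% increase and would be flagged, while 66 bpm would not.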

[0698] Based on the analyzed data, the server understands the user's emotional state and uses a generative AI model (e.g., a general natural language processing model) to generate advice appropriate to the user's situation. This advice is generated based on the generated prompt text. An example of a prompt text would be: "The user's voice tone is calm, and their heart rate is within the normal range. The user is in a meeting. Please suggest techniques to help them stay relaxed."
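
The prompt text in the example above can be assembled from the analysis results with a plain template. The following Python sketch assumes a hypothetical `build_prompt` function and parameter names; only the wording of the resulting prompt comes from the example above.

```python
def build_prompt(voice_tone: str, heart_rate_status: str,
                 situation: str, request: str) -> str:
    """Fill a fixed sentence template with the analysis results."""
    return (
        f"The user's voice tone is {voice_tone}, "
        f"and their heart rate is {heart_rate_status}. "
        f"The user is {situation}. {request}"
    )


prompt = build_prompt(
    voice_tone="calm",
    heart_rate_status="within the normal range",
    situation="in a meeting",
    request="Please suggest techniques to help them stay relaxed.",
)
```

The resulting string would then be passed to whichever generative AI model the server uses.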

[0699] The generated advice is notified to the user via their device. On smartphones, the advice is displayed as a pop-up notification. On smart speakers, it is provided via voice. Users can follow this advice to manage stress and maintain focus.
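
The device-dependent delivery described above amounts to a lookup from device type to output channel. In the Python sketch below, the device type keys, the channel names, and the pop-up fallback for unknown devices are assumptions made for illustration.

```python
# Assumed mapping from device type to delivery channel.
CHANNEL_BY_DEVICE = {
    "smartphone": "popup",
    "smart_speaker": "voice",
}


def format_notification(device_type: str, advice: str) -> dict:
    """Choose the delivery channel for the advice based on the device type."""
    channel = CHANNEL_BY_DEVICE.get(device_type, "popup")  # assumed fallback
    return {"channel": channel, "body": advice}
```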

[0700] User feedback is sent back to the server via the device and used to improve the entire system based on the accumulated data. Machine learning techniques that take feedback into account enable continuous improvement in the quality of advice.

[0701] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0702] Step 1:

[0703] The device acquires the user's voice and vital data. In this process, the device's microphone is used to record the user's conversation and vocalizations in real time, while sensors are used to simultaneously record vital data such as heart rate and skin electrical activity. At this point, the inputs are voice and vital data, and the output is a digital dataset of these.

[0704] Step 2:

[0705] The terminal transmits collected voice and vital data to the server using a secure protocol. The data is encrypted before transmission and processed to prevent information leakage. The input is the voice and vital data acquired by the terminal, and the output is the dataset received by the server.

[0706] Step 3:

[0707] The server converts the received audio data into text using a speech recognition algorithm. The input is audio data, and the output is text data. This process includes audio preprocessing to remove audio noise and ensure accurate text conversion.

[0708] Step 4:

[0709] The server analyzes the transcribed speech using natural language processing techniques to understand the user's language patterns and emotional nuances. The input consists of text and intonation information generated by speech recognition, while the output is an analysis of the user's emotional state. The analysis process also includes an evaluation of emotional tone using a language model.

[0710] Step 5:

[0711] The server analyzes biometric data and supplements it with data on the user's emotional state. Inputs are vital data such as heart rate and skin electrical activity, and outputs are emotional assessments such as psychological stress and relaxation levels. The analysis uses pattern recognition algorithms to detect outliers and abnormal patterns.

[0712] Step 6:

[0713] The server uses a generative AI model to generate advice based on the analysis results. The input is the analyzed emotional state, and the prompt includes the instruction, "The user's heart rate is elevated, suggesting they may be stressed. Please generate advice to help them feel more at ease." The output is specific advice provided to the user.

[0714] Step 7:

[0715] The terminal notifies the user of the generated advice. The input is the advice generated by the server, and the output is the notification information delivered to the user (e.g., voice guidance or pop-up message). Specifically, an appropriate display method is used depending on the type of device.

[0716] Step 8:

[0717] Users act based on the advice provided and provide feedback as needed. This feedback is sent from the terminal to the server and used for future system improvements. The input is user feedback information, and the output is improvement data used to generate future advice.
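
Step 5 mentions detecting outliers in the vital data without naming a method. A z-score test is one common choice and is used here purely as an assumed stand-in; the function name and the threshold of 2.0 are hypothetical.

```python
from statistics import mean, pstdev


def find_outliers(readings: list[float], z_threshold: float = 2.0) -> list[int]:
    """Return the indices of readings whose z-score exceeds the threshold."""
    mu = mean(readings)
    sigma = pstdev(readings)  # population standard deviation
    if sigma == 0:
        return []  # all readings identical, so nothing stands out
    return [i for i, x in enumerate(readings) if abs(x - mu) / sigma > z_threshold]


heart_rates = [70, 71, 69, 70, 72, 70, 71, 69, 70, 120]
outlier_indices = find_outliers(heart_rates)
```

Here only the final reading of 120 bpm lies more than two standard deviations from the mean, so it is the only index reported.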

[0718] (Application Example 1)

[0719] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0720] In care settings for the elderly and those requiring care, appropriate care and support tailored to individual needs are required. However, currently, the burden on care staff is significant, making it difficult to respond immediately to the emotional state of all users. This invention aims to solve these problems, improve the quality of life for users, and reduce the burden on care staff.

[0721] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0722] In this invention, the server includes means for acquiring the user's voice and biometric data, means for analyzing the acquired data in real time and estimating the emotional state, means for generating appropriate advice for the user based on the emotional state, means for understanding the emotional state in a care setting and proposing individualized care and relaxation methods, means for notifying the user of the generated advice on the user's information processing device, and means for collecting user feedback and using it to improve information processing technology. This makes it possible to accurately monitor the emotional state of users in care settings and provide individually appropriate care.

[0723] "Audio data" refers to sound information collected using a receiving device, which is the content of what the user is saying.

[0724] "Biometric data" refers to information that indicates the user's physical condition, including physiological indicators such as heart rate and facial expressions.

[0725] "Emotional state" refers to a state that indicates the user's emotional condition, and includes psychological tendencies such as stress and relaxation.

[0726] "Advice" refers to specific suggestions or instructions provided based on the user's emotional state, with the aim of solving problems or improving their condition.

[0727] An "information processing device" is a device that processes digital data and provides information and instructions to users, and includes electronic devices such as smartphones and tablets.

[0728] A "server" is a central device that processes information and data on a network, and is a computer that is responsible for analyzing and processing data sent by users.

[0729] "Feedback" refers to a user's response to advice and suggestions, and is information used to improve the system.

[0730] "Information processing technology" refers to technologies used to efficiently analyze and process large amounts of data, and includes natural language processing and machine learning.

[0731] The system that realizes this invention consists of an information processing device and a server. The information processing device used by the user is equipped with sensors for acquiring voice and biometric data. This device immediately transmits the voice and biometric data collected using the sensors to the server.

[0732] The server converts the collected audio data into text using natural language processing algorithms. Specifically, it uses Amazon Transcribe from AWS to convert audio into text information, and then analyzes the content and tone of the audio using Amazon Comprehend. Simultaneously, biometric data is used to estimate emotional states via Amazon SageMaker. This allows for real-time analysis of the user's emotional state.

[0733] Based on the analysis results, the server uses AI technology to generate advice tailored to the user's emotional state. This generated advice is then customized to individual needs by Amazon Personalize and communicated to the information processing device via Amazon SNS. This notification is designed to be easily understood and delivered in a non-invasive manner.

[0734] As a concrete example, in a caregiving setting, if signs of anxiety are detected in the user, a voice message such as "Shall we try taking a deep breath?" is provided through the information processing device. Furthermore, an example of a prompt sentence to be input to the generative AI model would be, "Please suggest actions to take when the user's emotional state is identified as anxiety."

[0735] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0736] Step 1:

[0737] The device uses a microphone and biosensors to collect the user's voice and biometric data. This collected data serves as input. The voice data is digitized, and biometric data such as heart rate and facial expression parameters are obtained. This data is transmitted to the server in real time.

[0738] Step 2:

[0739] The server converts the received audio data into text using a natural language processing algorithm. Specifically, the server uses Amazon Transcribe from AWS to convert the audio data into text information. As a result, the content of the audio is output as string data.

[0740] Step 3:

[0741] The server analyzes the content and tone of the transcribed audio data. Amazon Comprehend is used to extract the user's emotions and intentions from the text data. This analysis generates patterns that indicate emotional tendencies.

[0742] Step 4:

[0743] Simultaneously, the server estimates the emotional state based on biometric data. Using Amazon SageMaker, it analyzes heart rate and facial expression data to evaluate the user's stress and relaxation levels. This process outputs an index indicating the emotional state.

[0744] Step 5:

[0745] The server utilizes an AI model based on the analysis results to generate optimal advice for the user. It suggests relaxation methods and care tailored to the user's emotional state, and the generated advice is output.

[0746] Step 6:

[0747] The generated advice is customized to the individual user's needs using Amazon Personalize. Advice is adaptively optimized by considering user history and feedback.

[0748] Step 7:

[0749] Advice is sent to the device via Amazon SNS, allowing the user to review the recommended actions. The final output is provided as text or audio instructions on the device.

[0750] Step 8:

[0751] The user responds to the advice they receive, and this feedback is sent back from the device to the server. The server uses this feedback to improve the accuracy of future advice generation.
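
The text analysis in Step 3 and the delivery in Step 7 can be sketched against the AWS SDK for Python (boto3). The two helper functions below use the `detect_sentiment` operation of Amazon Comprehend and the `publish` operation of Amazon SNS; the helper names, the fake client classes, the rule that triggers advice, and the topic ARN are assumptions for illustration. The fakes let the sketch run without AWS credentials, and Comprehend analyzes only the transcribed text, not the audio itself.

```python
def detect_text_sentiment(comprehend_client, text: str, language_code: str = "en") -> str:
    """Step 3 (text part): return the sentiment label for the transcript."""
    response = comprehend_client.detect_sentiment(Text=text, LanguageCode=language_code)
    return response["Sentiment"]  # "POSITIVE", "NEGATIVE", "NEUTRAL", or "MIXED"


def publish_advice(sns_client, topic_arn: str, advice: str) -> str:
    """Step 7: publish the advice text to an SNS topic and return the message ID."""
    response = sns_client.publish(TopicArn=topic_arn, Message=advice)
    return response["MessageId"]


class FakeComprehend:
    """Offline stand-in for the Comprehend client (assumed behavior)."""
    def detect_sentiment(self, Text, LanguageCode):
        label = "NEGATIVE" if "anxious" in Text.lower() else "NEUTRAL"
        return {"Sentiment": label}


class FakeSNS:
    """Offline stand-in for the SNS client; records what would be sent."""
    def __init__(self):
        self.sent = []

    def publish(self, TopicArn, Message):
        self.sent.append((TopicArn, Message))
        return {"MessageId": f"fake-{len(self.sent)}"}


comprehend, sns = FakeComprehend(), FakeSNS()
sentiment = detect_text_sentiment(comprehend, "I feel anxious tonight")
message_id = None
if sentiment == "NEGATIVE":  # assumed trigger rule
    message_id = publish_advice(
        sns,
        "arn:aws:sns:region:account-id:advice-topic",  # placeholder ARN
        "Shall we try taking a deep breath?",
    )
```

With real credentials, the fakes would be replaced by `boto3.client("comprehend")` and `boto3.client("sns")`, and the helper functions would remain unchanged.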

[0752] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0753] This invention provides a system for precisely identifying a user's emotional state and providing appropriate support. By incorporating an emotion engine, it accurately recognizes emotions from the user's voice and biometric data, enabling a personalized approach.

[0754] Collection of emotional data and the role of the emotion engine

[0755] The device utilizes microphones and various sensors to collect user voice and biometric data with high accuracy. The collected data is transmitted to a server in real time. The emotion engine on the server analyzes this data and uses a pre-trained emotion model to identify the user's emotional state. This model is trained to understand various emotion categories (e.g., joy, sadness, anger, surprise, etc.).

[0756] Enhanced emotion recognition and advice generation

[0757] The server uses generative AI technology to create personalized advice for the user based on the emotional state obtained by the emotion engine. For example, if the user is determined to be stressed, a specific action plan for relaxation will be presented. This plan is customized by taking into account feedback on the user's past activities and responses.

[0758] Feedback loops and system learning

[0759] The device continuously records the user's responses to advice. This feedback data is sent to a server and used in machine learning algorithms to help update the emotion engine model. Through this feedback loop, the system improves the accuracy of its emotion recognition, enabling the delivery of more personalized advice.

[0760] These configurations create an efficient and intelligent support system designed to assist users' psychological well-being and reduce daily stress.

[0761] The following describes the processing flow.

[0762] Step 1:

[0763] The device uses a microphone to acquire voice data while the user operates it, and sensors to collect biometric data such as heart rate and skin temperature. This data is acquired in real time and temporarily stored on the device.

[0764] Step 2:

[0765] The device transfers the collected voice and biometric data to the server via a secure communication protocol (e.g., HTTPS). This process is encrypted to ensure user privacy.

[0766] Step 3:

[0767] The server converts the received audio data into text using natural language processing (NLP) technology and meticulously analyzes the content of the speech and the tone of voice. This analysis extracts patterns that reflect the user's emotions.

[0768] Step 4:

[0769] The server utilizes an emotion engine to identify the user's emotional state based on analyzed voice and biometric data. The engine uses a pre-trained emotion model to determine which category the emotion belongs to (e.g., joy, sadness, anger).

[0770] Step 5:

[0771] The server uses generative AI to generate appropriate advice for the user based on the analysis results from the emotion engine. This advice is personalized by taking into account past user data and feedback information.

[0772] Step 6:

[0773] The server then sends the generated advice to the terminal.

[0774] Step 7:

[0775] The device notifies the user of any advice it receives. These notifications appear as on-screen pop-up messages or voice messages, carefully designed to avoid interrupting the user's current activities.

[0776] Step 8:

[0777] The device records how the user responds to the advice. This response data is sent to the server as feedback.

[0778] Step 9:

[0779] The server analyzes the collected feedback and applies it to machine learning algorithms to update the emotion engine model. This continuous learning process improves the system's emotion recognition accuracy and the personalization of its advice.
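
Step 9 leaves the learning method open. As one possible illustration of how accumulated feedback could steer the personalization of advice, the Python sketch below tracks how often each piece of advice was rated helpful and prefers the best-rated candidate. The class name, the smoothing formula, and the advice strings are assumptions, and the sketch does not update the emotion model itself.

```python
from collections import defaultdict


class AdviceStats:
    """Track helpfulness feedback per advice text (hypothetical helper)."""

    def __init__(self):
        self._shown = defaultdict(int)
        self._helpful = defaultdict(int)

    def record(self, advice: str, helpful: bool) -> None:
        self._shown[advice] += 1
        if helpful:
            self._helpful[advice] += 1

    def helpful_rate(self, advice: str) -> float:
        # Add-one smoothing so unseen advice starts at 0.5 rather than 0 or 1.
        return (self._helpful[advice] + 1) / (self._shown[advice] + 2)

    def best(self, candidates: list[str]) -> str:
        return max(candidates, key=self.helpful_rate)


stats = AdviceStats()
stats.record("Take three deep breaths", True)
stats.record("Take three deep breaths", True)
stats.record("Take a short walk", False)
preferred = stats.best(["Take three deep breaths", "Take a short walk"])
```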

[0780] (Example 2)

[0781] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0782] While there is a need to accurately recognize users' emotional states and provide appropriate advice, conventional systems have insufficient accuracy in recognizing emotional states, resulting in low-quality personalized support. Furthermore, they lack effective means of utilizing user feedback to improve system accuracy. Therefore, there is a need for an effective system that supports users' psychological well-being and reduces daily stress.

[0783] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0784] In this invention, the server includes means for acquiring the user's voice and biometric data, means for analyzing the acquired data in real time and estimating the emotional state, and means for generating appropriate advice for the user based on the emotional state. This enables high-precision recognition of the user's emotional state and the provision of personalized advice tailored to individual circumstances.

[0785] A "user" is an individual who uses the system to gain recognition of their own emotional state and receive advice.

[0786] "Voice and biometric data" refers to voice recordings necessary to recognize the user's emotional state, as well as data showing physiological indicators such as heart rate, body temperature, and skin conductivity.

[0787] "Real-time analysis" refers to a process where data is processed immediately after acquisition, resulting in instant results.

[0788] "Emotional state" refers to the psychological and emotional state a user experiences at a given moment, and includes categories such as joy, sadness, anger, and surprise.

[0789] "Appropriate advice" refers to suggestions for specific actions to improve or support the user, tailored to their current emotional state.

[0790] "Feedback" refers to data that records users' reactions and results to advice provided by the system, and is used to improve the system's accuracy.

[0791] A "machine learning algorithm" is a computational method that automatically learns patterns and rules from data to improve the performance of a system.

[0792] A "generative AI model" is an artificial intelligence technology that generates information and actions for a specific purpose from a large amount of data.

[0793] A "prompt" is a text input given to a generative AI model to generate a specific output.

[0794] This system is designed to accurately identify the user's emotions and provide personalized advice. An embodiment of this system is shown below.

[0795] The device collects the user's voice and biometric data using a microphone and various sensors. Specifically, it records voice and monitors heart rate, body temperature, and skin electrical activity. This data is collected in real time and immediately transmitted to the server.

[0796] The server runs analysis software called an emotion engine. This emotion engine analyzes data sent by the user using an emotion model that has been previously trained using machine learning algorithms. This allows for the analysis of the linguistic and phonological features of the data, making it possible to estimate the user's emotional state with high accuracy. Emotional states are expressed in categories such as joy, sadness, anger, and surprise.
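
The emotion model itself is not specified here. As a minimal sketch of assigning one of the listed categories, the Python code below uses a nearest-centroid rule over two hypothetical features (heart rate and speech rate). The centroid values and the feature choice are invented for illustration, and a real model would use scaled features and learned parameters.

```python
import math

# Hypothetical centroids: (heart rate in bpm, speech rate in words per minute).
CENTROIDS = {
    "joy": (85.0, 170.0),
    "sadness": (65.0, 100.0),
    "anger": (105.0, 190.0),
    "surprise": (95.0, 150.0),
}


def classify_emotion(features: tuple, centroids: dict = CENTROIDS) -> str:
    """Return the category whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))
```

For example, a reading of 66 bpm at 105 words per minute falls closest to the "sadness" centroid under these assumed values.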

[0797] Based on these analysis results, the server uses a generative AI model to generate personalized advice for the user. The generative AI model generates advice based on prompts to suggest the most suitable action plan for the user's emotional state. For example, by inputting the prompt "Suggest ways to relax when the user is feeling stressed," the server can generate specific advice such as "Take three deep breaths" or "Take a short walk."

[0798] The device notifies the user of advice transmitted from the server through a voice assistant or screen display. By following the advice provided, the user can reduce daily stress.

[0799] Furthermore, the device records the user's response to advice and sends that feedback to the server. Based on this feedback, the server continuously updates the emotion engine model. As a result, the system evolves with each use, providing more accurate and user-friendly support.

[0800] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0801] Step 1:

[0802] The user provides data through the device's voice input device and biosensors. The device records the voice data and collects biometric data (heart rate, body temperature, skin conductivity, etc.). This input data is transmitted to the server in real time.

[0803] Step 2:

[0804] The server receives voice and biometric data transmitted from the terminal. This data is passed to the emotion engine as input, where its linguistic and phonological features are analyzed. This analysis generates an output that estimates the user's emotional state (e.g., joy, sadness).

[0805] Step 3:

[0806] The server inputs the estimated emotional state into a generative AI model and uses prompts to generate user-specific advice. Specifically, if the emotional state is "stressed," it will output advice such as "Take three deep breaths" based on the prompt "Please suggest ways to relax."

[0807] Step 4:

[0808] The device receives advice sent from the server. The device notifies the user of this advice using a voice assistant or display. The user takes action according to the provided advice.

[0809] Step 5:

[0810] The user acts based on the advice and then provides feedback to the device. The device collects this feedback and sends it back to the server.

[0811] Step 6:

[0812] The server uses feedback data sent from the terminal to update the emotion engine model through machine learning. This update enables the model to produce more accurate emotion recognition and personalized advice in subsequent uses.
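One way the update in Step 6 could work is an incremental gradient step on a small classifier. The sketch below is an assumption, not the method of this disclosure: it supposes that the feedback yields a user-confirmed emotion label for a given feature vector, and it uses a plain softmax regression model with hypothetical names (`predict`, `feedback_update`).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, features):
    """weights maps each emotion label to a weight vector; returns label -> probability."""
    labels = list(weights)
    scores = [sum(w * x for w, x in zip(weights[label], features)) for label in labels]
    return dict(zip(labels, softmax(scores)))


def feedback_update(weights, features, confirmed_label, learning_rate=0.1):
    """One stochastic gradient step on cross-entropy loss using a confirmed label.

    Assumption: the feedback tells us which emotion the user actually felt.
    """
    probs = predict(weights, features)
    for label, w in weights.items():
        target = 1.0 if label == confirmed_label else 0.0
        error = probs[label] - target  # gradient of the loss w.r.t. this label's score
        for i, x in enumerate(features):
            w[i] -= learning_rate * error * x
    return weights
```

Each call nudges the model toward the confirmed label for that input, so repeated feedback gradually shifts future estimates, which matches the continuous improvement described in Step 6.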

[0813] (Application Example 2)

[0814] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0815] Maintaining the mental health of the elderly is a major challenge in modern society. In particular, feelings of isolation and increased stress are factors that reduce the quality of life for the elderly. The present invention aims to provide an efficient support system to alleviate the anxiety and stress that the elderly experience in their daily lives and to provide them with psychological peace of mind.

[0816] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0817] In this invention, the server includes a device for acquiring the user's voice and biometric data, a device for analyzing the acquired data in real time and estimating the emotional state, and a device for generating personalized support based on the emotional state. This makes it possible to identify the user's emotional state in detail and propose appropriate relaxation or social interaction to support the psychological health of the elderly.

[0818] "Users" refer to anyone who uses the system directly or indirectly. This usually includes elderly people.

[0819] "Voice and biometric data" refers to voice information obtained from the user, as well as physiological indicators such as heart rate and skin electrical activity.

[0820] "Device" refers to hardware and software components for acquiring, analyzing, and notifying voice and biometric data.

[0821] "Analyzing in real time and estimating emotional state" refers to the process of immediately processing collected voice and biometric data to evaluate the user's emotional state.

[0822] "Generating personalized support" means creating advice and activity suggestions tailored to the individual needs and history of the user, based on their emotional state.

[0823] "Psychological health" refers to a state in which the user's mental state is stable and they are free from stress and anxiety.

[0824] "Offering relaxation or social interaction" means presenting users with options for activities that calm the mind or promote communication with others.

[0825] The system used to implement this application analyzes the emotional state of elderly individuals using voice and biometric data and provides appropriate support. The server utilizes a smartphone's microphone to collect voice data and a heart rate sensor to acquire biometric data. Voice data is processed using the "LibROSA" library, and the "Emotion-recognition-using-speech" model is used for emotion identification. Biometric data is organized and analyzed using "Pandas" and "scikit-learn". Based on the identified emotional state and the user's past history, the server creates personalized support using generative AI technology. This support is sent to the user's smartphone as a notification that includes specific behavioral guidance.

[0826] The server collects user feedback and uses it to improve the system. Machine learning algorithms use this feedback to update the model promptly, allowing for even more personalized assistance.

[0827] For example, if the system analyzes that a user is experiencing anxiety, the server will notify them with support such as, "Please do a 30-minute deep breathing exercise. We will also play your favorite music from a list." An example of a prompt message would be, "Please suggest activities and exercises to alleviate the anxiety the user is feeling. Based on their past activity history, please also suggest music the user might like." In this way, this system is a concrete and effective implementation for supporting the psychological health of the elderly.

[0828] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0829] Step 1:

[0830] The device uses the smartphone's microphone to collect voice data and records the user's voice in real time. It also acquires biometric data (e.g., heart rate) using a heart rate sensor. In this step, the user's voice and biometric data are obtained as input, which are then sent to a subsequent analysis process.

[0831] Step 2:

[0832] The server processes the collected audio data using "LibROSA" to extract audio features. Biometric data is formatted using "Pandas". The input is the audio and biometric data collected in step 1, and the output is feature data in an analyzable format.
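The kind of frame-level feature extraction performed in Step 2 can be illustrated without external dependencies. The sketch below is a stand-in, not the LibROSA-based processing itself: it computes two common audio features, root-mean-square energy and zero-crossing rate, per frame. The frame sizes, the function names, and the synthetic test tone are assumptions.

```python
import math


def frame_features(samples, frame_length=400, hop_length=160):
    """Return one {"rms", "zcr"} row per frame of the audio signal."""
    rows = []
    for start in range(0, len(samples) - frame_length + 1, hop_length):
        frame = samples[start:start + frame_length]
        # Root-mean-square energy: a rough measure of loudness.
        rms = math.sqrt(sum(x * x for x in frame) / frame_length)
        # Zero-crossing rate: fraction of adjacent sample pairs that change sign.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        rows.append({"rms": rms, "zcr": crossings / (frame_length - 1)})
    return rows


def sine_wave(freq_hz, duration_s, sample_rate=16000, amplitude=0.5):
    """Synthetic tone standing in for recorded speech."""
    n = int(duration_s * sample_rate)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n)
    ]
```

The list of per-frame dictionaries is already in a row-oriented shape, so it could be passed to `pandas.DataFrame(rows)` to obtain the analyzable format mentioned in Step 2. In a LibROSA-based version, functions such as `librosa.feature.rms` and `librosa.feature.zero_crossing_rate` would presumably take the place of the hand-written loop.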

[0833] Step 3:

[0834] The server inputs speech feature data into the "Emotion-recognition-using-speech" model to estimate the user's emotional state. The input is the feature data extracted in step 2, and the output is categorical information of the emotional state (e.g., joy, anxiety, etc.).

[0835] Step 4:

[0836] The server uses a generative AI model to create personalized support based on the estimated emotional state. Prompts are used to formulate specific advice for the generative AI to alleviate the user's emotional state. The input is the emotional state result from step 3 and historical information, and the output is the support content based on that emotion.
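The prompt construction in Step 4 might be sketched as follows. The wording of the prompt follows the example prompt given for anxiety in this application example. The function name, the way history is summarized by frequency, and the `max_items` parameter are assumptions, and the call to the generative AI model itself is omitted because the text does not specify an interface for it.

```python
from collections import Counter


def build_support_prompt(emotion, activity_history, max_items=3):
    """Compose a prompt from the estimated emotion and the user's most frequent activities."""
    top = [name for name, _ in Counter(activity_history).most_common(max_items)]
    prompt = (
        f"Please suggest activities and exercises to alleviate the {emotion} "
        "the user is feeling."
    )
    if top:
        # History is optional: without it, only the emotion-based request is sent.
        prompt += (
            " Based on their past activity history, please also suggest music "
            "the user might like. Most frequent past activities: "
            + ", ".join(top)
            + "."
        )
    return prompt
```

For example, `build_support_prompt("anxiety", ["jazz playlist", "walk", "jazz playlist"])` produces a prompt that names the emotion and lists "jazz playlist" before "walk", so the model receives both inputs described in Step 4.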

[0837] Step 5:

[0838] The server notifies the user's device of the generated support content. The input is the support content created in step 4, and the output is a suggestion of specific actions to be taken on the user's smartphone.

[0839] Step 6:

[0840] Users respond to the support provided. Feedback based on this is collected and sent back to the server. This feedback information is used for the continuous improvement of the system. The input is user feedback, and the output is the accumulation of feedback data.
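The accumulation of feedback in Step 6 could be sketched as below. The record format is an assumption: the text does not say what form feedback takes, so this supposes a 1 to 5 helpfulness rating per piece of support content, and the `FeedbackLog` class is hypothetical.

```python
from collections import defaultdict


class FeedbackLog:
    """Accumulates user responses to support content (hypothetical record format)."""

    def __init__(self):
        # Maps (emotion, support) to the list of ratings received so far.
        self._ratings = defaultdict(list)

    def record(self, emotion, support, rating):
        """Store one response; rating is assumed to run from 1 (unhelpful) to 5 (helpful)."""
        if not 1 <= rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        self._ratings[(emotion, support)].append(rating)

    def best_support(self, emotion):
        """Return the support with the highest mean rating for this emotion, or None."""
        candidates = {
            support: sum(ratings) / len(ratings)
            for (e, support), ratings in self._ratings.items()
            if e == emotion
        }
        if not candidates:
            return None
        return max(candidates, key=candidates.get)
```

Keeping ratings keyed by emotion and support content makes it straightforward to see which suggestions have worked for a given state, which is one possible use of the accumulated feedback data.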

[0841] This process allows the system to gain a detailed understanding of the user's emotional state and provide more appropriate and personalized support.

[0842] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0843] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0844] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0845] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0846] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
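The layout of the emotion map described above can be modeled with polar coordinates. The sketch below is illustrative only: the placements in `EMOTION_MAP`, the `inner_radius` threshold, and the function name are hypothetical and are not taken from Figure 9. Only the meaning of left, right, top, bottom, inner, and outer follows the description.

```python
import math

# Hypothetical placements as (angle in degrees, radius).
# Angle convention: 0 = right (3 o'clock), 90 = top, 180 = left, 270 = bottom.
EMOTION_MAP = {
    "pleasure": (90.0, 0.2),
    "displeasure": (270.0, 0.2),
    "reassured": (0.0, 0.5),
    "anxious": (350.0, 0.5),
    "desire": (150.0, 0.6),
    "reflection": (320.0, 0.6),
}


def describe(emotion, inner_radius=0.4, eps=1e-9):
    """Classify an emotion by its position on the map."""
    angle_deg, radius = EMOTION_MAP[emotion]
    x = radius * math.cos(math.radians(angle_deg))
    y = radius * math.sin(math.radians(angle_deg))
    # Left half: reactions in the brain. Right half: situational judgment.
    if abs(x) < eps:
        side = "both"
    else:
        side = "reaction" if x < 0 else "situation"
    # Upper half: pleasure. Lower half: displeasure.
    if abs(y) < eps:
        valence = "neutral"
    else:
        valence = "pleasure" if y > 0 else "displeasure"
    # Near the center: primitive emotions. Further out: states and actions.
    layer = "inner" if radius < inner_radius else "outer"
    return {"side": side, "valence": valence, "layer": layer}
```

Storing positions rather than bare labels means that the distance between two emotions is directly computable, which is what makes "emotions that are likely to occur simultaneously are mapped close together" usable by downstream processing.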

[0847] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0848] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further you move toward the outside of the Emotion Map 400, the more visible (expressed in actions) your emotions become.

[0849] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
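The idea that deviation from an ideal balance produces discomfort and approach to it produces pleasure can be expressed as a simple score. The sketch below is an assumption made for illustration: the linear scoring, the tolerance values, and the reading names (`battery_level`, `tilt_deg`) are hypothetical and are not part of this disclosure.

```python
def balance_valence(readings, ideals, tolerances):
    """Return a score in [-1, 1]: 1 when every balance is ideal, lower as they deviate.

    readings, ideals, and tolerances are dicts keyed by the same balance names.
    A deviation equal to or larger than the tolerance counts as fully uncomfortable.
    """
    scores = []
    for name, ideal in ideals.items():
        deviation = abs(readings[name] - ideal) / tolerances[name]
        scores.append(1.0 - 2.0 * min(deviation, 1.0))
    return sum(scores) / len(scores)


# Hypothetical robot balances: a full battery and an upright posture are ideal.
ROBOT_IDEALS = {"battery_level": 1.0, "tilt_deg": 0.0}
ROBOT_TOLERANCES = {"battery_level": 1.0, "tilt_deg": 30.0}
```

With these example values, a robot with a full battery standing upright scores 1.0 (pleasure), and one with an empty battery tilted by 30 degrees or more scores -1.0 (discomfort). Intermediate states fall between the two.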

[0850] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0851] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
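The text does not state how the network is trained so that nearby emotions receive similar values. One possible approach, shown here purely as an assumption, is to build the training targets from map distance: instead of a single labeled emotion, each training example gets Gaussian-weighted target values that are high for emotions close to the labeled one. The coordinates in `POSITIONS`, the `bandwidth` value, and the function name are hypothetical.

```python
import math

# Hypothetical 2-D coordinates on the emotion map.
POSITIONS = {
    "reassured": (0.50, 0.05),
    "calm": (0.45, 0.00),
    "confident": (0.55, 0.10),
    "anger": (-0.50, -0.40),
}


def soft_targets(true_emotion, positions=POSITIONS, bandwidth=0.2):
    """Gaussian-weighted training targets: emotions near the labeled one get similar values."""
    cx, cy = positions[true_emotion]
    weights = {
        name: math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * bandwidth ** 2))
        for name, (x, y) in positions.items()
    }
    total = sum(weights.values())
    # Normalize so that the target values sum to 1.
    return {name: w / total for name, w in weights.items()}
```

With these example coordinates, a sample labeled "reassured" yields similar target values for "reassured," "calm," and "confident" and a value near zero for "anger," which mirrors the pattern described for Figure 10.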

[0852] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0853] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0854] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0855] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0856] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0857] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0858] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0859] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0860] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0861] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0862] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0863] The following is further disclosed regarding the embodiments described above.

[0864] (Claim 1)

[0865] Means for acquiring user voice and biometric data,

[0866] A means of analyzing acquired data in real time and estimating emotional states,

[0867] A means of generating appropriate advice for the user based on their emotional state,

[0868] A means of notifying the user of the generated advice on their device,

[0869] A means of collecting user feedback and using it to improve the system,

[0870] A system that includes this.

[0871] (Claim 2)

[0872] The system according to claim 1, comprising means for performing natural language processing on acquired audio data and analyzing language patterns and speech tones.

[0873] (Claim 3)

[0874] The system according to claim 1, further comprising means for recording the user's response to advice and reflecting it in future advice generation using a machine learning algorithm.

[0875] "Example 1"

[0876] (Claim 1)

[0877] A means for acquiring user voice and vital data,

[0878] A means of analyzing acquired data in real time and estimating emotional states,

[0879] A means of generating appropriate advice for the user based on their emotional state,

[0880] A means for notifying the user of the generated advice on their information processing device,

[0881] A means of collecting user feedback and using it to improve the system,

[0882] A means for converting acquired audio data into text using a speech recognition algorithm,

[0883] A means of analyzing text data using natural language processing technology,

[0884] When estimating a user's emotional state, methods based on biometric information such as heart rate and skin electrical activity are used,

[0885] A system that includes this.

[0886] (Claim 2)

[0887] The system according to claim 1, comprising means for performing natural language processing on acquired audio data and analyzing language patterns and speech intonation.

[0888] (Claim 3)

[0889] The system according to claim 1, comprising means for recording the user's response to advice and reflecting it in future advice generation using machine learning techniques.

[0890] "Application Example 1"

[0891] (Claim 1)

[0892] Means for acquiring user voice and biometric data,

[0893] A means of analyzing acquired data in real time and estimating emotional states,

[0894] A means of generating appropriate advice for the user based on their emotional state,

[0895] In caregiving settings, a means of understanding emotional states and proposing individualized care and relaxation methods,

[0896] A means for notifying the user of the generated advice on their information processing device,

[0897] A means of collecting user feedback and using it to improve information processing technology,

[0898] A system that includes this.

[0899] (Claim 2)

[0900] The system according to claim 1, comprising means for performing natural language processing on acquired audio data and analyzing language patterns and speech tones.

[0901] (Claim 3)

[0902] The system according to claim 1, further comprising means for recording the user's response to advice and reflecting it in future advice generation using a machine learning algorithm.

[0903] "Example 2 of combining an emotion engine"

[0904] (Claim 1)

[0905] Means for acquiring user voice and biometric data,

[0906] A means of analyzing acquired data in real time and estimating emotional states,

[0907] A means of generating appropriate advice for the user based on their emotional state,

[0908] A means of notifying the user of the generated advice on their device,

[0909] A means of collecting user feedback and using it to improve the system,

[0910] A method for improving the accuracy of advice by updating the sentiment model using machine learning algorithms based on feedback data,

[0911] A system that includes this.

[0912] (Claim 2)

[0913] The system according to claim 1, comprising means for performing natural language processing on acquired speech data and analyzing linguistic and phonological features.

[0914] (Claim 3)

[0915] The system according to claim 1, further comprising means for recording the user's response to advice and using the response to reflect it in future advice generation by a machine learning algorithm.

[0916] "Application example 2 when combining with an emotional engine"

[0917] (Claim 1)

[0918] A device that acquires the user's voice and biometric data,

[0919] A device that analyzes acquired data in real time and estimates emotional state,

[0920] A device that generates personalized support based on emotional state,

[0921] A device that notifies the user of the generated support on their information device,

[0922] A device that collects user feedback and uses it to improve the system,

[0923] A device that monitors the psychological health status of elderly people and suggests appropriate relaxation or social interaction,

[0924] A system that includes this.

[0925] (Claim 2)

[0926] The system according to claim 1, comprising a device for processing acquired audio data and analyzing linguistic expression and tone of voice.

[0927] (Claim 3)

[0928] The system according to claim 1, comprising a device that records the user's response to support and reflects it in future support generation using a learning algorithm.

[Explanation of Symbols]

[0929] 10, 210, 310, 410 Data Processing Systems
12 Data Processing Devices
14 Smart Devices
214 Smart Glasses
314 Headset-type terminal
414 Robots

Claims

1. A system comprising: means for acquiring user voice and biometric data; means for analyzing the acquired data in real time and estimating an emotional state; means for generating appropriate advice for the user based on the emotional state; means for understanding emotional states in caregiving settings and proposing individualized care and relaxation methods; means for notifying the user of the generated advice on the user's information processing device; and means for collecting user feedback and using it to improve information processing technology.

2. The system according to claim 1, comprising means for performing natural language processing on acquired audio data and analyzing language patterns and speech tones.

3. The system according to claim 1, further comprising means for recording the user's response to advice and reflecting it in future advice generation using a machine learning algorithm.