system

A system that analyzes voice and image data to generate personalized relaxation plans, addressing the challenge of stress management by providing tailored and adaptive stress relief methods.

JP2026096642A · Pending · Publication Date: 2026-06-15 · SOFTBANK GROUP CORP


Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: SOFTBANK GROUP CORP
Filing Date: 2024-12-03
Publication Date: 2026-06-15

Application Information

Patent Timeline

03 Dec 2024: Application
15 Jun 2026: Publication (JP2026096642A)

IPC: G16H20/70; G06Q10/00; G06F21/62; A61M21/02; G10L25/63

AI Tagging

Application Domain

Speech analysis; Digital data protection

AI Technical Summary

Technical Problem

Modern society faces challenges in effectively managing stress, particularly in accurately recognizing one's own emotions and taking appropriate actions, with existing systems struggling to provide personalized relaxation methods tailored to individual needs.

Method used

A system that analyzes voice and image data to identify emotional states, generates personalized relaxation plans, and provides instructions through a user terminal, while collecting feedback to improve the plans over time.

Benefits of technology

Enables efficient and individualized stress management by providing relaxation methods optimized for each user, adapting to their emotional state and improving over time based on user feedback.

✦ Generated by Eureka AI based on patent content.

Figure 2026096642000001_ABST (abstract drawing; image not reproduced)

Patent Text Reader

Abstract

A system is provided. [Solution] The system includes: a program control means that analyzes audio and image data collected from a user to identify the user's emotional state; a program control means that generates a user-specific relaxation plan based on that emotional state; a program control means that issues instructions to a user terminal to execute the relaxation plan; and a means that collects user feedback data and analyzes it to improve the relaxation plan.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, the method including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] In modern society, there is a problem that many people find it difficult to effectively manage stress. In particular, it is challenging to accurately recognize one's own emotions and take appropriate actions accordingly in a busy daily life. To solve this problem, there is a need for a system that allows users to understand their own stress levels and effectively utilize relaxation methods accordingly.

Means for Solving the Problems

[0005] This invention provides a means for identifying an emotional state by analyzing voice and image data from a user. It then constructs a program control means for generating a personalized relaxation plan based on that emotional state. Furthermore, it includes means for providing instructions to the user terminal to implement the generated relaxation plan. Additionally, it includes means for collecting user feedback data and analyzing it to improve the relaxation plan. This realizes a system that supports stress management and self-care tailored to each individual user.

[0006] A "user" is an individual who uses the system to manage stress and care for their emotions.

[0007] "Audio data" refers to digital information that includes the tone of the user's voice and the content of their speech.

[0008] "Image data" refers to visual information that captures the user's facial expressions and movements.

[0009] "Emotional state" is an indicator of the user's mental state, determined from analyzed audio and image data.

[0010] A "relaxation plan" is a program that includes stress reduction methods optimized for each individual user.

[0011] "Program control means" refers to a series of programs and their execution environment that enable the system to process various data and provide appropriate instructions to the user.

[0012] "Feedback data" refers to information provided by users regarding the effectiveness of relaxation plans, and is used to improve those plans.

Brief Description of the Drawings

[0013]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map on which multiple emotions are mapped.
[Figure 10] An emotion map on which multiple emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2, which incorporates an emotion engine.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2, which incorporates an emotion engine.

Modes for Carrying Out the Invention

[0014] Hereinafter, an example of an embodiment of the system relating to the technology of this disclosure will be described with reference to the attached drawings.

[0015] First, the terms used in the following description will be explained.

[0016] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also consist of one type of arithmetic unit or a combination of several types. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).

[0017] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which the processor uses as work memory.

[0018] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0019] In the following embodiments, a communication I/F (Interface) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication I/F controls communication between multiple computers. Examples of communication standards applied to the communication I/F include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0020] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" may mean A alone, B alone, or a combination of A and B. The same reading applies in this specification when three or more items are linked by "and/or."

[0021] [First Embodiment]

[0022] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0023] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0024] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0025] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0026] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0027] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0028] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0029] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0030] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0031] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0032] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The data processing system 10 uses the reception output program 60 in conjunction with the specific processing program 56. The processor 46 reads the reception output program 60 from the storage 50 and executes it on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0033] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0034] This invention provides a personal AI system to improve users' stress management. The system collects the user's voice and image data in real time, and a server analyzes this data to identify the user's emotional state. Based on the identified state, the server automatically generates an individualized relaxation plan. The relaxation plan includes methods best suited to the user's condition, such as playing music, guiding meditation, or suggesting short exercises. For example, for a user determined to have a high stress level, the system may suggest playing a relaxing audio file on the user's device.

[0035] The device executes a relaxation plan received from the server and directly prompts the user for action. For example, the device can play relaxation music through its speakers and display meditation guides on its screen. This allows users to easily create a relaxing environment and reduce stress in their daily lives.

[0036] Furthermore, this system has a feature that allows it to receive feedback from users and improve relaxation plans based on past usage history. For example, if a particular piece of music or program was effective for a user, the server will prioritize suggesting it again next time.
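The feedback-based prioritization described in [0036] can be sketched as follows. This is a minimal illustration, not the method disclosed in the patent: the class name `PreferenceStore`, the 1-to-5 rating scale, and the neutral default of 3.0 for unrated items are all assumptions made for the example.

```python
from collections import defaultdict


class PreferenceStore:
    """Tracks how well each relaxation item worked for one user (hypothetical helper)."""

    def __init__(self):
        self._total = defaultdict(float)  # sum of ratings per item
        self._count = defaultdict(int)    # number of ratings per item

    def record(self, item: str, rating: float) -> None:
        """Store one feedback rating (assumed scale: 1 = no effect, 5 = very effective)."""
        self._total[item] += rating
        self._count[item] += 1

    def score(self, item: str, default: float = 3.0) -> float:
        """Mean rating for an item; unrated items get a neutral default."""
        if self._count[item] == 0:
            return default
        return self._total[item] / self._count[item]

    def rank(self, candidates: list[str]) -> list[str]:
        """Order candidates so that previously effective items come first."""
        return sorted(candidates, key=self.score, reverse=True)


store = PreferenceStore()
store.record("ocean_waves.mp3", 5)
store.record("piano.mp3", 2)
ranked = store.rank(["piano.mp3", "rain.mp3", "ocean_waves.mp3"])
```

With these ratings, the highly rated track is suggested first, the unrated track falls in the middle, and the poorly rated track comes last.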

[0037] In this way, the system can adapt to the individual needs of each user and enable efficient emotional care. To implement this invention, it is recommended to also use appropriate privacy protection measures to securely handle the user's voice and image data.

[0038] The following describes the processing flow.

[0039] Step 1:

[0040] The user initiates voice input and image capture. The device collects the user's voice and facial expression data in real time and sends it to the server.

[0041] Step 2:

[0042] The server analyzes the received audio data, examining the tone and speed of the voice to evaluate the user's emotional state. It also analyzes image data to identify emotions from the user's facial expressions.

[0043] Step 3:

[0044] The server identifies the user's current emotional state based on the analysis results. This allows it to determine whether the user is experiencing stress.

[0045] Step 4:

[0046] The server generates a relaxation plan corresponding to the identified emotional state. The plan includes suggestions such as relaxation music, meditation guidance, and short exercise sessions.

[0047] Step 5:

[0048] The server sends the generated relaxation plan to the device.

[0049] Step 6:

[0050] The device executes the received relaxation plan. For example, it might play music through the speaker and display a meditation guide on the screen.

[0051] Step 7:

[0052] The user experiences the relaxation plan and enters feedback about its effects on the device.

[0053] Step 8:

[0054] The device collects user feedback data and sends it to the server.

[0055] Step 9:

[0056] The server analyzes feedback data and learns how to improve the relaxation plan. This makes future suggestions more effective.
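The core of the steps above (identify the state, generate a plan, execute it) can be sketched as below. This is an assumption-laden illustration: the 0-to-1 stress scale, the plain averaging of voice and face scores, the thresholds, and the action names are all hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass, field


@dataclass
class EmotionalState:
    stress: float  # assumed scale: 0.0 (calm) to 1.0 (highly stressed)


@dataclass
class RelaxationPlan:
    actions: list[str] = field(default_factory=list)


def identify_state(voice_score: float, face_score: float) -> EmotionalState:
    """Steps 2-3: combine the two analysis results (here, a plain average)."""
    return EmotionalState(stress=(voice_score + face_score) / 2)


def generate_plan(state: EmotionalState) -> RelaxationPlan:
    """Step 4: choose actions for the identified state (thresholds are illustrative)."""
    if state.stress >= 0.7:
        return RelaxationPlan(["play_music", "guide_meditation"])
    if state.stress >= 0.4:
        return RelaxationPlan(["short_exercise"])
    return RelaxationPlan([])


def execute_plan(plan: RelaxationPlan) -> list[str]:
    """Step 6: the terminal carries out each action; here it only reports them."""
    return [f"terminal: {action}" for action in plan.actions]


state = identify_state(voice_score=1.0, face_score=0.5)
report = execute_plan(generate_plan(state))
```

In a real deployment the two scores would come from the audio and image analysis on the server, and the feedback steps would adjust the thresholds or action choices over time.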

[0057] (Example 1)

[0058] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0059] In modern society, users experience various stresses in their daily lives, and effective stress management is essential. However, providing appropriate relaxation methods tailored to individual needs immediately is difficult. Furthermore, it is necessary to perform highly accurate emotional analysis while protecting the privacy of collected data and automatically generate personalized plans. Developing an efficient and reliable system to address these challenges is crucial.

[0060] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0061] In this invention, the server includes information processing means for analyzing audio and image information collected from the user to identify the user's emotional state, information processing means for generating a personalized relaxation plan for the user, and communication means for securely transmitting the audio and image information. This enables the automatic provision of relaxation methods tailored to the user's individual emotional state, thereby reducing stress in daily life.

[0062] A "user" refers to an individual who uses the system and is a data provider of voice and image information.

[0063] "Voice information" refers to data obtained from a user's speech and voice, and is used to identify their emotional state.

[0064] "Image information" refers to visual data, including the user's facial expressions and actions, and is used for sentiment analysis.

[0065] "Information processing means" refers to functions that analyze received data to identify the user's emotional state and generate a relaxation plan.

[0066] "Emotional state" is an indicator that shows the user's psychological state and stress level, and is identified through the analysis of audio and image information.

[0067] A "relaxation plan" refers to a proposal of personalized stress reduction measures that the system generates in response to the user's emotional state.

[0068] "Communication means" refers to technologies or protocols for securely and efficiently transmitting voice and image information to a server.

[0069] "Feedback information" refers to data on user feedback and suggestions after using the system, which is used to improve relaxation plans.

[0070] This invention provides a system for effectively improving user stress management. The system's configuration and operation are described below.

[0071] First, the user accesses the system through a personal device. This device is equipped with a microphone to collect audio information and a camera to acquire image information. The device uses these components to acquire both audio and image data in real time.

[0072] The acquired data is transmitted to a server using secure communication methods. The server then utilizes a generative AI model to analyze the audio and image information and identify the user's emotional state. This analysis employs speech recognition software and image processing algorithms. Identifying the emotional state allows for the assessment of the user's stress level and psychological condition.

[0073] The server automatically generates an individualized relaxation plan based on the identified emotional state. This plan is optimized based on the user's past history and feedback information. For example, a user identified as having a high stress level will be presented with specific relaxation music or meditation guidance. In this process, the AI model is provided with a prompt message stating, "Please select the most suitable music based on the user's stress level."
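As a hedged sketch of how the prompt in [0073] might be assembled before it is passed to a generative AI model: the instruction sentence is quoted from the text, while the function name `build_music_prompt`, the 0-to-1 stress scale, and the layout of the context lines are assumptions. No model is called here.

```python
BASE_INSTRUCTION = "Please select the most suitable music based on the user's stress level."


def build_music_prompt(stress_level: float, liked_tracks: list[str]) -> str:
    """Combine the fixed instruction with per-user context (field layout is assumed)."""
    if not 0.0 <= stress_level <= 1.0:
        raise ValueError("stress_level must be between 0.0 and 1.0")
    history = ", ".join(liked_tracks) if liked_tracks else "none recorded"
    return (
        f"{BASE_INSTRUCTION}\n"
        f"Stress level (0 to 1): {stress_level:.2f}\n"
        f"Tracks the user rated as effective: {history}"
    )


prompt = build_music_prompt(0.8, ["rain_sounds", "soft_piano"])
```

Keeping the instruction fixed and appending the identified state and past feedback as separate lines is one simple way to reflect the statement that the plan is optimized from the user's history.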

[0074] Next, the generated relaxation plan is sent to the device and executed. The device can play relaxation music through its speaker and display meditation guidance videos on its screen. This allows the user to immediately experience appropriate relaxation.

[0075] This system allows users to manage stress effectively in an individually customized way. User feedback is used to improve the system's accuracy, so that performance continues to improve over time.

[0076] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0077] Step 1:

[0078] The user accesses the system using a terminal. The terminal collects the user's voice information through the microphone and acquires image information using the camera. This information is necessary to evaluate the user's emotional state. Voice and image information are collected as input data and sent to the next processing step.

[0079] Step 2:

[0080] The device transmits collected audio and image information to the server using a secure communication method. This communication method is encrypted to protect user privacy. The transmitted data becomes input data necessary for analysis on the server.

[0081] Step 3:

[0082] The server uses a generative AI model to identify the user's emotional state based on the received audio and image information. In this process, a speech recognition algorithm analyzes the tone and speed of the voice, and an image processing algorithm analyzes facial expressions. The analysis outputs the user's stress level and emotional state, and the server then proceeds to the next step.

[0083] Step 4:

[0084] The server generates an individualized relaxation plan based on the identified emotional state. This plan generation also takes into account past user data and feedback. As a result, the optimal relaxation method for the user is determined. The generative AI model selects an action using the prompt message "Please select the optimal music based on the user's stress level."

[0085] Step 5:

[0086] The server sends the generated relaxation plan to the terminal. This transmission is also encrypted. The plan received by the terminal becomes a dataset ready for execution.

[0087] Step 6:

[0088] The device executes the received relaxation plan. Specifically, it starts playing relaxation music through the speaker and displays a meditation guide video on the screen. The output is provided to the user as music and video.

[0089] Step 7:

[0090] After receiving a relaxation experience through the device, the user enters feedback information. This feedback is used to improve the accuracy of the system. The entered feedback is sent to the server and stored in the database.

[0091] Step 8:

[0092] The server analyzes the collected feedback information to improve relaxation plans. This allows future plans to be more refined to better meet user needs. The analysis results are used as input data to provide personalized services in the future.
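Steps 2 and 5 above call for encrypted transmission but do not name a protocol. One common choice, assumed here purely for illustration, is TLS with certificate verification. The sketch below prepares a client-side TLS context with Python's standard `ssl` module and packs one capture as JSON; the payload field names and base64 encoding are hypothetical, and no network connection is opened.

```python
import base64
import json
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Client-side TLS settings: certificate and hostname checks on, TLS 1.2 or newer."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def pack_payload(user_id: str, audio: bytes, image: bytes) -> bytes:
    """Serialize one capture as JSON; binary data is base64-encoded (format is assumed)."""
    body = {
        "user_id": user_id,
        "audio": base64.b64encode(audio).decode("ascii"),
        "image": base64.b64encode(image).decode("ascii"),
    }
    return json.dumps(body).encode("utf-8")


context = make_tls_context()
payload = pack_payload("user-001", b"\x00\x01", b"\xff\xd8")
```

The terminal would then wrap its socket with `context.wrap_socket(sock, server_hostname=...)` before sending `payload`, so that both the upload and the returned plan travel over an encrypted channel.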

[0093] (Application Example 1)

[0094] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0095] In modern society, it is crucial to understand individual users' stress levels and emotional states in real time and provide appropriate relaxation methods accordingly. However, conventional systems have struggled to accurately analyze users' emotional states and provide individually optimized relaxation plans. Furthermore, they lack mechanisms for improving plans based on feedback, making it difficult to maintain user satisfaction in the long term.

[0096] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0097] In this invention, the server includes information processing means for analyzing voice and image data collected from the user to identify the user's emotional state, information processing means for generating a user-specific relaxation plan based on the emotional state, and control means for the robot to perform specific actions based on the analysis results. This enables effective relaxation tailored to the user's emotional state.

[0098] "Voice data" refers to voice information collected from users and is the subject of analysis for identifying their emotional state.

[0099] "Image data" refers to visual information that captures a user's facial expressions and movements, and is the subject of analysis for identifying their emotional state.

[0100] "Information processing means" refers to a device or program that has the function of analyzing audio and image data obtained from a user and performing processing to identify the user's emotional state.

[0101] A "relaxation plan" is a set of behavioral guidelines and schedules designed to provide users with stress reduction methods tailored to their specific emotional state.

[0102] "Control means" refers to a device or program that has the function of operating a device or system that takes specific actions to carry out a relaxation plan.

[0103] "Feedback data" refers to information about users' reactions and evaluations when they actually used a relaxation plan.

[0104] "Analysis means" refers to a system or program that uses collected feedback data to perform analysis aimed at improving future relaxation plans.

[0105] To implement this invention, the user first uses a consumer robot connected to a dedicated terminal. The robot is equipped with a high-performance camera and microphone, and this hardware is used to collect voice and image data from the user in real time.

[0106] The server converts the collected audio data into text data using speech recognition software such as Google® Speech-to-Text API, and analyzes the user's facial expressions from the image data using image analysis software such as OpenCV. The collected data is used to analyze the user's emotional state.

[0107] Once the user's emotional state is identified, the server uses a generative AI model to create a personalized relaxation plan based on that state. This plan may include playing relaxation music, suggesting guided meditation, and simple exercises.

[0108] The generated relaxation plan is executed by the robot's control system. For example, the robot plays relaxation music through a speaker and displays meditation guides on a screen. This allows the user to create a relaxing environment.

[0109] Furthermore, feedback data from users after they have used a relaxation plan is collected and sent to the server. This feedback data is evaluated using analytical tools and used to improve future relaxation plans. This allows for the provision of services that are more tailored to the individual needs of users.

[0110] As a concrete example, imagine a scenario where, while a user is relaxing in their living room, the robot says, "It looks like you need to relax today," and then plays healing music based on an optimized relaxation plan.

[0111] Examples of prompts for a generative AI model are as follows:

[0112] "Analyze the user's emotional state and suggest appropriate relaxation methods if they are experiencing high stress levels. Include specific actions such as listening to music, short guided meditations, or simple stretching exercises."

[0113] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0114] Step 1:

[0115] The user provides audio and image data to the device. The device uses its camera and microphone to collect the user's facial expressions and voice in real time. During this process, the device anonymizes the data and prepares to send it to the server.

[0116] Step 2:

[0117] The server receives the audio data sent from the terminal and uses speech recognition software (e.g., Google Speech-to-Text API) to convert the audio into text data. The converted text data becomes the input data for sentiment analysis.

[0118] Step 3:

[0119] The server receives image data and uses image analysis software (e.g., OpenCV) to analyze facial expressions. The analysis identifies the emotional state derived from the facial expression, and the obtained emotional information is used in the next step.

[0120] Step 4:

[0121] The server integrates the text data obtained from the voice with the emotion data obtained from image analysis, and uses a generative AI model to classify the user's emotional state. It then generates a relaxation plan based on that emotional state. The input consists of the text data and the image-derived emotion data, and the output is a personalized relaxation plan.

[0122] Step 5:

[0123] The generated relaxation plan is sent from the server to the robot. The robot uses control mechanisms to initiate specific actions to execute the plan. For example, it might play relaxation music through a speaker and display meditation guidance on a screen.

[0124] Step 6:

[0125] Users experience a relaxation plan and provide feedback via a device or robot. This feedback includes satisfaction levels and areas for improvement.

[0126] Step 7:

[0127] The server receives and analyzes user feedback data. This feedback data is used to improve the next relaxation plan. This allows for the provision of services that are more tailored to the individual needs of the user.
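The "control means" in Step 5 above, which turns a received plan into robot actions, might be organized as a small dispatch table. This is only a sketch under assumptions: the class `RobotController`, the action names, and the string results standing in for real speaker and screen commands are all hypothetical.

```python
from typing import Callable


class RobotController:
    """Maps plan action names to robot behaviours (all names are hypothetical)."""

    def __init__(self):
        self._handlers: dict[str, Callable[[], str]] = {}

    def register(self, action: str, handler: Callable[[], str]) -> None:
        """Associate an action name from the plan with a callable behaviour."""
        self._handlers[action] = handler

    def execute(self, plan: list[str]) -> list[str]:
        """Run each known action in order; unknown actions are skipped and reported."""
        log = []
        for action in plan:
            handler = self._handlers.get(action)
            log.append(handler() if handler else f"skipped unknown action: {action}")
        return log


robot = RobotController()
robot.register("play_music", lambda: "speaker: playing relaxation music")
robot.register("show_meditation", lambda: "screen: showing meditation guide")
log = robot.execute(["play_music", "show_meditation", "dim_lights"])
```

Reporting unknown actions instead of raising an error lets the robot carry out the rest of a plan even when the server suggests something the hardware cannot do.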

[0128] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0129] This invention provides a personal AI system incorporating an emotion engine that recognizes the user's emotional state in real time. This system collects voice and image data, identifies the user's emotions based on this data, and provides an individualized relaxation plan. Specific embodiments are described below.

[0130] The device uses a microphone and camera to collect voice and image data from the user. The collected data is immediately transmitted to a server. The server uses an emotion engine to analyze voice tone, speaking style, and facial expression changes to recognize the user's emotions with high accuracy.

[0131] The emotion engine uses algorithms to analyze the tone and speed of speech, as well as points of change in facial expression, to understand emotional states such as "tension" or "relaxation." Based on this emotion recognition, the server generates an optimal relaxation plan for the user. This plan includes relaxation music, guided meditation, and exercise suggestions. For example, if the user is identified as "anxious," the system will prioritize suggesting calming music and relaxation guides.
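The text does not specify the emotion engine's algorithms. As a deliberately simple stand-in, the sketch below labels speech as "tension" or "relaxation" from pitch and speaking rate relative to the user's own baseline, and looks up which suggestions to prioritize for a recognized emotion. The ratio inputs, the 1.15 threshold, and the priority table are assumptions for illustration only.

```python
# Illustrative priority table: which suggestions come first for each recognized emotion.
SUGGESTION_PRIORITY = {
    "anxious": ["calming_music", "relaxation_guide", "guided_meditation", "light_exercise"],
    "tension": ["guided_meditation", "calming_music", "light_exercise"],
    "relaxation": ["light_exercise"],
}


def classify_arousal(pitch_ratio: float, rate_ratio: float, threshold: float = 1.15) -> str:
    """Label speech as "tension" or "relaxation".

    Both ratios compare the current utterance with the user's own baseline
    (1.0 = usual pitch or speaking rate). The threshold is an arbitrary placeholder.
    """
    arousal = (pitch_ratio + rate_ratio) / 2
    return "tension" if arousal > threshold else "relaxation"


def suggestions_for(emotion: str) -> list[str]:
    """Return the ordered suggestions for an emotion, or an empty list if unknown."""
    return list(SUGGESTION_PRIORITY.get(emotion, []))


label = classify_arousal(pitch_ratio=1.5, rate_ratio=1.5)
anxious_plan = suggestions_for("anxious")
```

A production engine would more likely use a trained model such as the emotion identification model 59; the lookup table only mirrors the example that an "anxious" user is offered calming music and relaxation guides first.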

[0132] The generated relaxation plan is immediately sent to the device, which then executes it. The user experiences this plan and can reap the stress-reducing effects. After the experience, the user enters feedback into the device, which is collected by the server. The server analyzes this feedback and optimizes the emotion engine's algorithm to improve the accuracy of future relaxation plans.

[0133] This system allows users to understand their emotional state in real time and take specific measures accordingly, enabling them to effectively manage daily stress. For privacy reasons, audio and image data are anonymized and processed securely.

[0134] The following describes the processing flow.

[0135] Step 1:

[0136] The user activates the system and prepares to collect their data through voice input and the camera. The device records this voice and image data in real time.

[0137] Step 2:

[0138] The device sends the collected audio and image data to the server. Data transmission is performed using a secure protocol.

[0139] Step 3:

[0140] The server uses an emotion engine to analyze the received audio data. It analyzes the tone, speed, and nuances of the speech to identify the emotional state.

[0141] Step 4:

[0142] The server analyzes the image data. It detects the user's facial features and analyzes changes in their expressions to more accurately understand their emotional state.

[0143] Step 5:

[0144] The server integrates the analysis results of audio and image data to recognize the user's overall emotional state.

[0145] Step 6:

[0146] The server generates a relaxation plan optimized for the user based on the recognized emotional state. For example, if the user is judged to be highly stressed, it will suggest a plan that includes calming music and meditation guidance.

[0147] Step 7:

[0148] The server sends the generated relaxation plan to the device.

[0149] Step 8:

[0150] The device executes the received plan. It plays relaxation music from the speaker and displays a meditation guide on the screen.

[0151] Step 9:

[0152] Users experience a relaxation plan and provide feedback on its effects via their device.

[0153] Step 10:

[0154] The device collects user feedback and sends it to the server.

[0155] Step 11:

[0156] The server analyzes the feedback it receives and uses it to improve the emotion engine and plan generation algorithm. This improves the accuracy of future suggestions.
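Step 5 of the flow above integrates the audio and image analysis results. The specification does not say how; one common approach is weighted late fusion of per-label scores, sketched below. The function name, the score dictionaries, and the default weight are assumptions for illustration.

```python
def fuse_emotion_scores(audio_scores: dict[str, float],
                        image_scores: dict[str, float],
                        audio_weight: float = 0.5) -> tuple[str, dict[str, float]]:
    """Combine per-label scores from both modalities and pick the top label."""
    labels = set(audio_scores) | set(image_scores)
    if not labels:
        raise ValueError("at least one modality must provide scores")
    fused = {
        label: audio_weight * audio_scores.get(label, 0.0)
        + (1.0 - audio_weight) * image_scores.get(label, 0.0)
        for label in labels
    }
    total = sum(fused.values())
    if total > 0:
        # Normalize so the fused scores sum to 1.
        fused = {label: score / total for label, score in fused.items()}
    best = max(fused, key=fused.get)
    return best, fused


if __name__ == "__main__":
    label, scores = fuse_emotion_scores(
        {"tension": 0.7, "relaxation": 0.3},
        {"tension": 0.4, "relaxation": 0.6},
    )
    print(label, scores)
```

A label missing from one modality is treated as a score of zero for that modality, so the two analyzers do not need identical label sets.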

[0157] (Example 2)

[0158] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0159] In modern society, many people experience stress and anxiety on a daily basis and are seeking effective ways to cope with them. However, conventional stress management methods have difficulty responding to individual situations and emotions, which makes it hard to provide relief methods suited to each user. Furthermore, few systems analyze users' emotions in real time and provide appropriate countermeasures while respecting privacy. This invention aims to solve these problems and provide a system that enables efficient and individualized stress management.

[0160] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0161] In this invention, the server includes information processing means for analyzing audio and video information to identify an emotional state, information processing means for generating a user-specific relaxation plan, and information processing means having a function to optimize the algorithm using feedback data based on the emotion analysis results. This makes it possible to provide an optimal relaxation plan tailored to the individual emotional state of the user, thereby simultaneously achieving effective stress management and privacy protection.

[0162] "Audio information" refers to data that includes acoustic signals collected from users and is used for sentiment analysis.

[0163] "Visual information" refers to data that represents visual signals, including the user's facial expressions and movements, and is used to identify their emotional state.

[0164] "Emotional state" refers to the emotional state of a user and is a concept that includes psychological and physiological responses such as "tension" and "relaxation."

[0165] "Information processing means" refers to methods and technologies for analyzing audio and video information to identify emotional states.

[0166] A "relaxation plan" refers to a combination of relaxation techniques that are individually tailored to the user's emotional state.

[0167] "Information terminal" refers to a device used to provide relaxation plans to users and carry them out, and includes smartphones and tablets.

[0168] "Opinion data" refers to feedback collected from users, which is useful information for improving relaxation plans.

[0169] "Privacy protection features" refer to technologies and methods for anonymizing and securely processing users' personal data.

[0170] A "generative artificial intelligence model" refers to a system that uses machine learning techniques to analyze a user's emotional state and generate appropriate responses and plans.

[0171] "Recommendation prompts" refer to guiding instructions or suggestions generated based on the user's emotional state, intended to assist in the implementation of the plan.

[0172] This invention is a personal AI system for identifying a user's emotional state in real time and providing a relaxation plan based on that state. The system has the following configuration:

[0173] Data collection and transmission

[0174] The device uses a microphone to collect audio information from the user and a camera module to collect video information. This information is collected in the background without the user needing to operate the device and is sent to the server via a secure connection.

[0175] Analysis of emotional states

[0176] The server analyzes the received audio and video information using a generative AI model. It analyzes tone, intonation, and speed from the audio information, and changes in facial expressions from the video information, comprehensively identifying the user's emotional state. This analysis employs advanced machine learning algorithms to classify emotional states into multiple categories such as "tension" and "relaxation."

[0177] Generation and provision of relaxation plans

[0178] The server generates a relaxation plan tailored to the user based on the analysis results. This plan includes relaxing music selections, guided meditation, and simple exercise programs. For example, if the user's emotional state is determined to be "anxious," the server will recommend classical music with relaxing effects and a guided deep-breathing exercise.

[0179] Feedback and algorithm optimization

[0180] Users can input feedback on the relaxation plan they have experienced on their device. This feedback information is sent to the server and analyzed to improve the accuracy of future relaxation plans. The server uses these analysis results to continuously optimize the generative AI model and enhance the personalized user experience.

[0181] For example, if a user is experiencing increased stress while working from home, the system will generate a plan to promote relaxation and provide a meditation guide with nature sounds as a background.

[0182] An example of a prompt sentence to input into the generative AI model is, "If a user feels anxious while working from home, what kind of relaxation plan should be suggested?"
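A prompt of the kind quoted above could be assembled from the identified emotion and the user's situation. The following is a minimal sketch; the function `build_plan_prompt` and its parameters are hypothetical, and how the prompt is submitted to the model is outside its scope.

```python
def build_plan_prompt(emotion: str, context: str) -> str:
    """Fill the example prompt with an emotion label and a situation."""
    emotion = emotion.strip()
    context = context.strip()
    if not emotion or not context:
        raise ValueError("emotion and context must be non-empty")
    return (
        f"If a user feels {emotion} while {context}, "
        "what kind of relaxation plan should be suggested?"
    )


if __name__ == "__main__":
    print(build_plan_prompt("anxious", "working from home"))
```

Called with "anxious" and "working from home", the function reproduces the example prompt given in the text.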

[0183] This system allows users to receive specific stress relief methods tailored to their emotional state in real time, enabling effective stress management.

[0184] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0185] Step 1:

[0186] The device uses a microphone to collect audio information from the user and a camera module to collect video information. This enables real-time data collection. The input is the user's voice and facial expressions, and the output is audio and video data. Specifically, the device runs a background process and continuously collects data with the user's permission.

[0187] Step 2:

[0188] The terminal transfers the collected audio and video data to the server via a secure protocol. This transfer process is carried out after anonymizing the data, thus protecting privacy. The input is the audio and video data obtained in step 1, and the output is the anonymized data transmitted to the server.

[0189] Step 3:

[0190] The server analyzes the received audio information and extracts features such as tone, speed, and intonation. A generative AI model is used to infer emotional states from the audio data. The input is the transmitted audio data, and the output is category information of the emotional state (e.g., "tense," "relaxed," etc.).

[0191] Step 4:

[0192] The server similarly analyzes the video information and detects changes in facial expressions. This provides physical cues to the user's emotions. The generative AI model comprehensively determines the emotional state by combining these cues with the audio information. The input is video data, and the output is the analyzed facial expression changes and the resulting emotional information.

[0193] Step 5:

[0194] The server integrates the results of audio and video analysis to generate an optimal relaxation plan. This plan may include music tailored to the user's state, guided meditation, and exercise programs. The input is integrated emotional information, and the output is the details of the relaxation plan.

[0195] Step 6:

[0196] The generated relaxation plan is sent from the server to the terminal, which then uses this information to make suggestions to the user. The terminal automatically performs actions according to the plan, such as playing music or providing meditation guidance. The input is the relaxation plan information, and the output is the actions to be taken by the user.

[0197] Step 7:

[0198] Users input feedback on the effectiveness of the provided relaxation plan into the terminal. This feedback data is sent to the server and used for future optimization of the algorithm. The input is user feedback, and the output is improvement information recorded in the database.

[0199] Step 8:

[0200] The server analyzes the collected feedback and optimizes the generative AI model to improve the accuracy of the next proposal. The input is the feedback data, and the output is the improved algorithm and a more accurate next relaxation plan.
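The inputs and outputs of the steps above can be laid out as a small pipeline. The sketch below is illustrative only: the class names, the stubbed analyzers, and the simple integration rule are assumptions, and the actual analysis in the specification is performed by a generative AI model.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Capture:
    """Step 1 output: raw audio and video from the terminal."""
    audio: bytes
    video: bytes


@dataclass
class RelaxationPlan:
    """Step 5 output: identified emotion and the plan items."""
    emotion: str
    items: list[str]


@dataclass
class Server:
    analyze_audio: Callable[[bytes], str]  # Step 3: audio -> emotion category
    analyze_video: Callable[[bytes], str]  # Step 4: video -> emotion category
    feedback_log: list[tuple[str, int]] = field(default_factory=list)

    def make_plan(self, capture: Capture) -> RelaxationPlan:
        audio_label = self.analyze_audio(capture.audio)
        video_label = self.analyze_video(capture.video)
        # Step 5 (simplified rule): treat the user as tense if either
        # modality reports "tense"; otherwise follow the audio label.
        emotion = "tense" if "tense" in (audio_label, video_label) else audio_label
        items = ["music", "guided_meditation"] if emotion == "tense" else ["exercise"]
        return RelaxationPlan(emotion, items)

    def record_feedback(self, plan: RelaxationPlan, rating: int) -> None:
        # Steps 7 and 8: store feedback for later optimization.
        self.feedback_log.append((plan.emotion, rating))


if __name__ == "__main__":
    server = Server(analyze_audio=lambda a: "relaxed", analyze_video=lambda v: "tense")
    plan = server.make_plan(Capture(audio=b"...", video=b"..."))
    server.record_feedback(plan, rating=4)
    print(plan, server.feedback_log)
```

Passing the analyzers in as callables keeps the flow testable with stubs and leaves room to substitute real models later.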

[0201] (Application Example 2)

[0202] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0203] In elderly care settings, maintaining the mental health of the elderly requires accurately recognizing each individual's emotional state and providing appropriate relaxation plans based on that understanding. However, current systems struggle with real-time emotional recognition, resulting in insufficient individualized care.

[0204] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0205] In this invention, the server includes information processing means for analyzing acoustic and video data to identify emotional states, information processing means for generating relaxation plans based on emotional states, and information processing means for providing plans aimed at the mental care of elderly people. This makes it possible to provide relaxation plans optimized for each elderly person and support the maintenance of their mental health.

[0206] "Audio data" refers to digital data containing information about sound, and is composed of sound waveforms, frequencies, volume, and other elements.

[0207] "Video data" refers to digital data obtained from image information acquired by cameras, etc., and includes the shape, color, and movement of objects.

[0208] "Information processing means" refers to programs or devices that analyze specific input data and extract or calculate necessary information.

[0209] "Emotional state" refers to an individual's inner mental state, and includes psychological states such as joy and sadness.

[0210] A "relaxation plan" is a set of activities and instructions designed to promote mental and physical relaxation in an individual.

[0211] "Response data" refers to data that digitally records user feedback and behavioral results after using a system.

[0212] "Personalization accuracy" refers to the degree of ability to provide services and content that match the characteristics and circumstances of the user.

[0213] To implement this invention, it is necessary to analyze audio and video data collected from users to identify each user's individual emotional state. The terminal collects data using the smartphone's microphone and camera and transmits it to the server immediately. On the server, a machine learning model implemented using Python and TensorFlow (registered trademark) processes the data. This makes it possible to recognize the user's emotional state with high accuracy from changes in voice tone, speaking style, and facial expressions.

[0214] Based on the emotion recognition results, the server generates a relaxation plan. This plan includes relaxation music, meditation guides, and exercise suggestions for seniors. For example, if the user is identified as anxious, it will provide calming classical music and deep breathing guidance.

[0215] The device aims to stabilize the user's mental state by implementing the generated relaxation plan. The response data obtained from the user is analyzed on the server and used as data to improve the personalization accuracy of the relaxation plan.

[0216] As a concrete example, the following prompts may be given to the generative AI model based on the emotion recognition results:

[0217] "The feedback from users seems urgent. Based on the situation, what kind of music would be suitable for relaxation?"

[0218] "The user's facial expression seems cloudy. Please suggest a meditation guide that would be suitable for them."

[0219] This system makes it possible to provide precise and appropriate relaxation plans to maintain the mental health of each elderly person in care settings.

[0220] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0221] Step 1:

[0222] The device acquires the user's audio and video data. Specifically, it records audio using the smartphone's microphone and captures facial expressions with its camera. The input is real-time audio and video, and the output is digital audio and video data. This data is transmitted to the server immediately.

[0223] Step 2:

[0224] The server analyzes the received audio and video data. It uses Python and TensorFlow to run a machine learning model, extracting changes in voice tone and facial expressions from the data. The input consists of audio and video data, and the output is a label indicating an emotional state, such as "tension" or "relaxation."

[0225] Step 3:

[0226] The server generates a relaxation plan based on the emotion recognition results. It queries the generative AI model using prompts to select appropriate relaxation methods. The input is an emotional state label, and the output is a relaxation plan (e.g., quiet classical music or deep breathing guidance).

[0227] Step 4:

[0228] The terminal receives the relaxation plan generated by the server and executes it for the user. The user performs relaxation activities according to the guide. The input is the relaxation plan, and the output is feedback on the user's actions.

[0229] Step 5:

[0230] Users input feedback into a device after their relaxation experience. Specifically, they input data by evaluating their satisfaction level and the effects they felt. The input is the user's feedback, and the output is feedback data.

[0231] Step 6:

[0232] The server analyzes feedback data collected from users to improve the accuracy of relaxation plans. This data is used as training data for a machine learning model, serving as a reference for future plan generation. The input is feedback data, and the output is an improved relaxation plan proposal.
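Steps 5 and 6 describe turning satisfaction and felt-effect ratings into training data. One way this could look is sketched below. The 1 to 5 rating scale, the `Feedback` record, and the normalization to a 0 to 1 target are assumptions; the specification does not define the scale or the training format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    emotion_label: str   # label the plan was generated for
    plan_item: str       # item the user experienced
    satisfaction: int    # assumed scale: 1 (low) to 5 (high)
    felt_effect: int     # assumed scale: 1 (low) to 5 (high)


def to_training_example(fb: Feedback) -> tuple[tuple[str, str], float]:
    """Map one feedback record to (features, target in the range 0..1)."""
    for score in (fb.satisfaction, fb.felt_effect):
        if not 1 <= score <= 5:
            raise ValueError("ratings must be between 1 and 5")
    mean_rating = (fb.satisfaction + fb.felt_effect) / 2
    target = (mean_rating - 1) / 4  # rescale 1..5 to 0..1
    return (fb.emotion_label, fb.plan_item), target


if __name__ == "__main__":
    print(to_training_example(Feedback("tension", "classical_music", 5, 3)))
```

Examples in this form could then be appended to the training set of whichever model ranks plan items for a given emotion label.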

[0233] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0234] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0235] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0236] [Second Embodiment]

[0237] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0238] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0239] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0240] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0241] The microphone 238 receives instructions and other input from the user 20 by picking up the user's voice. It captures the voice of the user 20, converts it into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0242] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0243] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0244] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0245] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0246] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0247] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0248] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0249] This invention provides a personal AI system to improve users' stress management. The system collects the user's voice and image data in real time, and a server analyzes this data to identify the user's emotional state. Based on the identified state, the server automatically generates an individualized relaxation plan. The relaxation plan includes methods best suited to the user's condition, such as playing music, guiding meditation, or suggesting short exercises. For example, for a user determined to have a high stress level, the plan may suggest playing a relaxing audio file on the device.

[0250] The device executes a relaxation plan received from the server and directly prompts the user for action. For example, the device can play relaxation music through its speakers and display meditation guides on its screen. This allows users to easily create a relaxing environment and reduce stress in their daily lives.

[0251] Furthermore, this system has a feature that allows it to receive feedback from users and improve relaxation plans based on past usage history. For example, if a particular piece of music or program was effective for a user, the server will prioritize suggesting it again next time.
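The prioritization of previously effective items could be sketched as a running average of ratings per item, used to sort candidates. The class `EffectivenessTracker`, the 1 to 5 scale, and the neutral default of 3.0 for unrated items are assumptions, not details from the specification.

```python
from collections import defaultdict


class EffectivenessTracker:
    """Track mean user ratings per plan item and rank candidates by them."""

    def __init__(self, neutral: float = 3.0) -> None:
        self._sum = defaultdict(float)
        self._count = defaultdict(int)
        self._neutral = neutral  # assumed score for items with no history

    def record(self, item: str, rating: float) -> None:
        self._sum[item] += rating
        self._count[item] += 1

    def mean(self, item: str) -> float:
        if self._count[item] == 0:
            return self._neutral
        return self._sum[item] / self._count[item]

    def rank(self, candidates: list[str]) -> list[str]:
        # sorted() is stable, so ties keep their original order.
        return sorted(candidates, key=self.mean, reverse=True)


if __name__ == "__main__":
    tracker = EffectivenessTracker()
    tracker.record("music_a", 5)
    tracker.record("music_b", 2)
    print(tracker.rank(["music_b", "meditation", "music_a"]))
```

Giving unrated items a neutral score lets new suggestions rank above items the user disliked while still ranking below proven favorites.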

[0252] In this way, the system can adapt to the individual needs of each user and enable efficient emotional care. To implement this invention, it is recommended to also use appropriate privacy protection measures to securely handle the user's voice and image data.

[0253] The following describes the processing flow.

[0254] Step 1:

[0255] The user initiates voice input and image capture. The device collects the user's voice and facial expression data in real time and sends it to the server.

[0256] Step 2:

[0257] The server analyzes the received audio data, examining the tone and speed of the voice to evaluate the user's emotional state. It also analyzes image data to identify emotions from the user's facial expressions.

[0258] Step 3:

[0259] The server identifies the user's current emotional state based on the analysis results. This allows it to determine whether the user is experiencing stress.

[0260] Step 4:

[0261] The server generates a relaxation plan corresponding to the identified emotional state. The plan includes suggestions such as relaxation music, meditation guidance, and short exercise sessions.

[0262] Step 5:

[0263] The server sends the generated relaxation plan to the device.

[0264] Step 6:

[0265] The device executes the received relaxation plan. For example, it might play music through the speaker and display a meditation guide on the screen.

[0266] Step 7:

[0267] Users experience a relaxation plan and input feedback about its effects into their device.

[0268] Step 8:

[0269] The device collects user feedback data and sends it to the server.

[0270] Step 9:

[0271] The server analyzes feedback data and learns how to improve the relaxation plan. This makes future suggestions more effective.
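Steps 5 and 6 of the flow above involve sending a plan to the device and executing it on the available outputs. The sketch below uses JSON as the message format and a table of handlers on the device side. The message layout, the `kind` values, and the handler functions are all hypothetical; the specification does not define a wire format.

```python
import json
from typing import Callable


def encode_plan(items: list[dict]) -> str:
    """Server side: serialize plan items into a JSON message."""
    return json.dumps({"type": "relaxation_plan", "items": items})


def execute_plan(message: str, handlers: dict[str, Callable[[str], str]]) -> list[str]:
    """Device side: route each plan item to the matching output handler."""
    plan = json.loads(message)
    if plan.get("type") != "relaxation_plan":
        raise ValueError("unexpected message type")
    results = []
    for item in plan["items"]:
        handler = handlers.get(item["kind"])
        if handler is None:
            continue  # skip item kinds this device cannot output
        results.append(handler(item["content"]))
    return results


if __name__ == "__main__":
    message = encode_plan([
        {"kind": "music", "content": "calm_track"},
        {"kind": "meditation_guide", "content": "breathing_guide"},
    ])
    handlers = {
        "music": lambda content: f"speaker: {content}",
        "meditation_guide": lambda content: f"screen: {content}",
    }
    print(execute_plan(message, handlers))
```

Skipping unknown item kinds lets the same plan message be sent to terminals with different output hardware.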

[0272] (Example 1)

[0273] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0274] In modern society, users experience various stresses in their daily lives, and effective stress management is essential. However, it is difficult to immediately provide appropriate relaxation methods tailored to individual needs. Furthermore, it is necessary to perform highly accurate emotional analysis while protecting the privacy of collected data, and to automatically generate personalized plans. Developing an efficient and reliable system to address these challenges is crucial.

[0275] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0276] In this invention, the server includes information processing means for analyzing audio and image information collected from the user to identify the user's emotional state, information processing means for generating a personalized relaxation plan for the user, and communication means for securely transmitting the audio and image information. This enables the automatic provision of relaxation methods tailored to the user's individual emotional state, thereby reducing stress in daily life.

[0277] A "user" refers to an individual who uses the system and is a data provider of voice and image information.

[0278] "Voice information" refers to data obtained from a user's speech and voice, and is used to identify their emotional state.

[0279] "Image information" refers to visual data, including the user's facial expressions and actions, and is used for sentiment analysis.

[0280] "Information processing means" refers to functions that analyze received data to generate the user's emotional state and relaxation plan.

[0281] "Emotional state" is an indicator that shows the user's psychological state and stress level, and is identified through the analysis of audio and image information.

[0282] A "relaxation plan" refers to a proposal of individualized stress reduction measures generated by the system according to the emotional state.

[0283] "Communication means" refers to the technology or protocol for securely and efficiently transmitting voice information and image information to the server.

[0284] "Feedback information" refers to data on the user's impressions and suggestions after using the system, which is used to improve the relaxation plan.

[0285] This invention provides a system for effectively improving the user's stress management. The configuration and operation of the system will be described below.

[0286] First, the user accesses the system through a personal terminal. This terminal is equipped with a microphone for collecting voice information and a camera for acquiring image information. The terminal uses these devices to obtain both voice and image data in real time.

[0287] The acquired data is transmitted to the server using secure communication means. Here, the server utilizes a generative AI model to analyze the voice information and image information and identify the user's emotional state. Voice recognition software and image processing algorithms are used for this analysis. By identifying the emotional state, the user's stress level and psychological state can be evaluated.

[0288] The server automatically generates an individual relaxation plan based on the identified emotional state. This plan is optimized based on the user's past history information and feedback information. For example, for a user determined to have a high stress level, specific relaxation music or meditation guides are selected. In this process, an instruction such as "Please select the optimal music based on the user's stress level" is provided to the generative AI model as a prompt sentence.

[0289] Next, the generated relaxation plan is sent to the device and executed. The device can play relaxation music through its speaker and display meditation guidance videos on its screen. This allows the user to immediately experience appropriate relaxation.

[0290] This system allows users to effectively manage stress in an individually customized way. User feedback is used to improve the system's accuracy, ensuring continuous performance enhancement.

[0291] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0292] Step 1:

[0293] The user accesses the system using a terminal. The terminal collects the user's voice information through the microphone and acquires image information using the camera. This information is necessary to evaluate the user's emotional state. Voice and image information are collected as input data and sent to the next processing step.

[0294] Step 2:

[0295] The device transmits collected audio and image information to the server using a secure communication method. This communication method is encrypted to protect user privacy. The transmitted data becomes input data necessary for analysis on the server.

[0296] Step 3:

[0297] The server uses a generative AI model to identify the user's emotional state based on the received audio and image information. In this process, a speech recognition algorithm analyzes the tone and speed of the voice, and an image processing algorithm analyzes facial expressions. The analysis outputs the user's stress level and emotional state, which the server uses in the next step.

[0298] Step 4:

[0299] The server generates an individualized relaxation plan based on the identified emotional state. This plan generation also takes into account past user data and feedback. As a result, the optimal relaxation method for the user is determined. The generative AI model selects an action using the prompt message "Please select the optimal music based on the user's stress level."

[0300] Step 5:

[0301] The server sends the generated relaxation plan to the terminal. This transmission is also encrypted. The plan received by the terminal becomes a dataset ready for execution.

[0302] Step 6:

[0303] The device executes the received relaxation plan. Specifically, it starts playing relaxation music through the speaker and displays a meditation guide video on the screen. The output is provided to the user as music and video.

[0304] Step 7:

[0305] After receiving a relaxation experience through the device, the user enters feedback information. This feedback is used to improve the accuracy of the system. The entered feedback is sent to the server and stored in the database.

[0306] Step 8:

[0307] The server analyzes the collected feedback information and uses it to improve the relaxation plan. As a result, subsequent plans will be refined to better meet user needs. The results of the analysis are used as input data for providing future personalized services.
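Steps 2 and 5 above state that transmissions are encrypted but do not name a protocol. Assuming TLS is used, a terminal-side client context could be configured with Python's standard `ssl` module as sketched below. The choice of TLS and the minimum version are assumptions.

```python
import ssl


def make_client_tls_context() -> ssl.SSLContext:
    """Create a client TLS context that verifies the server certificate."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    # Assumed policy: refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


if __name__ == "__main__":
    ctx = make_client_tls_context()
    print(ctx.verify_mode, ctx.check_hostname, ctx.minimum_version)
```

The returned context can be passed to standard-library clients, for example as the `context` argument of `http.client.HTTPSConnection`, when the terminal uploads data or downloads a plan.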

[0308] (Application Example 1)

[0309] Next, Application Example 1 will be described. In the following description, the data processing device 12 is referred to as the "server", and the smart glasses 214 are referred to as the "terminal".

[0310] In modern society, it is very important to grasp the stress and emotional state of individual users in real time and provide appropriate relaxation means accordingly. However, in conventional systems, it has been difficult to accurately analyze the emotional state of users and provide an individually optimized relaxation plan. Furthermore, there is a lack of a mechanism to improve the plan based on feedback, and there is a problem that it is difficult to maintain user satisfaction in the long term.

[0311] The specific processing by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0312] In this invention, the server includes information processing means for analyzing voice data and image data collected from a user to identify an emotional state, information processing means for generating a relaxation plan for an individual user based on the emotional state, and control means for controlling a robot to execute a specific action based on the analysis result. As a result, effective relaxation according to the emotional state of the user becomes possible.

[0313] "Voice data" refers to voice information collected from a user and is an analysis target for identifying an emotional state.

[0314] "Image data" is visual information capturing the user's expression and actions and is an analysis target for identifying an emotional state.

[0315] "Information processing means" refers to a device or program that has the function of analyzing audio and image data obtained from a user and performing processing to identify the user's emotional state.

[0316] A "relaxation plan" is a set of behavioral guidelines and schedules designed to provide users with stress reduction methods tailored to their specific emotional state.

[0317] "Control means" refers to a device or program that has the function of operating a device or system that takes specific actions to carry out a relaxation plan.

[0318] "Feedback data" refers to information about users' reactions and evaluations when they actually used a relaxation plan.

[0319] "Analysis means" refers to a system or program that uses collected feedback data to perform analysis aimed at improving future relaxation plans.

[0320] To implement this invention, the user first uses a consumer robot connected to a dedicated terminal. The robot is equipped with a high-performance camera and microphone, and this hardware is used to collect voice and image data from the user in real time.

[0321] The server converts the collected audio data into text data using speech recognition software such as the Google Speech-to-Text API, and analyzes the user's facial expressions from the image data using image analysis software such as OpenCV. The collected data is used to analyze the user's emotional state.

[0322] Once the user's emotional state is identified, the server uses a generative AI model to create a personalized relaxation plan based on that state. This plan may include playing relaxation music, suggesting guided meditation, and proposing simple exercises.

[0323] The generated relaxation plan is executed by the robot's control system. For example, the robot plays relaxation music through a speaker and displays meditation guides on a screen. This allows the user to create a relaxing environment.

[0324] Furthermore, feedback data from users after they have used a relaxation plan is collected and sent to the server. This feedback data is evaluated using analytical tools and used to improve future relaxation plans. This allows for the provision of services that are more tailored to the individual needs of users.

[0325] As a concrete example, imagine a scenario where, while a user is relaxing in their living room, the robot says, "It looks like you need to relax today," and then plays healing music based on an optimized relaxation plan.

[0326] Examples of prompts for a generative AI model are as follows:

[0327] "Analyze the user's emotional state and suggest appropriate relaxation methods if they are experiencing high stress levels. Include specific actions such as listening to music, short guided meditations, or simple stretching exercises."

[0328] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0329] Step 1:

[0330] The user provides audio and image data to the device. The device uses its camera and microphone to collect the user's facial expressions and voice in real time. During this process, the device prepares to send the data to the server while anonymizing it.

[0331] Step 2:

[0332] The server receives the audio data sent from the terminal and uses speech recognition software (e.g., Google Speech-to-Text API) to convert the audio into text data. The converted text data becomes the input data for sentiment analysis.

[0333] Step 3:

[0334] The server receives image data and uses image analysis software (e.g., OpenCV) to analyze facial expressions. The analysis identifies the emotional state derived from the facial expression, and the obtained emotional information is used in the next step.

[0335] Step 4:

[0336] The server integrates text data obtained from voice and emotion data from image analysis, and uses a generative AI model to classify the user's emotional state. It then generates a relaxation plan based on the emotional state. The input consists of text and image data, and the output is a personalized relaxation plan.

[0337] Step 5:

[0338] The generated relaxation plan is sent from the server to the robot. The robot uses control mechanisms to initiate specific actions to execute the plan. For example, it might play relaxation music through a speaker and display meditation guidance on a screen.

[0339] Step 6:

[0340] Users experience a relaxation plan and provide feedback via a device or robot. This feedback includes satisfaction levels and areas for improvement.

[0341] Step 7:

[0342] The server receives and analyzes user feedback data. This feedback data is used to improve the next relaxation plan. This allows for the provision of services that are more tailored to the individual needs of the user.
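Steps 2 to 4 above can be sketched as follows. This is a minimal illustration, not the implementation described in this disclosure: the names `Observation`, `build_plan_prompt`, and `generate_plan` are hypothetical, the generative AI model is represented only by a caller-supplied function, and the instruction text is the example prompt quoted above.

```python
from dataclasses import dataclass
from typing import Callable

# Instruction text taken from the example prompt quoted in the description.
PLAN_INSTRUCTION = (
    "Analyze the user's emotional state and suggest appropriate relaxation "
    "methods if they are experiencing high stress levels. Include specific "
    "actions such as listening to music, short guided meditations, or simple "
    "stretching exercises."
)


@dataclass(frozen=True)
class Observation:
    """Hypothetical container for the outputs of Steps 2 and 3."""
    transcript: str      # text produced by speech recognition (Step 2)
    facial_emotion: str  # emotion label produced by image analysis (Step 3)


def build_plan_prompt(obs: Observation) -> str:
    """Combine the instruction with both analysis results (Step 4 input)."""
    return (
        f"{PLAN_INSTRUCTION}\n\n"
        f"Transcript of the user's speech: {obs.transcript}\n"
        f"Emotion inferred from facial expression: {obs.facial_emotion}"
    )


def generate_plan(obs: Observation, model: Callable[[str], str]) -> str:
    """Send the prompt to a generative model supplied by the caller.

    `model` is an assumption: any function that maps a prompt string to a
    response string, for example a thin wrapper around a hosted model.
    """
    return model(build_plan_prompt(obs))
```

Passing the model as a function keeps the sketch independent of any particular generative AI service and allows a stub to be substituted during testing.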

[0343] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0344] This invention provides a personal AI system incorporating an emotion engine that recognizes the user's emotional state in real time. This system collects voice and image data, identifies the user's emotions based on this data, and provides an individualized relaxation plan. Specific embodiments are described below.

[0345] The device uses a microphone and camera to collect voice and image data from the user. The collected data is immediately transmitted to a server. The server uses an emotion engine to analyze voice tone, speaking style, and facial expression changes to recognize the user's emotions with high accuracy.

[0346] The emotion engine uses algorithms to analyze the tone and speed of speech, as well as points of change in facial expression, to understand emotional states such as "tension" or "relaxation." Based on this emotion recognition, the server generates an optimal relaxation plan for the user. This plan includes relaxation music, guided meditation, and exercise suggestions. For example, if the user is identified as "anxious," the system will prioritize suggesting calming music and relaxation guides.

[0347] The generated relaxation plan is immediately sent to the device, which then executes it. The user experiences this plan and can reap the stress-reducing effects. After the experience, the user enters feedback into the device, which is collected by the server. The server analyzes this feedback and optimizes the emotion engine's algorithm to improve the accuracy of future relaxation plans.

[0348] This system allows users to understand their emotional state in real time and take specific measures accordingly, enabling them to effectively manage daily stress. For privacy reasons, audio and image data are anonymized and processed securely.

[0349] The following describes the processing flow.

[0350] Step 1:

[0351] The user activates the system and prepares to collect their data through voice input and the camera. The device records this voice and image data in real time.

[0352] Step 2:

[0353] The device sends the collected audio and image data to the server. Data transmission is performed using a secure protocol.

[0354] Step 3:

[0355] The server uses an emotion engine to analyze the received audio data. It analyzes the tone, speed, and nuances of the speech to identify the emotional state.

[0356] Step 4:

[0357] The server analyzes the image data. It detects the user's facial features and analyzes changes in their expressions to more accurately understand their emotional state.

[0358] Step 5:

[0359] The server integrates the analysis results of audio and image data to recognize the user's overall emotional state.

[0360] Step 6:

[0361] The server generates a relaxation plan optimized for the user based on their perceived emotional state. For example, if the user is judged to be highly stressed, it will suggest a plan that includes calming music and meditation guidance.

[0362] Step 7:

[0363] The server sends the generated relaxation plan to the device.

[0364] Step 8:

[0365] The device executes the received plan. It plays relaxation music from the speaker and displays a meditation guide on the screen.

[0366] Step 9:

[0367] Users experience a relaxation plan and provide feedback on its effects via their device.

[0368] Step 10:

[0369] The device collects user feedback and sends it to the server.

[0370] Step 11:

[0371] The server analyzes the feedback it receives and uses it to improve the emotion engine and plan generation algorithm. This improves the accuracy of future suggestions.
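The integration in Step 5 above can be illustrated with a simple weighted combination of per-modality scores. This is only one possible approach and is an assumption: the description does not specify how the emotion engine integrates the audio and image results. The function name `fuse_emotion_scores`, the score dictionaries, and the `audio_weight` parameter are hypothetical.

```python
from typing import Dict, Tuple


def fuse_emotion_scores(
    audio_scores: Dict[str, float],
    image_scores: Dict[str, float],
    audio_weight: float = 0.5,
) -> Tuple[str, Dict[str, float]]:
    """Weighted average of two score dictionaries (assumed fusion method).

    Each dictionary maps an emotion label such as "tension" to a score.
    A label missing from one modality is treated as a score of 0.0.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be between 0 and 1")
    # Sort the labels so that ties are resolved deterministically.
    labels = sorted(set(audio_scores) | set(image_scores))
    if not labels:
        raise ValueError("at least one emotion score is required")
    fused = {
        label: audio_weight * audio_scores.get(label, 0.0)
        + (1.0 - audio_weight) * image_scores.get(label, 0.0)
        for label in labels
    }
    best = max(labels, key=lambda label: fused[label])
    return best, fused
```

For example, the audio analysis might favor "tension" while the image analysis favors "relaxation"; the weight decides which modality dominates the overall result.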

[0372] (Example 2)

[0373] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0374] In modern society, many people experience stress and anxiety on a daily basis and are seeking effective ways to cope with them. However, conventional stress management methods have difficulty responding to individual situations and emotions, making it challenging to provide appropriate relief methods for each user. Furthermore, there have been limited systems that analyze users' emotions in real time and provide appropriate countermeasures while respecting privacy. This invention aims to solve these problems and provide a system that enables efficient and individualized stress management.

[0375] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0376] In this invention, the server includes information processing means for analyzing audio and video information to identify an emotional state, information processing means for generating a user-specific relaxation plan, and information processing means having a function to optimize the algorithm using feedback data based on the emotion analysis results. This makes it possible to provide an optimal relaxation plan tailored to the individual emotional state of the user, thereby simultaneously achieving effective stress management and privacy protection.

[0377] "Audio information" refers to data that includes acoustic signals collected from users and is used for sentiment analysis.

[0378] "Visual information" refers to data that represents visual signals, including the user's facial expressions and movements, and is used to identify their emotional state.

[0379] "Emotional state" refers to the emotional state of a user and is a concept that includes psychological and physiological responses such as "tension" and "relaxation."

[0380] "Information processing means" refers to methods and technologies for analyzing audio and video information to identify emotional states.

[0381] A "relaxation plan" refers to a combination of relaxation techniques that are individually tailored to the user's emotional state.

[0382] "Information terminal" refers to a device used to provide relaxation plans to users and carry them out, and includes smartphones and tablets.

[0383] "Opinion data" refers to feedback collected from users, which is useful information for improving relaxation plans.

[0384] "Privacy protection features" refer to technologies and methods for anonymizing and securely processing users' personal data.

[0385] A "generative artificial intelligence model" refers to a system that uses machine learning techniques to analyze a user's emotional state and generate appropriate responses and plans.

[0386] "Recommendation prompts" refer to guiding instructions or suggestions generated based on the user's emotional state, intended to assist in the implementation of the plan.

[0387] This invention is a personal AI system for identifying a user's emotional state in real time and providing a relaxation plan based on that state. The system has the following configuration:

[0388] Data collection and transmission

[0389] The device uses a microphone to collect audio information from the user and a camera module to collect video information. This information is collected in the background without the user needing to operate the device and is sent to the server via a secure connection.

[0390] Analysis of emotional states

[0391] The server analyzes the received audio and video information using a generative AI model. It analyzes tone, intonation, and speed from the audio information, and changes in facial expressions from the video information, comprehensively identifying the user's emotional state. This analysis employs advanced machine learning algorithms to classify emotional states into multiple categories such as "tension" and "relaxation."

[0392] Generation and provision of relaxation plans

[0393] The server generates a relaxation plan tailored to the user based on the analysis results. This plan includes relaxing music selections, guided meditation, and simple exercise programs. For example, if the user's emotional state is determined to be "anxious," the server will recommend classical music with relaxing effects and an experiential deep breathing guide.

[0394] Feedback and algorithm optimization

[0395] Users can input feedback on the relaxation plan they have experienced on their device. This feedback information is sent to the server and analyzed to improve the accuracy of future relaxation plans. The server uses these analysis results to continuously optimize the generative AI model and enhance the personalized user experience.

[0396] For example, if a user is experiencing increased stress while working from home, the system will generate a plan to promote relaxation and provide a meditation guide with nature sounds as a background.

[0397] An example of a prompt to input into the generative AI model is, "If a user feels anxious while working from home, what kind of relaxation plan should be suggested?"

[0398] This system allows users to receive specific stress relief methods tailored to their emotional state in real time, enabling effective stress management.

[0399] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0400] Step 1:

[0401] The device uses a microphone to collect audio information from the user and a camera module to collect video information. This enables real-time data collection. The input is the user's voice and facial expressions, and the output is audio and video data. Specifically, the device runs a background process and continuously collects data with the user's permission.

[0402] Step 2:

[0403] The terminal transfers the collected audio and video data to the server via a secure protocol. This transfer process is carried out after anonymizing the data, thus protecting privacy. The input is the audio and video data obtained in step 1, and the output is the anonymized data transmitted to the server.

[0404] Step 3:

[0405] The server analyzes the received audio information and extracts features such as tone, speed, and intonation. A generative AI model is used to infer emotional states from the audio data. The input is the transmitted audio data, and the output is category information of the emotional state (e.g., "tense," "relaxed," etc.).

[0406] Step 4:

[0407] The server similarly analyzes the video information and detects changes in facial expressions. This provides clues to the emotions the user expresses physically. The generative AI model comprehensively determines the emotional state by combining this with the audio information. The input is video data, and the output is the analyzed facial expression changes and the resulting emotional information.

[0408] Step 5:

[0409] The server integrates the results of audio and video analysis to generate an optimal relaxation plan. This plan may include music tailored to the user's state, guided meditation, and exercise programs. The input is integrated emotional information, and the output is the details of the relaxation plan.

[0410] Step 6:

[0411] The generated relaxation plan is sent from the server to the terminal, which then uses this information to make suggestions to the user. The terminal automatically performs actions according to the plan, such as playing music or providing meditation guidance. The input is the information from the relaxation plan, and the output is the actions to be taken by the user.

[0412] Step 7:

[0413] Users input feedback on the effectiveness of the provided relaxation plan into the terminal. This feedback data is sent to the server and used to optimize future algorithms. The input is user feedback, and the output is improvement information recorded in the database.

[0414] Step 8:

[0415] The server analyzes the collected feedback and optimizes the generative AI model to improve the accuracy of the next proposal. The input is the feedback data, and the output is the improved algorithm and the improved accuracy of the next relaxation plan.
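The anonymization performed before transfer in Step 2 above could, for example, include replacing the user identifier with a keyed hash and dropping directly identifying metadata. The sketch below is an assumption: the description does not specify an anonymization method, and the names `pseudonymize_user_id`, `anonymize_record`, and the field names are hypothetical. Strictly speaking this is pseudonymization of metadata; the audio and video content itself may still identify a person and would need separate treatment.

```python
import hashlib
import hmac
from typing import Any, Dict, Iterable


def pseudonymize_user_id(user_id: str, secret_key: bytes) -> str:
    """Return a stable token for a user without exposing the raw identifier.

    HMAC-SHA256 is used so the token cannot be recomputed without the key.
    """
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_record(
    record: Dict[str, Any],
    secret_key: bytes,
    identifying_fields: Iterable[str] = ("name", "email", "device_serial"),
) -> Dict[str, Any]:
    """Drop identifying fields (hypothetical names) and tokenize the user ID."""
    blocked = set(identifying_fields) | {"user_id"}
    cleaned = {key: value for key, value in record.items() if key not in blocked}
    cleaned["user_token"] = pseudonymize_user_id(record["user_id"], secret_key)
    return cleaned
```

Because the token is stable for a given key, the server can still associate feedback in Steps 7 and 8 with earlier plans for the same user without storing the raw identifier.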

[0416] (Application Example 2)

[0417] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0418] In elderly care settings, maintaining the mental health of the elderly requires accurately recognizing each individual's emotional state and providing appropriate relaxation plans based on that understanding. However, current systems struggle with real-time emotional recognition, resulting in insufficient individualized care.

[0419] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0420] In this invention, the server includes information processing means for analyzing acoustic and video data to identify emotional states, information processing means for generating relaxation plans based on emotional states, and information processing means for providing plans aimed at the mental care of elderly people. This makes it possible to provide relaxation plans optimized for each elderly person and support the maintenance of their mental health.

[0421] "Audio data" refers to digital data containing information about sound, and is composed of sound waveforms, frequencies, volume, and other elements.

[0422] "Video data" refers to digital data obtained from image information acquired by cameras, etc., and includes the shape, color, and movement of objects.

[0423] "Information processing means" refers to programs or devices that analyze specific input data and extract or calculate necessary information.

[0424] "Emotional state" refers to an individual's inner mental state, and includes psychological states such as joy and sadness.

[0425] A "relaxation plan" is a set of activities and instructions designed to promote mental and physical relaxation in an individual.

[0426] "Response data" refers to data that digitally records user feedback and behavioral results after using a system.

[0427] "Personalization accuracy" refers to the degree of ability to provide services and content that match the characteristics and circumstances of the user.

[0428] To implement this invention, it is necessary to analyze audio and video data collected from users to identify each user's individual emotional state. The terminal collects data using the smartphone's microphone and camera and transmits it to the server immediately. On the server, a machine learning model implemented using Python and TensorFlow processes the data. This makes it possible to recognize the user's emotional state with high accuracy from changes in voice tone, speaking style, and facial expressions.

[0429] Based on the emotion recognition results, the server generates a relaxation plan. This plan includes relaxation music, meditation guides, and exercise suggestions for seniors. For example, if the user is identified as anxious, it will provide calming classical music and deep breathing guidance.

[0430] The device aims to stabilize the user's mental state by implementing the generated relaxation plan. The response data obtained from the user is analyzed on the server and used as data to improve the personalization accuracy of the relaxation plan.

[0431] As a concrete example, here is an example of a prompt message that performs emotion recognition:

[0432] "The feedback from users seems urgent. Based on the situation, what kind of music would be suitable for relaxation?"

[0433] "The user's facial expression seems cloudy. Please suggest a meditation guide that would be suitable for them."

[0434] This system makes it possible to provide precise and appropriate relaxation plans to maintain the mental health of each elderly person in care settings.

[0435] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0436] Step 1:

[0437] The device acquires the user's audio and video data. Specifically, it records audio using the smartphone's microphone and captures facial expressions with its camera. The input is real-time audio and video, and the output is digital audio and video data. This data is transmitted to the server immediately.

[0438] Step 2:

[0439] The server analyzes the received audio and video data. It uses Python and TensorFlow to run a machine learning model, extracting changes in voice tone and facial expressions from the data. The input consists of audio and video data, and the output is a label indicating an emotional state, such as "tension" or "relaxation."

[0440] Step 3:

[0441] The server generates a relaxation plan based on the emotion recognition results. It queries the generative AI model using prompts to select appropriate relaxation methods. The input is an emotional state label, and the output is a relaxation plan (e.g., quiet classical music or deep breathing guidance).

[0442] Step 4:

[0443] The terminal receives the relaxation plan generated by the server and executes the plan for the user. The user performs activities to relax according to the guide. The input is the relaxation plan, and the output is feedback on the user's actions.

[0444] Step 5:

[0445] Users input feedback into a device after their relaxation experience. Specifically, they input data by evaluating their satisfaction level and the effects they felt. The input is the user's feedback, and the output is feedback data.

[0446] Step 6:

[0447] The server analyzes feedback data collected from users to improve the accuracy of relaxation plans. This data is used as training data for a machine learning model, serving as a reference for future plan generation. The input is feedback data, and the output is an improved relaxation plan proposal.
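The use of feedback as training data in Step 6 above can be illustrated by converting feedback records into labelled examples. This is a minimal sketch under stated assumptions: the `Feedback` fields, the 1 to 5 satisfaction scale, and the function `to_training_examples` are hypothetical, and the sketch only prepares data; it does not train the TensorFlow model mentioned in the description.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Feedback:
    """Hypothetical feedback record collected in Step 5."""
    emotion_label: str  # label from Step 2, e.g. "tension"
    plan_id: str        # assumed identifier of the plan that was executed
    satisfaction: int   # assumed rating from 1 (poor) to max_score (good)


def to_training_examples(
    feedback: Iterable[Feedback], max_score: int = 5
) -> List[Tuple[Tuple[str, str], float]]:
    """Map each record to ((emotion_label, plan_id), target in [0, 1])."""
    examples = []
    for fb in feedback:
        if not 1 <= fb.satisfaction <= max_score:
            continue  # discard ratings outside the assumed scale
        target = (fb.satisfaction - 1) / (max_score - 1)
        examples.append(((fb.emotion_label, fb.plan_id), target))
    return examples
```

Normalizing the rating to the range 0 to 1 is a design choice made here so that the target can be used directly as a regression label; other encodings are equally possible.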

[0448] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0449] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0450] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0451] [Third Embodiment]

[0452] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0453] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0454] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0455] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0456] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0457] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0458] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
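One common way to exchange information in a secure manner over a network is TLS. The sketch below assumes TLS, which the description does not name, and shows how a terminal-side client context could be configured with Python's standard `ssl` module. The function name `make_client_tls_context` and the choice of TLS 1.2 as the minimum version are assumptions.

```python
import ssl


def make_client_tls_context() -> ssl.SSLContext:
    """Create a client-side TLS context with certificate verification.

    ssl.create_default_context() enables certificate and hostname checks
    by default; the minimum protocol version is raised here as an
    additional, assumed hardening step.
    """
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

The resulting context can then be passed to a standard-library client, for example `http.client.HTTPSConnection(host, context=context)`, when the terminal connects to the server.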

[0459] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0460] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0461] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0462] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0463] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0464] This invention provides a personal AI system to improve users' stress management. The system collects the user's voice and image data in real time, and a server analyzes this data to identify the user's emotional state. Based on the identified emotional state, the server automatically generates an individualized relaxation plan. The relaxation plan includes methods best suited to the user's condition, such as playing music, guiding meditation, or suggesting short exercises. For example, when a user is determined to have a high stress level, the system may suggest playing a relaxing audio file on the user's device.

[0465] The device executes a relaxation plan received from the server and directly prompts the user for action. For example, the device can play relaxation music through its speakers and display meditation guides on its screen. This allows users to easily create a relaxing environment and reduce stress in their daily lives.

[0466] Furthermore, this system has a feature that allows it to receive feedback from users and improve relaxation plans based on past usage history. For example, if a particular piece of music or program was effective for a user, the server will prioritize suggesting it again next time.
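The prioritization described above can be sketched as ranking candidate items by their average past rating. This is an illustration under stated assumptions: the function `rank_items`, the 1 to 5 rating scale, and the neutral default score for items with no history are hypothetical and do not come from the description.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def rank_items(
    history: Iterable[Tuple[str, int]],
    candidates: Iterable[str],
    default_score: float = 3.0,
) -> List[str]:
    """Order candidates so that items rated highly in the past come first.

    `history` holds (item, rating) pairs from earlier feedback. Items with
    no ratings receive `default_score`, an assumed neutral value.
    """
    ratings: Dict[str, List[int]] = defaultdict(list)
    for item, rating in history:
        ratings[item].append(rating)

    def score(item: str) -> float:
        return mean(ratings[item]) if item in ratings else default_score

    # sorted() is stable, so equally scored items keep their input order.
    return sorted(candidates, key=score, reverse=True)
```

Giving unrated items a neutral score lets new music or programs appear ahead of items the user disliked, while still ranking proven favorites first.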

[0467] In this way, the system can adapt to the individual needs of each user and enable efficient emotional care. To implement this invention, it is recommended to also use appropriate privacy protection measures to securely handle the user's voice and image data.

[0468] The following describes the processing flow.

[0469] Step 1:

[0470] The user initiates voice input and image capture. The device collects the user's voice and facial expression data in real time and sends it to the server.

[0471] Step 2:

[0472] The server analyzes the received audio data, examining the tone and speed of the voice to evaluate the user's emotional state. It also analyzes image data to identify emotions from the user's facial expressions.

[0473] Step 3:

[0474] The server identifies the user's current emotional state based on the analysis results. This allows it to determine whether the user is experiencing stress.

[0475] Step 4:

[0476] The server generates a relaxation plan corresponding to the identified emotional state. The plan includes suggestions such as relaxation music, meditation guidance, and short exercise sessions.

[0477] Step 5:

[0478] The server sends the generated relaxation plan to the device.

[0479] Step 6:

[0480] The device executes the received relaxation plan. For example, it might play music through the speaker and display a meditation guide on the screen.

[0481] Step 7:

[0482] Users experience a relaxation plan and input feedback about its effects into their device.

[0483] Step 8:

[0484] The device collects user feedback data and sends it to the server.

[0485] Step 9:

[0486] The server analyzes feedback data and learns how to improve the relaxation plan. This makes future suggestions more effective.

[0487] (Example 1)

[0488] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0489] In modern society, users experience various stresses in their daily lives, and effective stress management is essential. However, providing appropriate relaxation methods tailored to individual needs immediately is difficult. Furthermore, it is necessary to perform highly accurate emotional analysis while protecting the privacy of collected data and automatically generate personalized plans. Developing an efficient and reliable system to address these challenges is crucial.

[0490] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0491] In this invention, the server includes information processing means for analyzing audio and image information collected from the user to identify the user's emotional state, information processing means for generating a personalized relaxation plan for the user, and communication means for securely transmitting the audio and image information. This enables the automatic provision of relaxation methods tailored to the user's individual emotional state, thereby reducing stress in daily life.

[0492] A "user" refers to an individual who uses the system and is a data provider of voice and image information.

[0493] "Voice information" refers to data obtained from a user's speech and voice, and is used to identify their emotional state.

[0494] "Image information" refers to visual data, including the user's facial expressions and actions, and is used for sentiment analysis.

[0495] "Information processing means" refers to functions that analyze received data to identify the user's emotional state and generate a relaxation plan.

[0496] "Emotional state" is an indicator that shows the user's psychological state and stress level, and is identified through the analysis of audio and image information.

[0497] A "relaxation plan" refers to a proposal of personalized stress reduction measures that the system generates in response to the user's emotional state.

[0498] "Communication means" refers to technologies or protocols for securely and efficiently transmitting voice and image information to a server.

[0499] "Feedback information" refers to data on user feedback and suggestions after using the system, which is used to improve relaxation plans.

[0500] This invention provides a system for effectively improving user stress management. The system's configuration and operation are described below.

[0501] First, the user accesses the system through a personal device. This device is equipped with a microphone to collect audio information and a camera to acquire image information. The device uses these components to acquire both audio and image data in real time.

[0502] The acquired data is transmitted to a server using secure communication methods. The server then utilizes a generative AI model to analyze the audio and image information and identify the user's emotional state. This analysis employs speech recognition software and image processing algorithms. Identifying the emotional state allows for the assessment of the user's stress level and psychological condition.

[0503] The server automatically generates an individualized relaxation plan based on the identified emotional state. This plan is optimized based on the user's past history and feedback information. For example, a user identified as having a high stress level will be presented with specific relaxation music or meditation guidance. In this process, the AI model is provided with a prompt message stating, "Please select the most suitable music based on the user's stress level."
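As a minimal sketch of how such a prompt might be assembled, the following combines the instruction sentence quoted above with the identified stress level and the user's history. The 0 to 10 stress scale, the function name `build_music_prompt`, and the layout of the context lines are assumptions for illustration; the description does not specify them.

```python
def build_music_prompt(stress_level, liked_tracks):
    """Compose a prompt for the generative AI model.

    The instruction sentence is quoted from the description. The 0-10
    stress scale and the context-line layout are assumptions.
    """
    if not 0 <= stress_level <= 10:
        raise ValueError("stress_level must be between 0 and 10")
    lines = [
        "Please select the most suitable music based on the user's stress level.",
        f"Stress level (0-10): {stress_level}",
    ]
    if liked_tracks:
        # Past feedback is passed along so the model can take history into account.
        lines.append("Previously well-rated tracks: " + ", ".join(liked_tracks))
    return "\n".join(lines)


print(build_music_prompt(8, ["ocean_waves", "piano"]))
```

Sending the resulting text to a particular model is omitted here because the description does not name the model interface used in Example 1.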

[0504] Next, the generated relaxation plan is sent to the device and executed. The device can play relaxation music through its speaker and display meditation guidance videos on its screen. This allows the user to immediately experience appropriate relaxation.

[0505] This system allows users to manage stress effectively in an individually customized way. User feedback is used to improve the system's accuracy, so that performance improves continuously.

[0506] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0507] Step 1:

[0508] The user accesses the system using a terminal. The terminal collects the user's voice information through the microphone and acquires image information using the camera. This information is necessary to evaluate the user's emotional state. Voice and image information are collected as input data and sent to the next processing step.

[0509] Step 2:

[0510] The device transmits collected audio and image information to the server using a secure communication method. This communication method is encrypted to protect user privacy. The transmitted data becomes input data necessary for analysis on the server.

[0511] Step 3:

[0512] The server uses a generative AI model to identify the user's emotional state based on the received audio and image information. In this process, a speech recognition algorithm analyzes the tone and speed of the voice, and an image processing algorithm analyzes facial expressions. The analysis outputs the user's stress level and emotional state, which the server uses in the next step.

[0513] Step 4:

[0514] The server generates an individualized relaxation plan based on the identified emotional state. This plan generation also takes into account past user data and feedback. As a result, the optimal relaxation method for the user is determined. The generative AI model selects an action using the prompt message "Please select the most suitable music based on the user's stress level."

[0515] Step 5:

[0516] The server sends the generated relaxation plan to the terminal. This transmission is also encrypted. The plan received by the terminal becomes a dataset ready for execution.

[0517] Step 6:

[0518] The device executes the received relaxation plan. Specifically, it starts playing relaxation music through the speaker and displays a meditation guide video on the screen. The output is provided to the user as music and video.

[0519] Step 7:

[0520] After receiving a relaxation experience through the device, the user enters feedback information. This feedback is used to improve the accuracy of the system. The entered feedback is sent to the server and stored in the database.

[0521] Step 8:

[0522] The server analyzes the collected feedback information to improve relaxation plans. This allows future plans to be more refined to better meet user needs. The analysis results are used as input data to provide personalized services in the future.
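The exchange of the relaxation plan between the server and the terminal in Steps 5 and 6 can be sketched as follows. This is an illustrative assumption rather than the format used by the invention: the class names `PlanStep` and `RelaxationPlan`, their fields, and the choice of JSON are all hypothetical, and the encryption of the transport itself is left to the communication layer.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PlanStep:
    kind: str        # e.g. "music" or "meditation_video" (hypothetical values)
    content_id: str  # identifier of the media to play (hypothetical)
    minutes: int


@dataclass
class RelaxationPlan:
    user_id: str
    emotional_state: str
    steps: list = field(default_factory=list)


def encode_plan(plan):
    """Server side: serialize the plan to JSON bytes before encrypted transport."""
    return json.dumps(asdict(plan)).encode("utf-8")


def decode_plan(payload):
    """Terminal side: rebuild the plan object from the received bytes."""
    raw = json.loads(payload.decode("utf-8"))
    steps = [PlanStep(**step) for step in raw["steps"]]
    return RelaxationPlan(raw["user_id"], raw["emotional_state"], steps)


plan = RelaxationPlan(
    user_id="u-001",
    emotional_state="high_stress",
    steps=[PlanStep("music", "calm-01", 10), PlanStep("meditation_video", "guide-03", 5)],
)
received = decode_plan(encode_plan(plan))
print(received == plan)  # True
```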

[0523] (Application Example 1)

[0524] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0525] In modern society, it is crucial to understand individual users' stress levels and emotional states in real time and provide appropriate relaxation methods accordingly. However, conventional systems have struggled to accurately analyze users' emotional states and provide individually optimized relaxation plans. Furthermore, they lack mechanisms for improving plans based on feedback, making it difficult to maintain user satisfaction in the long term.

[0526] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0527] In this invention, the server includes information processing means for analyzing voice and image data collected from the user to identify the user's emotional state, information processing means for generating a user-specific relaxation plan based on the emotional state, and control means for causing a robot to perform specific actions based on the analysis results. This enables effective relaxation tailored to the user's emotional state.

[0528] "Voice data" refers to voice information collected from users and is the subject of analysis for identifying their emotional state.

[0529] "Image data" refers to visual information that captures a user's facial expressions and movements, and is the subject of analysis for identifying their emotional state.

[0530] "Information processing means" refers to a device or program that has the function of analyzing audio and image data obtained from a user and performing processing to identify the user's emotional state.

[0531] A "relaxation plan" is a set of behavioral guidelines and schedules designed to provide users with stress reduction methods tailored to their specific emotional state.

[0532] "Control means" refers to a device or program that has the function of operating a device or system that takes specific actions to carry out a relaxation plan.

[0533] "Feedback data" refers to information about users' reactions and evaluations when they actually used a relaxation plan.

[0534] "Analysis means" refers to a system or program that uses collected feedback data to perform analysis aimed at improving future relaxation plans.

[0535] To implement this invention, the user first uses a consumer robot connected to a dedicated terminal. The robot is equipped with a high-performance camera and microphone, and this hardware is used to collect voice and image data from the user in real time.

[0536] The server converts the collected audio data into text data using speech recognition software such as the Google Speech-to-Text API, and analyzes the user's facial expressions from the image data using image analysis software such as OpenCV. The collected data is used to analyze the user's emotional state.

[0537] Once the user's emotional state is identified, the server uses a generative AI model to create a personalized relaxation plan based on that state. This plan may include playing relaxation music, suggesting guided meditation, and simple exercises.

[0538] The generated relaxation plan is executed by the robot's control system. For example, the robot plays relaxation music through a speaker and displays meditation guides on a screen. This allows the user to create a relaxing environment.

[0539] Furthermore, feedback data from users after they have used a relaxation plan is collected and sent to the server. This feedback data is evaluated using analytical tools and used to improve future relaxation plans. This allows for the provision of services that are more tailored to the individual needs of users.

[0540] As a concrete example, imagine a scenario where, while a user is relaxing in their living room, the robot says, "It looks like you need to relax today," and then plays healing music based on an optimized relaxation plan.

[0541] Examples of prompts for a generative AI model are as follows:

[0542] "Analyze the user's emotional state and suggest appropriate relaxation methods if they are experiencing high stress levels. Include specific actions such as listening to music, short guided meditations, or simple stretching exercises."

[0543] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0544] Step 1:

[0545] The user provides audio and image data to the device. The device uses its camera and microphone to collect the user's facial expressions and voice in real time. The device then anonymizes the data and prepares to send it to the server.

[0546] Step 2:

[0547] The server receives the audio data sent from the terminal and uses speech recognition software (e.g., Google Speech-to-Text API) to convert the audio into text data. The converted text data becomes the input data for sentiment analysis.

[0548] Step 3:

[0549] The server receives image data and uses image analysis software (e.g., OpenCV) to analyze facial expressions. The analysis identifies the emotional state derived from the facial expression, and the obtained emotional information is used in the next step.

[0550] Step 4:

[0551] The server integrates the text data obtained from the voice with the emotion data obtained from image analysis, and uses a generative AI model to classify the user's emotional state. It then generates a relaxation plan based on the emotional state. The input consists of the text data and the image-derived emotion data, and the output is a personalized relaxation plan.

[0552] Step 5:

[0553] The generated relaxation plan is sent from the server to the robot. The robot uses control mechanisms to initiate specific actions to execute the plan. For example, it might play relaxation music through a speaker and display meditation guidance on a screen.

[0554] Step 6:

[0555] Users experience a relaxation plan and provide feedback via a device or robot. This feedback includes satisfaction levels and areas for improvement.

[0556] Step 7:

[0557] The server receives and analyzes user feedback data. This feedback data is used to improve the next relaxation plan. This allows for the provision of services that are more tailored to the individual needs of the user.
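The role of the control means in Step 5 can be illustrated with a minimal sketch in which each plan entry is dispatched to a matching robot action. The `Robot` class below is a stand-in that only records what it would do; its method names, the action keys "music" and "meditation", and the policy of skipping unknown actions are assumptions, not details taken from the description.

```python
class Robot:
    """Stand-in for the robot; it records actions instead of driving hardware."""

    def __init__(self):
        self.log = []

    def play_music(self, track):
        self.log.append(f"speaker: playing {track}")

    def show_guide(self, title):
        self.log.append(f"screen: showing {title}")


def execute_plan(robot, plan):
    """Run each (action, argument) pair and return the actions that were skipped."""
    handlers = {"music": robot.play_music, "meditation": robot.show_guide}
    skipped = []
    for action, argument in plan:
        handler = handlers.get(action)
        if handler is None:
            skipped.append(action)  # unknown action: skip rather than fail
            continue
        handler(argument)
    return skipped


robot = Robot()
skipped = execute_plan(
    robot,
    [("music", "healing_piano"), ("dance", "waltz"), ("meditation", "breathing")],
)
print(robot.log)
print(skipped)
```

Returning the skipped actions lets the server learn which plan entries a given robot cannot perform.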

[0558] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0559] This invention provides a personal AI system incorporating an emotion engine that recognizes the user's emotional state in real time. This system collects voice and image data, identifies the user's emotions based on this data, and provides an individualized relaxation plan. Specific embodiments are described below.

[0560] The device uses a microphone and camera to collect voice and image data from the user. The collected data is immediately transmitted to a server. The server uses an emotion engine to analyze voice tone, speaking style, and facial expression changes to recognize the user's emotions with high accuracy.

[0561] The emotion engine uses algorithms to analyze the tone and speed of speech, as well as points of change in facial expression, to understand emotional states such as "tension" or "relaxation." Based on this emotion recognition, the server generates an optimal relaxation plan for the user. This plan includes relaxation music, guided meditation, and exercise suggestions. For example, if the user is identified as "anxious," the system will prioritize suggesting calming music and relaxation guides.
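To make the idea of analyzing the speed and tone of speech concrete, the following toy sketch derives a speaking rate and applies a simple threshold rule. It is not the emotion engine of the invention, which is presumably a trained model; the function names, the pitch-variation feature, and both thresholds are illustrative assumptions only.

```python
def speaking_rate(word_count, duration_seconds):
    """Words per minute for one utterance."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return word_count * 60.0 / duration_seconds


def classify_state(words_per_minute, pitch_std_hz):
    """Toy rule: fast, highly varied speech is labeled "tension".

    Both thresholds are illustrative assumptions, not values from the text.
    """
    if words_per_minute > 170 and pitch_std_hz > 40:
        return "tension"
    return "relaxation"


rate = speaking_rate(word_count=45, duration_seconds=15)
print(rate)                        # 180.0
print(classify_state(rate, 55.0))  # tension
```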

[0562] The generated relaxation plan is immediately sent to the device, which then executes it. The user experiences this plan and can reap the stress-reducing effects. After the experience, the user enters feedback into the device, which is collected by the server. The server analyzes this feedback and optimizes the emotion engine's algorithm to improve the accuracy of future relaxation plans.

[0563] This system allows users to understand their emotional state in real time and take specific measures accordingly, enabling them to effectively manage daily stress. For privacy reasons, audio and image data are anonymized and processed securely.

[0564] The following describes the processing flow.

[0565] Step 1:

[0566] The user activates the system and enables data collection through voice input and the camera. The device records this voice and image data in real time.

[0567] Step 2:

[0568] The device sends the collected audio and image data to the server. Data transmission is performed using a secure protocol.

[0569] Step 3:

[0570] The server uses an emotion engine to analyze the received audio data. It analyzes the tone, speed, and nuances of the speech to identify the emotional state.

[0571] Step 4:

[0572] The server analyzes the image data. It detects the user's facial features and analyzes changes in their expressions to more accurately understand their emotional state.

[0573] Step 5:

[0574] The server integrates the analysis results of audio and image data to recognize the user's overall emotional state.

[0575] Step 6:

[0576] The server generates a relaxation plan optimized for the user based on the recognized emotional state. For example, if the user is judged to be highly stressed, it will suggest a plan that includes calming music and meditation guidance.

[0577] Step 7:

[0578] The server sends the generated relaxation plan to the device.

[0579] Step 8:

[0580] The device executes the received plan. It plays relaxation music from the speaker and displays a meditation guide on the screen.

[0581] Step 9:

[0582] Users experience a relaxation plan and provide feedback on its effects via their device.

[0583] Step 10:

[0584] The device collects user feedback and sends it to the server.

[0585] Step 11:

[0586] The server analyzes the feedback it receives and uses it to improve the emotion engine and plan generation algorithm. This improves the accuracy of future suggestions.
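The integration of audio and image analysis results in Step 5 is not specified in detail. One common approach, shown here purely as an assumption, is late fusion: each modality produces a score per emotion label, and the scores are combined by a weighted average. The function name `fuse_scores` and the default weight of 0.5 are hypothetical.

```python
def fuse_scores(audio_scores, image_scores, audio_weight=0.5):
    """Weighted average of two per-emotion score dictionaries (late fusion).

    Returns the best label and the fused scores. A label missing from one
    modality is treated as having score 0.0 there.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be between 0 and 1")
    labels = set(audio_scores) | set(image_scores)
    fused = {
        label: audio_weight * audio_scores.get(label, 0.0)
        + (1.0 - audio_weight) * image_scores.get(label, 0.0)
        for label in labels
    }
    # Iterate labels in sorted order so ties resolve deterministically.
    best = max(sorted(fused), key=fused.get)
    return best, fused


audio = {"tension": 0.7, "relaxation": 0.3}
image = {"tension": 0.4, "relaxation": 0.6}
label, scores = fuse_scores(audio, image)
print(label)  # tension
```

Adjusting `audio_weight` allows one modality to dominate, for example when the camera image is poorly lit.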

[0587] (Example 2)

[0588] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0589] In modern society, many people experience stress and anxiety on a daily basis and are seeking effective ways to cope with them. However, conventional stress management methods have difficulty responding to individual situations and emotions, making it challenging to provide appropriate relief methods for each user. Furthermore, there have been limited systems that analyze users' emotions in real time and provide appropriate countermeasures while respecting privacy. This invention aims to solve these problems and provide a system that enables efficient and individualized stress management.

[0590] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0591] In this invention, the server includes information processing means for analyzing audio and video information to identify an emotional state, information processing means for generating a user-specific relaxation plan, and information processing means having a function to optimize the algorithm using feedback data based on the emotion analysis results. This makes it possible to provide an optimal relaxation plan tailored to the individual emotional state of the user, thereby simultaneously achieving effective stress management and privacy protection.

[0592] "Audio information" refers to data that includes acoustic signals collected from users and is used for sentiment analysis.

[0593] "Visual information" refers to data that represents visual signals, including the user's facial expressions and movements, and is used to identify their emotional state.

[0594] "Emotional state" refers to the emotional state of a user and is a concept that includes psychological and physiological responses such as "tension" and "relaxation."

[0595] "Information processing means" refers to methods and technologies for analyzing audio and video information to identify emotional states.

[0596] A "relaxation plan" refers to a combination of relaxation techniques that are individually tailored to the user's emotional state.

[0597] "Information terminal" refers to a device used to provide relaxation plans to users and carry them out, and includes smartphones and tablets.

[0598] "Opinion data" refers to feedback collected from users, which is useful information for improving relaxation plans.

[0599] "Privacy protection features" refer to technologies and methods for anonymizing and securely processing users' personal data.

[0600] A "generative artificial intelligence model" refers to a system that uses machine learning techniques to analyze a user's emotional state and generate appropriate responses and plans.

[0601] "Recommendation prompts" refer to guiding instructions or suggestions generated based on the user's emotional state, intended to assist in the implementation of the plan.

[0602] This invention is a personal AI system for identifying a user's emotional state in real time and providing a relaxation plan based on that state. The system has the following configuration:

[0603] Data collection and transmission

[0604] The device uses a microphone to collect audio information from the user and a camera module to collect video information. This information is collected in the background without the user needing to operate the device and is sent to the server via a secure connection.

[0605] Analysis of emotional states

[0606] The server analyzes the received audio and video information using a generative AI model. It analyzes tone, intonation, and speed from the audio information, and changes in facial expressions from the video information, comprehensively identifying the user's emotional state. This analysis employs advanced machine learning algorithms to classify emotional states into multiple categories such as "tension" and "relaxation."

[0607] Generation and provision of relaxation plans

[0608] The server generates a relaxation plan tailored to the user based on the analysis results. This plan includes relaxing music selections, guided meditation, and simple exercise programs. For example, if the user's emotional state is determined to be "anxious," the server will recommend classical music with relaxing effects and a guided deep-breathing exercise.

[0609] Feedback and algorithm optimization

[0610] Users can input feedback on the relaxation plan they have experienced on their device. This feedback information is sent to the server and analyzed to improve the accuracy of future relaxation plans. The server uses these analysis results to continuously optimize the generative AI model and enhance the personalized user experience.

[0611] For example, if a user is experiencing increased stress while working from home, the system will generate a plan to promote relaxation and provide a meditation guide with nature sounds as a background.

[0612] An example of a prompt to input into the generative AI model is, "If a user feels anxious while working from home, what kind of relaxation plan should be suggested?"

[0613] This system allows users to receive specific stress relief methods tailored to their emotional state in real time, enabling effective stress management.

[0614] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0615] Step 1:

[0616] The device uses a microphone to collect audio information from the user and a camera module to collect video information. This enables real-time data collection. The input is the user's voice and facial expressions, which are output as audio and video data. Specifically, the device runs a background process and continuously collects data with the user's permission.

[0617] Step 2:

[0618] The terminal transfers the collected audio and video data to the server via a secure protocol. This transfer process is carried out after anonymizing the data, thus protecting privacy. The input is the audio and video data obtained in step 1, and the output is the anonymized data transmitted to the server.

[0619] Step 3:

[0620] The server analyzes the received audio information and extracts features such as tone, speed, and intonation. A generative AI model is used to infer emotional states from the audio data. The input is the transmitted audio data, and the output is category information of the emotional state (e.g., "tense," "relaxed," etc.).

[0621] Step 4:

[0622] The server similarly analyzes the video information and detects changes in facial expressions. This provides visual cues to the user's emotional state. The generative AI model comprehensively determines the emotional state by combining these cues with the audio information. The input is video data, and the output is the analyzed facial expression changes and the resulting emotional information.

[0623] Step 5:

[0624] The server integrates the results of audio and video analysis to generate an optimal relaxation plan. This plan may include music tailored to the user's state, guided meditation, and exercise programs. The input is integrated emotional information, and the output is the details of the relaxation plan.

[0625] Step 6:

[0626] The generated relaxation plan is sent from the server to the terminal, which then uses this information to make suggestions to the user. The terminal automatically performs actions according to the plan, such as playing music or providing meditation guidance. The input is the information from the relaxation plan, and the output is the actions to be taken by the user.

[0627] Step 7:

[0628] Users input feedback on the effectiveness of the provided relaxation plan into the terminal. This feedback data is sent to the server and used to optimize future algorithms. The input is user feedback, and the output is improvement information recorded in the database.

[0629] Step 8:

[0630] The server analyzes the collected feedback and optimizes the generative AI model to improve the accuracy of the next proposal. The input is the feedback data, and the output is the improved algorithm and the improved accuracy of the next relaxation plan.
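The anonymization performed before transmission in Step 2 is not described in detail. As one possible illustration, the sketch below replaces the user identifier with a keyed hash (HMAC-SHA256) and drops other directly identifying fields. Strictly speaking this is pseudonymization rather than full anonymization, since voice and facial data can themselves identify a person. The record field names and the function names are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(user_id, secret_key):
    """Replace a user identifier with a keyed SHA-256 digest (hex string)."""
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_record(record, secret_key):
    """Return a copy of the record prepared for upload.

    Field names are hypothetical. The raw identifier and display name are
    dropped; only the pseudonym and the media payload references are kept.
    """
    return {
        "user": pseudonymize(record["user_id"], secret_key),
        "audio": record["audio"],
        "video": record["video"],
    }


key = b"terminal-local-secret"  # assumed to be kept on the terminal
record = {
    "user_id": "alice@example.com",
    "display_name": "Alice",
    "audio": "clip-0001.wav",
    "video": "clip-0001.mp4",
}
print(anonymize_record(record, key))
```

Because the same key always yields the same pseudonym, the server can still link feedback to earlier sessions without seeing the original identifier.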

[0631] (Application Example 2)

[0632] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0633] In elderly care settings, maintaining the mental health of the elderly requires accurately recognizing each individual's emotional state and providing appropriate relaxation plans based on that understanding. However, current systems struggle with real-time emotional recognition, resulting in insufficient individualized care.

[0634] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0635] In this invention, the server includes information processing means for analyzing acoustic and video data to identify emotional states, information processing means for generating relaxation plans based on emotional states, and information processing means for providing plans aimed at the mental care of elderly people. This makes it possible to provide relaxation plans optimized for each elderly person and support the maintenance of their mental health.

[0636] "Audio data" refers to digital data containing information about sound, and is composed of sound waveforms, frequencies, volume, and other elements.

[0637] "Video data" refers to digitized image information acquired by cameras or other devices, and includes the shape, color, and movement of objects.

[0638] "Information processing means" refers to programs or devices that analyze specific input data and extract or calculate necessary information.

[0639] "Emotional state" refers to an individual's inner mental state, and includes psychological states such as joy and sadness.

[0640] A "relaxation plan" is a set of activities and instructions designed to promote mental and physical relaxation in an individual.

[0641] "Response data" refers to data that digitally records user feedback and behavioral results after using a system.

[0642] "Personalization accuracy" refers to the degree of ability to provide services and content that match the characteristics and circumstances of the user.

[0643] To implement this invention, it is necessary to analyze audio and video data collected from users to identify each user's individual emotional state. The terminal collects data using the smartphone's microphone and camera and transmits it to the server immediately. On the server, a machine learning model implemented using Python and TensorFlow processes the data. This makes it possible to recognize the user's emotional state with high accuracy from changes in voice tone, speaking style, and facial expressions.

[0644] Based on the emotion recognition results, the server generates a relaxation plan. This plan includes relaxation music, meditation guides, and exercise suggestions for seniors. For example, if the user is identified as anxious, it will provide calming classical music and deep breathing guidance.

[0645] The device aims to stabilize the user's mental state by implementing the generated relaxation plan. The response data obtained from the user is analyzed on the server and used as data to improve the personalization accuracy of the relaxation plan.

[0646] As concrete examples, prompt messages based on the emotion recognition results are as follows:

[0647] "The feedback from users seems urgent. Based on the situation, what kind of music would be suitable for relaxation?"

[0648] "The user's facial expression seems cloudy. Please suggest a meditation guide that would be suitable for them."

[0649] This system makes it possible to provide precise and appropriate relaxation plans to maintain the mental health of each elderly person in care settings.

[0650] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0651] Step 1:

[0652] The device acquires the user's audio and video data. Specifically, it records audio using the smartphone's microphone and captures facial expressions with its camera. The input is real-time audio and video, and the output is digital audio and video data. This data is transmitted to the server immediately.

[0653] Step 2:

[0654] The server analyzes the received audio and video data. It uses Python and TensorFlow to run a machine learning model, extracting changes in voice tone and facial expressions from the data. The input consists of audio and video data, and the output is a label indicating an emotional state, such as "tension" or "relaxation."

[0655] Step 3:

[0656] The server generates a relaxation plan based on the emotion recognition results. It queries the generative AI model using prompts to select appropriate relaxation methods. The input is an emotional state label, and the output is a relaxation plan (e.g., quiet classical music or deep breathing guidance).

[0657] Step 4:

[0658] The terminal receives the relaxation plan generated by the server and executes the plan for the user. The user performs activities to relax according to the guide. The input is the relaxation plan, and the output is feedback on the user's actions.

[0659] Step 5:

[0660] Users input feedback into a device after their relaxation experience. Specifically, they input data by evaluating their satisfaction level and the effects they felt. The input is the user's feedback, and the output is feedback data.

[0661] Step 6:

[0662] The server analyzes feedback data collected from users to improve the accuracy of relaxation plans. This data is used as training data for a machine learning model, serving as a reference for future plan generation. The input is feedback data, and the output is an improved relaxation plan proposal.
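The use of feedback data as training data in Step 6 can be sketched as a conversion from feedback records into labeled rows. The record keys "emotion", "plan", and "satisfaction", the 1 to 5 satisfaction scale, and the threshold that separates satisfied from unsatisfied users are assumptions for illustration; the description does not define the training format.

```python
def to_training_rows(feedback, positive_threshold=4):
    """Turn feedback records into (features, target) rows.

    Each record is a dict with hypothetical keys "emotion", "plan", and
    "satisfaction" (assumed 1-5). The target is 1 when the user was satisfied.
    """
    rows = []
    for record in feedback:
        rating = record.get("satisfaction")
        if rating is None:
            continue  # unrated sessions carry no supervision signal
        features = (record["emotion"], record["plan"])
        rows.append((features, 1 if rating >= positive_threshold else 0))
    return rows


feedback = [
    {"emotion": "tension", "plan": "classical_music", "satisfaction": 5},
    {"emotion": "tension", "plan": "light_exercise", "satisfaction": 2},
    {"emotion": "relaxation", "plan": "deep_breathing"},
]
print(to_training_rows(feedback))
```

The resulting rows could then be fed to whatever model the server trains, for example one built with TensorFlow as mentioned above.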

[0663] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0664] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions in the prompt and outputs the inference result in a data format such as audio data or text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0665] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0666] [Fourth Embodiment]

[0667] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0668] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0669] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0670] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0671] The microphone 238 receives instructions and other input from the user 20 by capturing the voice of the user 20. The microphone 238 converts the captured voice into audio data and outputs the audio data to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0672] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0673] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0674] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
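
As a non-limiting illustration, the following Python sketch shows how an emotion to be expressed by the robot 414 could be mapped to settings for the eye LEDs and the limb motors. The emotion names, the `eye_led` and `arm_angle_deg` fields, and their values are hypothetical and are not specified in this disclosure.

```python
# Hypothetical command table: emotion -> settings for the eye LEDs and arm motors.
EXPRESSION_TABLE = {
    "joy": {"eye_led": "bright", "arm_angle_deg": 60},
    "calm": {"eye_led": "dim", "arm_angle_deg": 0},
}

# Fallback used when no entry exists for the requested emotion.
NEUTRAL = {"eye_led": "steady", "arm_angle_deg": 0}


def expression_commands(emotion: str) -> dict:
    """Return a copy of the LED and motor settings for the given emotion."""
    return dict(EXPRESSION_TABLE.get(emotion, NEUTRAL))
```

Returning a copy keeps the caller from accidentally modifying the shared table.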

[0675] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0676] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0677] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0678] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0679] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0680] This invention provides a personal AI system to improve users' stress management. The system collects the user's voice and image data in real time, and a server analyzes this data to identify the user's emotional state. Based on the identified emotional state, the server automatically generates an individualized relaxation plan. The relaxation plan includes methods best suited to the user's condition, such as playing music, guiding meditation, or suggesting short exercises. For example, the system may suggest that a user determined to have a high stress level play a relaxing audio file on their device.

[0681] The device executes a relaxation plan received from the server and directly prompts the user for action. For example, the device can play relaxation music through its speakers and display meditation guides on its screen. This allows users to easily create a relaxing environment and reduce stress in their daily lives.

[0682] Furthermore, this system has a feature that allows it to receive feedback from users and improve relaxation plans based on past usage history. For example, if a particular piece of music or program was effective for a user, the server will prioritize suggesting it again next time.
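
As a non-limiting illustration of this prioritization, the following Python sketch ranks candidate items by the user's average past rating. The rating scale, the `neutral` score given to items with no history, and the function name are assumptions made for this sketch.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def rank_by_feedback(candidates: Iterable[str],
                     history: Iterable[Tuple[str, float]],
                     neutral: float = 3.0) -> List[str]:
    """Order candidates so that items with higher past ratings come first."""
    ratings = defaultdict(list)
    for item, rating in history:
        ratings[item].append(rating)

    def mean_rating(item: str) -> float:
        scores = ratings.get(item)
        # Items the user has never rated receive the neutral score.
        return sum(scores) / len(scores) if scores else neutral

    # sorted() is stable, so ties keep their original candidate order.
    return sorted(candidates, key=mean_rating, reverse=True)
```

For example, an item the user rated highly on earlier occasions moves ahead of unrated items, which in turn stay ahead of items that were rated poorly.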

[0683] In this way, the system can adapt to the individual needs of each user and enable efficient emotional care. To implement this invention, it is recommended to also use appropriate privacy protection measures to securely handle the user's voice and image data.

[0684] The following describes the processing flow.

[0685] Step 1:

[0686] The user initiates voice input and image capture. The device collects the user's voice and facial expression data in real time and sends it to the server.

[0687] Step 2:

[0688] The server analyzes the received audio data, examining the tone and speed of the voice to evaluate the user's emotional state. It also analyzes image data to identify emotions from the user's facial expressions.

[0689] Step 3:

[0690] The server identifies the user's current emotional state based on the analysis results. This allows it to determine whether the user is experiencing stress.

[0691] Step 4:

[0692] The server generates a relaxation plan corresponding to the identified emotional state. The plan includes suggestions such as relaxation music, meditation guidance, and short exercise sessions.

[0693] Step 5:

[0694] The server sends the generated relaxation plan to the device.

[0695] Step 6:

[0696] The device executes the received relaxation plan. For example, it might play music through the speaker and display a meditation guide on the screen.

[0697] Step 7:

[0698] Users experience a relaxation plan and input feedback about its effects into their device.

[0699] Step 8:

[0700] The device collects user feedback data and sends it to the server.

[0701] Step 9:

[0702] The server analyzes feedback data and learns how to improve the relaxation plan. This makes future suggestions more effective.

[0703] (Example 1)

[0704] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0705] In modern society, users experience various stresses in their daily lives, and effective stress management is essential. However, providing appropriate relaxation methods tailored to individual needs immediately is difficult. Furthermore, it is necessary to perform highly accurate emotional analysis while protecting the privacy of collected data and automatically generate personalized plans. Developing an efficient and reliable system to address these challenges is crucial.

[0706] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0707] In this invention, the server includes information processing means for analyzing audio and image information collected from the user to identify the user's emotional state, information processing means for generating a personalized relaxation plan for the user, and communication means for securely transmitting the audio and image information. This enables the automatic provision of relaxation methods tailored to the user's individual emotional state, thereby reducing stress in daily life.

[0708] A "user" refers to an individual who uses the system and is a data provider of voice and image information.

[0709] "Voice information" refers to data obtained from a user's speech and voice, and is used to identify their emotional state.

[0710] "Image information" refers to visual data, including the user's facial expressions and actions, and is used for sentiment analysis.

[0711] "Information processing means" refers to functions that analyze received data to identify the user's emotional state and generate a relaxation plan.

[0712] "Emotional state" is an indicator that shows the user's psychological state and stress level, and is identified through the analysis of audio and image information.

[0713] A "relaxation plan" refers to a proposal of personalized stress reduction measures that the system generates in response to one's emotional state.

[0714] "Communication means" refers to technologies or protocols for securely and efficiently transmitting voice and image information to a server.

[0715] "Feedback information" refers to data on user feedback and suggestions after using the system, which is used to improve relaxation plans.

[0716] This invention provides a system for effectively improving user stress management. The system's configuration and operation are described below.

[0717] First, the user accesses the system through a personal device. This device is equipped with a microphone to collect audio information and a camera to acquire image information. The device uses these devices to acquire both audio and image data in real time.

[0718] The acquired data is transmitted to a server using secure communication methods. The server then utilizes a generative AI model to analyze the audio and image information and identify the user's emotional state. This analysis employs speech recognition software and image processing algorithms. Identifying the emotional state allows for the assessment of the user's stress level and psychological condition.

[0719] The server automatically generates an individualized relaxation plan based on the identified emotional state. This plan is optimized based on the user's past history and feedback information. For example, a user identified as having a high stress level will be presented with specific relaxation music or meditation guidance. In this process, the AI model is provided with a prompt message stating, "Please select the most suitable music based on the user's stress level."
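
As a non-limiting illustration, the following Python sketch assembles a prompt of this kind from the identified stress level and the user's past feedback. Only the first line of the prompt is taken from this paragraph; the `Stress level:` and `Previously effective:` lines and the function name are assumptions made for this sketch.

```python
from typing import Sequence

# Prompt message quoted from the paragraph above.
BASE_PROMPT = "Please select the most suitable music based on the user's stress level."


def build_music_prompt(stress_level: str,
                       effective_tracks: Sequence[str] = ()) -> str:
    """Combine the base prompt with the stress level and past effective tracks."""
    lines = [BASE_PROMPT, f"Stress level: {stress_level}"]
    if effective_tracks:
        # Hypothetical way of passing past feedback to the model.
        lines.append("Previously effective: " + ", ".join(effective_tracks))
    return "\n".join(lines)
```

The resulting string would then be passed to the generative AI model; how the model is invoked is outside the scope of this sketch.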

[0720] Next, the generated relaxation plan is sent to the device and executed. The device can play relaxation music through its speaker and display meditation guidance videos on its screen. This allows the user to immediately experience appropriate relaxation.

[0721] This system allows users to effectively manage stress in an individually customized way. User feedback is used to improve the system's accuracy, ensuring continuous performance enhancement.

[0722] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0723] Step 1:

[0724] The user accesses the system using a terminal. The terminal collects the user's voice information through the microphone and acquires image information using the camera. This information is necessary to evaluate the user's emotional state. Voice and image information are collected as input data and sent to the next processing step.

[0725] Step 2:

[0726] The device transmits collected audio and image information to the server using a secure communication method. This communication method is encrypted to protect user privacy. The transmitted data becomes input data necessary for analysis on the server.

[0727] Step 3:

[0728] The server uses a generative AI model to identify the user's emotional state based on the received audio and image information. In this process, a speech recognition algorithm analyzes the tone and speed of the voice, and an image processing algorithm analyzes facial expressions. The analysis results output the user's stress level and emotional state, preparing the server to proceed to the next step.

[0729] Step 4:

[0730] The server generates an individualized relaxation plan based on the identified emotional state. This plan generation also takes into account past user data and feedback. As a result, the optimal relaxation method for the user is determined. The generative AI model selects an action using the prompt message "Please select the optimal music based on the user's stress level."

[0731] Step 5:

[0732] The server sends the generated relaxation plan to the terminal. This transmission is also encrypted. The plan received by the terminal becomes a dataset ready for execution.

[0733] Step 6:

[0734] The device executes the received relaxation plan. Specifically, it starts playing relaxation music through the speaker and displays a meditation guide video on the screen. The output is provided to the user as music and video.

[0735] Step 7:

[0736] After receiving a relaxation experience through the device, the user enters feedback information. This feedback is used to improve the accuracy of the system. The entered feedback is sent to the server and stored in the database.

[0737] Step 8:

[0738] The server analyzes the collected feedback information to improve relaxation plans. This allows future plans to be more refined to better meet user needs. The analysis results are used as input data to provide personalized services in the future.
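
As a non-limiting illustration of the encrypted transmission in Steps 2 and 5, the following Python sketch prepares a client-side TLS configuration using the standard `ssl` module. This disclosure states only that the communication is encrypted; the use of TLS and the minimum protocol version are assumptions made for this sketch.

```python
import ssl


def make_client_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context that verifies the server certificate."""
    # The default context enables certificate and hostname verification.
    context = ssl.create_default_context()
    # Assumption for this sketch: refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

A terminal could pass this context to `context.wrap_socket(sock, server_hostname=...)` before sending audio and image data, so that the data is encrypted in transit.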

[0739] (Application Example 1)

[0740] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0741] In modern society, it is crucial to understand individual users' stress levels and emotional states in real time and provide appropriate relaxation methods accordingly. However, conventional systems have struggled to accurately analyze users' emotional states and provide individually optimized relaxation plans. Furthermore, they lack mechanisms for improving plans based on feedback, making it difficult to maintain user satisfaction in the long term.

[0742] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0743] In this invention, the server includes information processing means for analyzing voice and image data collected from the user to identify the user's emotional state, information processing means for generating a user-specific relaxation plan based on the emotional state, and control means for the robot to perform specific actions based on the analysis results. This enables effective relaxation tailored to the user's emotional state.

[0744] "Voice data" refers to voice information collected from users and is the subject of analysis for identifying their emotional state.

[0745] "Image data" refers to visual information that captures a user's facial expressions and movements, and is the subject of analysis for identifying their emotional state.

[0746] "Information processing means" refers to a device or program that has the function of analyzing audio and image data obtained from a user and performing processing to identify the user's emotional state.

[0747] A "relaxation plan" is a set of behavioral guidelines and schedules designed to provide users with stress reduction methods tailored to their specific emotional state.

[0748] "Control means" refers to a device or program that has the function of operating a device or system that takes specific actions to carry out a relaxation plan.

[0749] "Feedback data" refers to information about users' reactions and evaluations when they actually used a relaxation plan.

[0750] "Analysis means" refers to a system or program that uses collected feedback data to perform analysis aimed at improving future relaxation plans.

[0751] To implement this invention, the user first uses a consumer robot connected to a dedicated terminal. The robot is equipped with a high-performance camera and microphone, and this hardware is used to collect voice and image data from the user in real time.

[0752] The server converts the collected audio data into text data using speech recognition software such as the Google Speech-to-Text API, and analyzes the user's facial expressions from the image data using image analysis software such as OpenCV. The collected data is used to analyze the user's emotional state.

[0753] Once the user's emotional state is identified, the server uses a generative AI model to create a personalized relaxation plan based on that state. This plan may include playing relaxation music, suggesting guided meditation, and simple exercises.

[0754] The generated relaxation plan is executed by the robot's control system. For example, the robot plays relaxation music through a speaker and displays meditation guides on a screen. This allows the user to create a relaxing environment.

[0755] Furthermore, feedback data from users after they have used a relaxation plan is collected and sent to the server. This feedback data is evaluated using analytical tools and used to improve future relaxation plans. This allows for the provision of services that are more tailored to the individual needs of users.

[0756] As a concrete example, imagine a scenario where, while a user is relaxing in their living room, the robot says, "It looks like you need to relax today," and then plays healing music based on an optimized relaxation plan.

[0757] Examples of prompts for a generative AI model are as follows:

[0758] "Analyze the user's emotional state and suggest appropriate relaxation methods if they are experiencing high stress levels. Include specific actions such as listening to music, short guided meditations, or simple stretching exercises."

[0759] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0760] Step 1:

[0761] The user provides audio and image data to the device. The device uses its camera and microphone to collect the user's facial expressions and voice in real time. During this process, the device prepares to send the data to the server while anonymizing it.

[0762] Step 2:

[0763] The server receives the audio data sent from the terminal and uses speech recognition software (e.g., Google Speech-to-Text API) to convert the audio into text data. The converted text data becomes the input data for sentiment analysis.

[0764] Step 3:

[0765] The server receives image data and uses image analysis software (e.g., OpenCV) to analyze facial expressions. The analysis identifies the emotional state derived from the facial expression, and the obtained emotional information is used in the next step.

[0766] Step 4:

[0767] The server integrates text data obtained from voice and emotion data from image analysis, and uses a generative AI model to classify the user's emotional state. It then generates a relaxation plan based on the emotional state. The input consists of text and image data, and the output is a personalized relaxation plan.

[0768] Step 5:

[0769] The generated relaxation plan is sent from the server to the robot. The robot uses control mechanisms to initiate specific actions to execute the plan. For example, it might play relaxation music through a speaker and display meditation guidance on a screen.

[0770] Step 6:

[0771] Users experience a relaxation plan and provide feedback via a device or robot. This feedback includes satisfaction levels and areas for improvement.

[0772] Step 7:

[0773] The server receives and analyzes user feedback data. This feedback data is used to improve the next relaxation plan. This allows for the provision of services that are more tailored to the individual needs of the user.
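
As a non-limiting illustration of Steps 2 to 4, the following Python sketch wires the three processing stages together. The speech recognition, facial expression analysis, and generative AI model are passed in as callables because their concrete interfaces are not specified here; the function name and the `Transcript:` and `Facial expression:` lines are assumptions made for this sketch.

```python
from typing import Callable

# Prompt text quoted from paragraph [0758].
BASE_PROMPT = (
    "Analyze the user's emotional state and suggest appropriate relaxation "
    "methods if they are experiencing high stress levels. Include specific "
    "actions such as listening to music, short guided meditations, or simple "
    "stretching exercises."
)


def generate_plan(audio: bytes,
                  image: bytes,
                  transcribe: Callable[[bytes], str],
                  analyze_face: Callable[[bytes], str],
                  generate: Callable[[str], str]) -> str:
    """Run Steps 2-4: transcribe audio, analyze the face, query the model."""
    transcript = transcribe(audio)       # Step 2: audio -> text data
    face_emotion = analyze_face(image)   # Step 3: image -> emotion from expression
    prompt = (
        f"{BASE_PROMPT}\n"
        f"Transcript: {transcript}\n"
        f"Facial expression: {face_emotion}"
    )
    return generate(prompt)              # Step 4: integrated input -> relaxation plan
```

Injecting the three stages as parameters keeps the sketch independent of any particular speech recognition service, image analysis library, or model provider.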

[0774] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0775] This invention provides a personal AI system incorporating an emotion engine that recognizes the user's emotional state in real time. This system collects voice and image data, identifies the user's emotions based on this data, and provides an individualized relaxation plan. Specific embodiments are described below.

[0776] The device uses a microphone and camera to collect voice and image data from the user. The collected data is immediately transmitted to a server. The server uses an emotion engine to analyze voice tone, speaking style, and facial expression changes to recognize the user's emotions with high accuracy.

[0777] The emotion engine uses algorithms to analyze the tone and speed of speech, as well as points of change in facial expression, to understand emotional states such as "tension" or "relaxation." Based on this emotion recognition, the server generates an optimal relaxation plan for the user. This plan includes relaxation music, guided meditation, and exercise suggestions. For example, if the user is identified as "anxious," the system will prioritize suggesting calming music and relaxation guides.
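
As a non-limiting illustration of this prioritization, the following Python sketch moves the suggestions preferred for a recognized emotion to the front of the option list. The priority table and function name are hypothetical; only the "anxious" example is taken from this paragraph.

```python
from typing import Dict, List, Sequence

# Hypothetical priority table; the "anxious" entry follows the example above.
PLAN_PRIORITIES: Dict[str, List[str]] = {
    "anxious": ["calming music", "relaxation guide"],
}


def prioritized_plan(emotion: str, options: Sequence[str]) -> List[str]:
    """Place the options preferred for this emotion first, then the rest."""
    preferred = PLAN_PRIORITIES.get(emotion, [])
    first = [o for o in preferred if o in options]
    rest = [o for o in options if o not in preferred]
    return first + rest
```

For an emotion with no table entry, the options are returned in their original order.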

[0778] The generated relaxation plan is immediately sent to the device, which then executes it. The user experiences this plan and can reap the stress-reducing effects. After the experience, the user enters feedback into the device, which is collected by the server. The server analyzes this feedback and optimizes the emotion engine's algorithm to improve the accuracy of future relaxation plans.

[0779] This system allows users to understand their emotional state in real time and take specific measures accordingly, enabling them to effectively manage daily stress. For privacy reasons, audio and image data are anonymized and processed securely.

[0780] The following describes the processing flow.

[0781] Step 1:

[0782] The user activates the system and prepares to collect their data through voice input and the camera. The device records this voice and image data in real time.

[0783] Step 2:

[0784] The device sends the collected audio and image data to the server. Data transmission is performed using a secure protocol.

[0785] Step 3:

[0786] The server uses an emotion engine to analyze the received audio data. It analyzes the tone, speed, and nuances of the speech to identify the emotional state.

[0787] Step 4:

[0788] The server analyzes the image data. It detects the user's facial features and analyzes changes in their expressions to more accurately understand their emotional state.

[0789] Step 5:

[0790] The server integrates the analysis results of audio and image data to recognize the user's overall emotional state.

[0791] Step 6:

[0792] The server generates a relaxation plan optimized for the user based on the recognized emotional state. For example, if the user is judged to be highly stressed, it will suggest a plan that includes calming music and meditation guidance.

[0793] Step 7:

[0794] The server sends the generated relaxation plan to the device.

[0795] Step 8:

[0796] The device executes the received plan. It plays relaxation music from the speaker and displays a meditation guide on the screen.

[0797] Step 9:

[0798] Users experience a relaxation plan and provide feedback on its effects via their device.

[0799] Step 10:

[0800] The device collects user feedback and sends it to the server.

[0801] Step 11:

[0802] The server analyzes the feedback it receives and uses it to improve the emotion engine and plan generation algorithm. This improves the accuracy of future suggestions.
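
As a non-limiting illustration of the integration in Step 5, the following Python sketch combines per-label scores from the audio analysis and the image analysis by a weighted average. This disclosure does not specify how the results are integrated; the score dictionaries, the `audio_weight` parameter, and the weighted-average rule are assumptions made for this sketch.

```python
from typing import Dict, Tuple


def fuse_scores(audio_scores: Dict[str, float],
                image_scores: Dict[str, float],
                audio_weight: float = 0.5) -> Tuple[str, Dict[str, float]]:
    """Return the label with the highest weighted score and all fused scores."""
    # Sorting makes the result deterministic when two labels tie.
    labels = sorted(set(audio_scores) | set(image_scores))
    fused = {
        label: audio_weight * audio_scores.get(label, 0.0)
        + (1.0 - audio_weight) * image_scores.get(label, 0.0)
        for label in labels
    }
    best = max(labels, key=lambda label: fused[label])
    return best, fused
```

With equal weights, a label that scores strongly in the voice analysis can still be selected when the facial expression analysis is less certain, and adjusting `audio_weight` shifts that balance.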

[0803] (Example 2)

[0804] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0805] In modern society, many people experience stress and anxiety on a daily basis and are seeking effective ways to cope with them. However, conventional stress management methods have difficulty responding to individual situations and emotions, making it challenging to provide appropriate relief methods for each user. Furthermore, there have been limited systems that analyze users' emotions in real time and provide appropriate countermeasures while respecting privacy. This invention aims to solve these problems and provide a system that enables efficient and individualized stress management.

[0806] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0807] In this invention, the server includes information processing means for analyzing audio and video information to identify an emotional state, information processing means for generating a user-specific relaxation plan, and information processing means having a function to optimize the algorithm using feedback data based on the emotion analysis results. This makes it possible to provide an optimal relaxation plan tailored to the individual emotional state of the user, thereby simultaneously achieving effective stress management and privacy protection.

[0808] "Audio information" refers to data that includes acoustic signals collected from users and is used for sentiment analysis.

[0809] "Visual information" refers to data that represents visual signals, including the user's facial expressions and movements, and is used to identify their emotional state.

[0810] "Emotional state" refers to the emotional state of a user and is a concept that includes psychological and physiological responses such as "tension" and "relaxation."

[0811] "Information processing means" refers to methods and technologies for analyzing audio and video information to identify emotional states.

[0812] A "relaxation plan" refers to a combination of relaxation techniques that are individually tailored to the user's emotional state.

[0813] "Information terminal" refers to a device used to provide relaxation plans to users and carry them out, and includes smartphones and tablets.

[0814] "Opinion data" refers to feedback collected from users, which is useful information for improving relaxation plans.

[0815] "Privacy protection features" refer to technologies and methods for anonymizing and securely processing users' personal data.

[0816] A "generative artificial intelligence model" refers to a system that uses machine learning techniques to analyze a user's emotional state and generate appropriate responses and plans.

[0817] "Recommendation prompts" refer to guiding instructions or suggestions generated based on the user's emotional state, intended to assist in the implementation of the plan.

[0818] This invention is a personal AI system for identifying a user's emotional state in real time and providing a relaxation plan based on that state. The system has the following configuration:

[0819] Data collection and transmission

[0820] The device uses a microphone to collect audio information from the user and a camera module to collect video information. This information is collected in the background without the user needing to operate the device and is sent to the server via a secure connection.

[0821] Analysis of emotional states

[0822] The server analyzes the received audio and video information using a generative AI model. It analyzes tone, intonation, and speed from the audio information, and changes in facial expressions from the video information, comprehensively identifying the user's emotional state. This analysis employs advanced machine learning algorithms to classify emotional states into multiple categories such as "tension" and "relaxation."

[0823] Generation and provision of relaxation plans

[0824] The server generates a relaxation plan tailored to the user based on the analysis results. This plan includes relaxing music selections, guided meditation, and simple exercise programs. For example, if the user's emotional state is determined to be "anxious," the server will recommend classical music with relaxing effects and an experiential deep breathing guide.

[0825] Feedback and algorithm optimization

[0826] Users can input feedback on the relaxation plan they have experienced on their device. This feedback information is sent to a server and analyzed to improve the accuracy of future relaxation plans. The server uses these analysis results to continuously optimize the generative AI model and enhance the personalized user experience.

[0827] For example, if a user is experiencing increased stress while working from home, the system will generate a plan to promote relaxation and provide a meditation guide with nature sounds as a background.

[0828] An example of a prompt to input into the generative AI model is, "If a user feels anxious while working from home, what kind of relaxation plan should be suggested?"

[0829] This system allows users to receive specific stress relief methods tailored to their emotional state in real time, enabling effective stress management.

[0830] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0831] Step 1:

[0832] The device uses a microphone to collect audio information from the user and a camera module to collect video information. This enables real-time data collection. The input is the user's voice and facial expressions, which are output as audio and video data. Specifically, the device runs a background process and continuously collects data with the user's permission.

[0833] Step 2:

[0834] The terminal transfers the collected audio and video data to the server via a secure protocol. This transfer process is carried out after anonymizing the data, thus protecting privacy. The input is the audio and video data obtained in step 1, and the output is the anonymized data transmitted to the server.

[0835] Step 3:

[0836] The server analyzes the received audio information and extracts features such as tone, speed, and intonation. A generative AI model is used to infer emotional states from the audio data. The input is the transmitted audio data, and the output is category information of the emotional state (e.g., "tense," "relaxed," etc.).

[0837] Step 4:

[0838] The server similarly analyzes the video information and detects changes in facial expressions. This provides clues to the emotions the user expresses physically. The generative AI model comprehensively determines the emotional state by combining this with the audio information. The input is video data, and the output is the analyzed facial expression changes and the resulting emotional information.

[0839] Step 5:

[0840] The server integrates the results of audio and video analysis to generate an optimal relaxation plan. This plan may include music tailored to the user's state, guided meditation, and exercise programs. The input is integrated emotional information, and the output is the details of the relaxation plan.
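
The integration in step 5 can be sketched as a simple late fusion of the two analysis results. In the described system the generative AI model produces the plan; the lookup table here is only a stand-in so that the example is self-contained. The weights, the `PLAN_TABLE` contents, and the function names `fuse_scores` and `generate_plan` are hypothetical.

```python
from typing import Dict, List

# Hypothetical mapping from emotion label to plan items. The item types
# (music, guided meditation, exercise) follow the text; the details do not.
PLAN_TABLE: Dict[str, List[str]] = {
    "tense": ["calming music", "guided meditation"],
    "relaxed": ["light exercise program"],
}


def fuse_scores(audio: Dict[str, float], video: Dict[str, float],
                audio_weight: float = 0.5) -> Dict[str, float]:
    """Weighted average of per-label scores from the two analyses."""
    labels = set(audio) | set(video)
    return {
        label: audio_weight * audio.get(label, 0.0)
        + (1.0 - audio_weight) * video.get(label, 0.0)
        for label in labels
    }


def generate_plan(audio: Dict[str, float], video: Dict[str, float]) -> dict:
    """Pick the label with the highest fused score and look up its plan."""
    fused = fuse_scores(audio, video)
    label = max(fused, key=fused.get)
    return {"emotion": label, "items": PLAN_TABLE.get(label, [])}
```

For example, `generate_plan({"tense": 0.8, "relaxed": 0.2}, {"tense": 0.6, "relaxed": 0.4})` selects the "tense" label and returns its two plan items.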

[0841] Step 6:

[0842] The generated mitigation plan is sent from the server to the terminal, which then uses this information to make suggestions to the user. The terminal automatically performs actions according to the plan, such as playing music or providing meditation guidance. The input is the information from the mitigation plan, and the output is the actions to be taken by the user.

[0843] Step 7:

[0844] Users input feedback on the effectiveness of the provided mitigation plan into a terminal. This feedback data is sent to a server and used to optimize future algorithms. The input is user feedback, and the output is improvement information recorded in the database.

[0845] Step 8:

[0846] The server analyzes the collected feedback and optimizes the generative AI model to improve the accuracy of the next proposal. The input is the feedback data, and the output is the improved algorithm and the improved accuracy of the next mitigation plan.
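
The feedback loop in steps 7 and 8 can be illustrated as follows. The source does not describe how the model is optimized, so this sketch swaps in a much simpler technique: it keeps the mean rating of each plan item per emotional state and prefers the best-rated item next time. The class `FeedbackStore`, its methods, and the 1 to 5 rating scale are assumptions.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


class FeedbackStore:
    """Tracks the mean user rating for each (emotion, plan item) pair."""

    def __init__(self) -> None:
        self._sum: Dict[Tuple[str, str], float] = defaultdict(float)
        self._count: Dict[Tuple[str, str], int] = defaultdict(int)

    def record(self, emotion: str, item: str, rating: float) -> None:
        """Store one rating (a 1-5 scale is assumed here)."""
        self._sum[(emotion, item)] += rating
        self._count[(emotion, item)] += 1

    def mean_rating(self, emotion: str, item: str) -> Optional[float]:
        """Mean rating for the pair, or None if no feedback exists yet."""
        n = self._count.get((emotion, item), 0)
        return self._sum[(emotion, item)] / n if n else None

    def best_item(self, emotion: str) -> Optional[str]:
        """Item with the highest mean rating for the given emotion."""
        rated = {
            item: self._sum[(e, item)] / n
            for (e, item), n in self._count.items()
            if e == emotion and n
        }
        return max(rated, key=rated.get) if rated else None
```

A server could consult `best_item` before generating the next plan, or include the stored ratings in the prompt given to the generative AI model.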

[0847] (Application Example 2)

[0848] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0849] In elderly care settings, maintaining the mental health of the elderly requires accurately recognizing each individual's emotional state and providing appropriate relaxation plans based on that understanding. However, current systems struggle with real-time emotional recognition, resulting in insufficient individualized care.

[0850] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0851] In this invention, the server includes information processing means for analyzing audio and video data to identify emotional states, information processing means for generating relaxation plans based on emotional states, and information processing means for providing plans aimed at the mental care of elderly people. This makes it possible to provide relaxation plans optimized for each elderly person and support the maintenance of their mental health.

[0852] "Audio data" refers to digital data containing information about sound, and is composed of sound waveforms, frequencies, volume, and other elements.

[0853] "Video data" refers to digitized image information acquired by cameras or other devices, and includes the shape, color, and movement of objects.

[0854] "Information processing means" refers to programs or devices that analyze specific input data and extract or calculate necessary information.

[0855] "Emotional state" refers to an individual's inner mental state, and includes psychological states such as joy and sadness.

[0856] A "relaxation plan" is a set of activities and instructions designed to promote mental and physical relaxation in an individual.

[0857] "Response data" refers to data that digitally records user feedback and behavioral results after using a system.

[0858] "Personalization accuracy" refers to the degree of ability to provide services and content that match the characteristics and circumstances of the user.

[0859] To implement this invention, audio and video data collected from users must be analyzed to identify each user's emotional state. The terminal collects the data using the smartphone's microphone and camera and transmits it to the server immediately. On the server, a machine learning model implemented in Python with TensorFlow processes the data. This makes it possible to recognize the user's emotional state with high accuracy from changes in voice tone, speaking style, and facial expressions.

[0860] Based on the emotion recognition results, the server generates a relaxation plan. This plan includes relaxation music, meditation guides, and exercise suggestions for seniors. For example, if the user is identified as anxious, it will provide calming classical music and deep breathing guidance.

[0861] The device aims to stabilize the user's mental state by implementing the generated relaxation plan. The response data obtained from the user is analyzed on the server and used as data to improve the personalization accuracy of the relaxation plan.

[0862] As concrete examples, prompts based on the emotion recognition results are as follows:

[0863] "The feedback from users seems urgent. Based on the situation, what kind of music would be suitable for relaxation?"

[0864] "The user's facial expression seems cloudy. Please suggest a meditation guide that would be suitable for them."

[0865] This system makes it possible to provide precise and appropriate relaxation plans to maintain the mental health of each elderly person in care settings.

[0866] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0867] Step 1:

[0868] The device acquires the user's audio and video data. Specifically, it records audio using the smartphone's microphone and captures facial expressions with its camera. The input is real-time audio and video, and the output is digital audio and video data. This data is transmitted to the server immediately.

[0869] Step 2:

[0870] The server analyzes the received audio and video data. It uses Python and TensorFlow to run a machine learning model, extracting changes in voice tone and facial expressions from the data. The input consists of audio and video data, and the output is a label indicating an emotional state, such as "tension" or "relaxation."

[0871] Step 3:

[0872] The server generates a relaxation plan based on the emotion recognition results. It queries the generative AI model using prompts to select appropriate relaxation methods. The input is an emotional state label, and the output is a relaxation plan (e.g., quiet classical music or deep breathing guidance).
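
The prompt-based query in step 3 can be sketched as follows. The template wording, the function names `build_prompt` and `request_plan`, and the `generate` callable are hypothetical. `generate` stands for whatever client wraps the generative AI model, and no specific vendor API is assumed.

```python
from typing import Callable

PROMPT_TEMPLATE = (
    "The user's emotional state has been identified as '{label}'. "
    "The user is an elderly person in a care setting. "
    "Please suggest a relaxation plan (music, meditation guide, or light "
    "exercise) suitable for this state."
)


def build_prompt(label: str) -> str:
    """Fill the template with the emotion label from step 2."""
    return PROMPT_TEMPLATE.format(label=label)


def request_plan(label: str, generate: Callable[[str], str]) -> str:
    """Send the prompt to a text-generation callable and return its reply.

    `generate` is a placeholder for the client that calls the model.
    """
    return generate(build_prompt(label)).strip()


# Usage with a stub in place of a real model:
def fake_model(prompt: str) -> str:
    return " Quiet classical music and deep breathing guidance. "


print(request_plan("tension", fake_model))
```

Passing the model client in as a callable keeps the prompt construction testable without network access.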

[0873] Step 4:

[0874] The terminal receives the relaxation plan generated by the server and executes the plan for the user. The user performs relaxation activities according to the guidance. The input is the relaxation plan, and the output is feedback on the user's actions.

[0875] Step 5:

[0876] Users input feedback into a device after their relaxation experience. Specifically, they input data by evaluating their satisfaction level and the effects they felt. The input is the user's feedback, and the output is feedback data.
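
The feedback data in step 5 could take a shape like the following. The source names only a satisfaction level and the effects felt, so the field names, the 1 to 5 scale, and the JSON encoding are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FeedbackRecord:
    """One piece of user feedback on a relaxation plan."""
    plan_id: str
    satisfaction: int        # assumed scale: 1 (low) to 5 (high)
    perceived_effect: str    # free-text description of the effect felt

    def __post_init__(self) -> None:
        if not 1 <= self.satisfaction <= 5:
            raise ValueError("satisfaction must be between 1 and 5")


def to_payload(record: FeedbackRecord) -> str:
    """Serialize the record as JSON for transmission to the server."""
    return json.dumps(asdict(record), ensure_ascii=False, sort_keys=True)
```

Validating the rating on the terminal side keeps malformed records out of the training data used in step 6.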

[0877] Step 6:

[0878] The server analyzes feedback data collected from users to improve the accuracy of relaxation plans. This data is used as training data for a machine learning model, serving as a reference for future plan generation. The input is feedback data, and the output is an improved relaxation plan proposal.

[0879] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0880] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0881] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0882] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping. Specifically, the emotion identification model 59 may determine the user's emotion according to the emotion map (see Figure 9), which is the specific mapping. Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0883] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0884] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0885] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the Emotion Map 400 an emotion lies, the more visible it becomes (that is, the more it is expressed in actions).

[0886] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
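
The balance-based idea of pleasure and discomfort described in [0886] can be illustrated with a small calculation for a robot. The source gives no formula, so everything here is an assumption: the two balances, their ideal values and tolerated ranges, the function name `comfort_score`, and the mapping of the result onto the range from -1 (discomfort) to +1 (pleasure).

```python
from typing import Dict, Tuple

# Hypothetical balance definitions: name -> (ideal value, tolerated range).
BALANCES: Dict[str, Tuple[float, float]] = {
    "battery_level": (1.0, 1.0),      # fraction of full charge
    "posture_tilt_deg": (0.0, 45.0),  # degrees from upright
}


def comfort_score(readings: Dict[str, float]) -> float:
    """Return a value in [-1, 1]: +1 when every balance is at its ideal
    (pleasure), -1 when every balance is at or beyond its tolerated range
    (discomfort)."""
    deviations = []
    for name, (ideal, span) in BALANCES.items():
        # Normalized distance from the ideal, capped at 1.0.
        deviation = min(abs(readings[name] - ideal) / span, 1.0)
        deviations.append(deviation)
    mean_deviation = sum(deviations) / len(deviations)
    return 1.0 - 2.0 * mean_deviation
```

With these assumed definitions, a fully charged, upright robot scores 1.0, and the score falls toward -1.0 as the battery drains or the tilt increases.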

[0887] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0888] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
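
The relationship between emotion values and positions on the emotion map in [0888] can be sketched as follows. The coordinates are illustrative and are not taken from Figure 9 or Figure 10; only the observation that "reassured," "calm," and "confident" lie close together follows the text. The neural network itself is not reproduced: `dominant_emotion` simply takes emotion values, such as the network might output, and picks the largest.

```python
import math
from typing import Dict, Tuple

# Hypothetical positions: (ring, clock position). Ring 1 is the innermost
# circle; clock position 3 is the right-hand side of the map.
POSITIONS: Dict[str, Tuple[int, float]] = {
    "reassured": (2, 3.0),
    "calm": (2, 3.5),
    "confident": (3, 3.0),
    "anxious": (2, 4.5),
}


def to_xy(ring: int, clock: float) -> Tuple[float, float]:
    """Convert (ring, clock position) to Cartesian coordinates,
    with 12 o'clock pointing up and angles increasing clockwise."""
    theta = 2.0 * math.pi * clock / 12.0
    return ring * math.sin(theta), ring * math.cos(theta)


def map_distance(a: str, b: str) -> float:
    """Euclidean distance between two emotions on the map."""
    (ax, ay), (bx, by) = to_xy(*POSITIONS[a]), to_xy(*POSITIONS[b])
    return math.hypot(ax - bx, ay - by)


def dominant_emotion(values: Dict[str, float]) -> str:
    """Return the emotion with the highest emotion value."""
    return max(values, key=values.get)
```

A distance of this kind could serve as the notion of closeness under which nearby emotions are trained to have similar values.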

[0889] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0890] In the above embodiment, an example was given in which the specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and the specific process may be distributed across multiple computers, including computer 22. For example, the data generation model 58 may be provided in a device external to the data processing device 12, and the external device may generate data according to the input data.

[0891] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0892] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0893] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0894] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using this memory.

[0895] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0896] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0897] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0898] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0899] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0900] The following is further disclosed regarding the embodiments described above.

[0901] (Claim 1)

[0902] A program control means that analyzes audio and image data collected from the user to identify the emotional state,

[0903] A program control means that generates a user-specific relaxation plan based on the aforementioned emotional state,

[0904] A program control means that issues instructions to the user terminal in order to execute the relaxation plan,

[0905] A means for collecting user feedback data and analyzing it to improve the relaxation plan,

[0906] A system that includes this.

[0907] (Claim 2)

[0908] The system according to claim 1, further comprising means for collecting the aforementioned audio data and image data anonymized while protecting the user's privacy.

[0909] (Claim 3)

[0910] The system according to claim 1, having a learning function that improves the accuracy of personalizing relaxation plans based on the user's past history data.

[0911] "Example 1"

[0912] (Claim 1)

[0913] Information processing means for identifying emotional states by analyzing audio and image information collected from users,

[0914] Information processing means for generating a user-specific relaxation plan based on the aforementioned emotional state,

[0915] Information control means that issues instructions to user equipment in order to carry out the relaxation plan,

[0916] An analytical means for collecting user feedback information and improving the relaxation plan,

[0917] A communication means for securely transmitting the aforementioned audio and image information,

[0918] A system that includes this.

[0919] (Claim 2)

[0920] The system according to claim 1, further comprising a function for collecting the aforementioned audio information and image information anonymized while protecting the user's privacy.

[0921] (Claim 3)

[0922] The system according to claim 1, having the ability to learn to improve the personalization accuracy of relaxation plans based on the user's past history information.

[0923] "Application Example 1"

[0924] (Claim 1)

[0925] An information processing means that analyzes audio and image data collected from users to identify their emotional state,

[0926] Information processing means for generating a user-specific relaxation plan based on the aforementioned emotional state,

[0927] Information processing means that issues instructions to the terminal in order to execute the relaxation plan,

[0928] An analytical means for collecting user feedback data and improving the relaxation plan,

[0929] A control means that allows the robot to perform specific actions based on the analysis results,

[0930] A system that includes this.

[0931] (Claim 2)

[0932] The system according to claim 1, further comprising means for collecting the aforementioned audio data and image data anonymized while protecting the user's privacy.

[0933] (Claim 3)

[0934] The system according to claim 1, having a learning function that improves the accuracy of personalizing relaxation plans based on the user's past history data.

[0935] "Example 2 of combining an emotion engine"

[0936] (Claim 1)

[0937] Information processing means for analyzing audio and video information to identify emotional states,

[0938] Information processing means for generating a user-specific mitigation plan based on the aforementioned emotional state,

[0939] Information processing means that issues instructions to the user's information terminal in order to implement the aforementioned mitigation plan,

[0940] A means for collecting user feedback data and analyzing it to improve the mitigation plan,

[0941] An information processing means having a function to optimize the algorithm using feedback data based on emotion analysis results,

[0942] A means having a privacy protection function for collecting audio and video information in an anonymized form,

[0943] A system that includes this.

[0944] (Claim 2)

[0945] The system according to claim 1, having a learning function that improves the individualization accuracy of the mitigation plan based on the user's past history information.

[0946] (Claim 3)

[0947] The system according to claim 1, having a function to create recommendation prompts according to emotional states using a generative artificial intelligence model.

[0948] "Application example 2 when combining with an emotional engine"

[0949] (Claim 1)

[0950] An information processing means that analyzes audio and video data collected from users to identify their emotional state,

[0951] Information processing means for generating a user-specific relaxation plan based on the aforementioned emotional state,

[0952] Information processing means that issues instructions to the user device in order to carry out the relaxation plan,

[0953] Information analysis means for collecting user response data and improving the relaxation plan,

[0954] Information processing means including a function to provide plans aimed at the mental care of the elderly,

[0955] A system that includes this.

[0956] (Claim 2)

[0957] The system according to claim 1, further comprising means for collecting the aforementioned audio data and video data anonymized while protecting the user's privacy.

[0958] (Claim 3)

[0959] The system according to claim 1, having a learning function that improves the personalization accuracy of relaxation plans based on the user's past history data.

[Explanation of symbols]

[0960]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. A program control means that analyzes audio and image data collected from the user to identify the emotional state,
A program control means that generates a user-specific relaxation plan based on the aforementioned emotional state,
A program control means that issues instructions to the user terminal in order to execute the relaxation plan,
A means for collecting user feedback data and analyzing it to improve the relaxation plan,
A system that includes this.

2. The system according to claim 1, further comprising means for collecting the aforementioned audio data and image data anonymized while protecting the user's privacy.

3. The system according to claim 1, having a learning function that improves the accuracy of personalizing relaxation plans based on the user's past history data.