system
An AI-powered system in mobile devices assesses children's stress through conversation analysis, notifying parents and offering advice to manage stress, addressing the challenge of unrecognized stress in children and promoting healthy development.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- SOFTBANK GROUP CORP
- Filing Date
- 2024-12-12
- Publication Date
- 2026-06-24
AI Technical Summary
Modern children face stress due to various factors, and it is difficult for them to recognize and manage their stress levels, which can impact their development and health, while caregivers often miss the signs.
A system using an artificial intelligence character in a mobile device analyzes everyday conversations to assess stress levels, generates notifications for parents, and provides advice to children to support their psychological health.
The system effectively monitors and supports children's mental health by providing timely notifications and advice to parents, promoting healthy development and emotional well-being.
Smart Images

Figure 2026103452000001_ABST
Abstract
Description
Technical Field
[0001] The technology of the present disclosure relates to a system.
Background Art
[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor and includes steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a character of the chatbot, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] Modern children are daily burdened with stress due to various factors, and if this state is not properly managed, it may have an adverse impact on their development and health. However, it is difficult for children themselves to recognize and appropriately deal with their own stress, and caregivers may also miss the signs. In such a situation, there is a need for a mechanism to continuously monitor the mental health of children and notify caregivers at an appropriate time.
Means for Solving the Problems
[0005] This invention provides a means of analyzing a user's stress level through everyday conversation and evaluating their stress level, using an artificial intelligence character installed in a mobile phone or smartphone for children. Furthermore, by generating appropriate notifications for parents using these evaluation results, it makes it easier for parents to understand their child's mental health. In addition, it provides direct advice to the child based on the notification to reduce stress. The aim of this series of means is to comprehensively support the psychological health of children and promote their healthy development.
[0006] "User authentication information" refers to information used to identify a user and grant them access to a device or service.
[0007] An "artificial intelligence character" is a virtual entity composed of a program that enables interaction with the user, and it engages in conversation using natural language processing.
[0008] "Data obtained from conversations" refers to information collected through interactions with users, including linguistic content and tone of voice.
[0009] "Assessing stress levels" refers to analyzing the user's psychological state and determining the degree of mental burden or pressure they are experiencing.
[0010] "Generating notifications" refers to creating and sending messages to recipients based on evaluation results and important information.
[0011] "Providing advice" means giving users specific instructions or suggestions for improving the situation or solving problems.
[0012] "Long-term monitoring of data" means continuously observing data collected over a certain period and analyzing trends.
[0013] "Generating additional questions" means creating new questions and presenting them to the user in order to obtain more detailed information.
[0014] "Specific action advice for parents" refers to specific instructions that suggest parents can take appropriate actions in accordance with their child's psychological state. [Brief explanation of the drawing]
[0015] [Figure 1] This is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment. [Figure 2] This is a conceptual diagram showing an example of the essential functions of the data processing device and smart device according to the first embodiment. [Figure 3] This is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment. [Figure 4] This is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment. [Figure 5] This is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment. [Figure 6] This is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment. [Figure 7] This is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment. [Figure 8] This is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment. [Figure 9] This shows an emotion map where multiple emotions are mapped. [Figure 10] This shows an emotion map where multiple emotions are mapped. [Figure 11] This is a sequence diagram showing the processing flow of the data processing system in Example 1. [Figure 12] This is a sequence diagram showing the processing flow of the data processing system in Application Example 1. [Figure 13] This is a sequence diagram showing the processing flow of the data processing system in Example 2, when an emotion engine is combined. [Figure 14]It is a sequence diagram showing the processing flow of a data processing system in Application Example 2 when a sentiment engine is combined.
Embodiments for Carrying Out the Invention
[0016] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.
[0017] First, the terms used in the following description will be explained.
[0018] In the following embodiments, a labeled processor (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.
[0019] In the following embodiments, a labeled RAM (Random Access Memory) is a memory in which information is temporarily stored and is used as a work memory by the processor.
[0020] In the following embodiments, a labeled storage is one or more non-volatile storage devices that store various programs and various parameters, etc. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), or magnetic tapes, and the like.
[0021] In the following embodiments, the signed communication interface (I / F) is an interface that includes a communication processor and an antenna, etc. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), or Bluetooth (registered trademark).
[0022] In the following embodiments, "A and / or B" is synonymous with "at least one of A and B." That is, "A and / or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and / or B" applies when expressing three or more things linked by "and / or."
[0023] [First Embodiment]
[0024] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.
[0025] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.
[0026] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0027] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.
[0028] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.
[0029] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.
[0030] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.
[0031] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.
[0032] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.
[0033] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the identification processing unit 290.
[0034] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0035] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0036] This invention provides a system for monitoring a child's mental health through interaction with an artificial intelligence character embedded in a communication device for children. This system performs multiple processes, including user authentication, natural language processing, stress assessment, and providing appropriate notifications and advice.
[0037] User Authentication
[0038] The device first requests user authentication information and grants access only if the user is correctly identified. This ensures data security and privacy. For example, biometric authentication can be used to easily verify that the device belongs to the child.
[0039] Natural language conversation
[0040] The device engages in everyday conversations with the user through an artificial intelligence character. These conversations utilize a standard voice interaction interface, making them user-friendly for children. For example, the device might ask the user, "What happened at school today?" and the user might reply, "I had fun today." This response is then used for data analysis in the next step.
[0041] Stress assessment
[0042] The server takes in conversational data sent from the terminal and uses natural language processing techniques to analyze signs of stress. For example, a prolonged use of negative words or tone may indicate a high stress level. Based on this assessment, if a specific trigger is identified, additional questions can be generated and asked to the user.
[0043] Providing notifications and advice
[0044] Based on the stress assessment results, the server generates notifications for parents, providing important information and advice. These notifications are privacy-conscious and may be sent in a format such as, "Your child may have been experiencing stress recently." Furthermore, the device provides users with specific actionable advice via voice or text. For example, it might suggest, "Why not take a short break and play some games today?"
[0045] Data monitoring
[0046] The server records and analyzes user conversation data and stress assessment results over a long period to understand changes in the user's stress levels. Through long-term monitoring, the server provides regular feedback to parents and helps them keep track of their child's growth and development.
[0047] This system aims to comprehensively understand a child's mental health, including their psychological state, and to enable integrated care in collaboration with parents.
[0048] The following describes the processing flow.
[0049] Step 1:
[0050] When the device is powered on, it displays a login screen and prompts the user to enter authentication information. This includes a password or biometric authentication (fingerprint or facial recognition). The user provides the login information, and the device verifies the authentication.
[0051] Step 2:
[0052] After authentication, the device activates an artificial intelligence character. The character greets the child and begins a casual conversation, asking natural questions such as, "How was school today?"
[0053] Step 3:
[0054] The user responds to questions from the device using voice. The user's voice is converted to text by the device, and this data is saved as the content of the conversation.
[0055] Step 4:
[0056] The terminal sends the conversation content to the server, which analyzes the text data using a natural language processing engine. It then infers the stress level from the tone and keywords.
[0057] Step 5:
[0058] Based on the stress level estimation, the server generates additional questions as needed. These questions are sent to the terminal, which then asks the user further questions.
[0059] Step 6:
[0060] The device sends additional responses from the user back to the server, and the server completes the stress analysis. The evaluation results are notified to the parent in a privacy-protected format.
[0061] Step 7:
[0062] The server generates advice based on the child's stressors and sends it to the device. The device then presents the advice to the user, such as "Try taking a short break."
[0063] Step 8:
[0064] The server records conversation data and stress assessment results over the long term and provides regular feedback to parents based on the monitoring results. This allows for responses tailored to the user's growth.
[0065] (Example 1)
[0066] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0067] In today's information society, the importance of continuously monitoring children's mental health is increasing. However, conventional technologies struggle to adequately assess children's daily emotions and stress levels, and to provide effective advice and notifications based on that assessment. Therefore, an integrated system is needed to promote mental health and monitor children's development.
[0068] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0069] In this invention, the server includes means for receiving user authentication information and identifying the user, means for configuring an information processing device that performs natural language conversation with the user, means for analyzing the record obtained from the conversation and evaluating the mental state, means for generating a notification to the supervisor based on the evaluation result, means for providing the user with behavioral guidelines based on the notification, and means for monitoring the record over the long term. This makes it possible to comprehensively evaluate the mental health state of a child and take appropriate action in cooperation with the supervisor.
[0070] "User authentication information" refers to unique identification information entered to identify a user, and is used to verify their permission to access the system.
[0071] An "information processing device" is a device or program that engages in conversation with a user and analyzes the content of that conversation, and is capable of natural language processing using artificial intelligence.
[0072] "Records" refer to data including the content of conversations with users and the results of their analysis, and are used to evaluate their mental state.
[0073] "Mental state" refers to the user's psychological health and emotional changes, including stress levels and emotional tendencies.
[0074] A "supervisor" is an individual or organization that has the authority to monitor the user's psychological health and intervene or provide support as needed.
[0075] "Action guidelines" are specific actions and advice that users should take to improve or maintain their mental state.
[0076] "Long-term monitoring" refers to the act of collecting and analyzing data over time to understand changes and trends in user behavior.
[0077] This invention is a system for monitoring a user's mental health and taking appropriate action. The system mainly consists of a server and terminals.
[0078] The device is a personal information terminal (PDI) used by users on a daily basis and is equipped with natural language processing technology utilizing artificial intelligence. Users first enter authentication information on the device screen, which includes biometric authentication such as fingerprint and facial recognition. The device also engages in natural language conversations with the user through an AI-powered voice dialogue interface and records the conversation. Specific software applications include speech recognition engines and natural language processing libraries.
[0079] The server receives conversation data sent from the terminal and analyzes its content. This analysis utilizes a generative AI model to assess the user's stress level based on their statements. For example, if the conversation contains many negative expressions such as "tired" or "sad," the server will determine that the user is experiencing high stress. Based on the assessment, the server generates a privacy-conscious notification for the supervisor, stating something like, "Your child may be experiencing stress." It also provides the user with specific actionable advice, such as, "Why not take some time to relax today?" This process allows for the early detection and resolution of problems.
[0080] As a concrete example, by inputting a prompt such as "Please provide examples of positive responses to questions asked by children" into a generative AI model, natural conversations can be generated. In this way, a system that comprehensively manages the user's mental health can be realized.
[0081] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0082] Step 1:
[0083] The terminal receives user authentication information from the user as input. Specifically, fingerprint authentication or facial recognition is performed to identify the user's identity. This authentication information is processed by the terminal, and if authentication is successful, the terminal outputs permission to the server.
[0084] Step 2:
[0085] The user initiates a conversation using the device's voice input interface. The device receives this voice as input data and converts it into text data via a speech recognition engine. This text data is sent from the device to the server, forming the basis for natural language processing.
[0086] Step 3:
[0087] The server receives text data transferred from the terminal as input and processes it using a generative AI model. Specifically, it uses natural language processing techniques to analyze the user's emotions and stress levels. The analysis results are output as numerical data indicating the presence and degree of stress. This output is used to generate subsequent notifications.
[0088] Step 4:
[0089] The server generates a notification for the supervisor based on the stress analysis results. This process generates a notification message, such as "Your child may be experiencing stress," in a privacy-conscious manner. The generated notification message is sent from the server to the supervisor's device.
[0090] Step 5:
[0091] The server creates actionable guidelines for the user and sends them to the device. This process provides specific advice, such as "Why not take some time to relax today?", based on stress assessments obtained from a generated AI model. This information is output to the device and presented to the user visually or audibly.
[0092] Step 6:
[0093] The server stores all conversation data and stress analysis results long-term, preparing them for time-series analysis. This step involves accumulating data in a database and performing trend analysis to understand future changes. This creates a data foundation for understanding the evolution of users' mental health.
[0094] (Application Example 1)
[0095] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0096] In recent years, there has been a growing need to continuously monitor children's mental health and provide early detection and appropriate advice. However, traditional methods have presented challenges in real-time assessment of psychological state and providing prompt feedback to parents. Furthermore, the lack of systems that integrate children's conditions into their daily lives has made natural communication with children and stress assessment difficult.
[0097] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0098] In this invention, the server includes means for receiving authentication information and identifying the user; means for configuring an artificial intelligence agent that conducts natural language dialogue with the user; means for analyzing information collected from the dialogue and evaluating the psychological health status; means for creating a notification for the user or caregiver based on the evaluation results; means for providing advice to the user based on the notification; means for continuously monitoring the information; means for analyzing the tone during the dialogue using speech recognition and emotion analysis technology and generating a response; and means for providing feedback to the caregiver via a terminal. This enables real-time evaluation of the child's psychological health status in daily life and the provision of quick and appropriate information to parents.
[0099] "Authentication information" refers to information used by a device to identify a specific user, and includes biometric information and passwords.
[0100] "User" refers to an individual who operates and interacts with the system, and in this invention, it primarily refers to a child.
[0101] "Natural language dialogue" refers to a form of communication that takes place between a user and an artificial intelligence agent through voice or text.
[0102] An "artificial intelligence agent" is a program that performs natural language processing and collects and analyzes information through dialogue with the user.
[0103] "Psychological health" refers to the state of the user's emotions, stress levels, and mental stability.
[0104] A "notification" is a message sent to the user or caregiver based on the system's evaluation.
[0105] "Advice" refers to information that includes specific actions or points to consider, suggested to the user based on the notification.
[0106] "Speech recognition technology" is a technology that converts a user's voice into text data.
[0107] "Emotional analysis technology" is an analytical technique used to identify a user's emotions and tone from collected dialogue data.
[0108] "Feedback" refers to information provided to caregivers through a device, including evaluation results and advice.
[0109] The system in this invention is primarily realized through the interaction of a server, a terminal, and a user. The server receives the user's authentication information and ensures that the user is correctly identified. It is desirable to use biometric authentication technology for this purpose. The terminal engages in natural language dialogue with the user, specifically a child, and an artificial intelligence agent. This dialogue is converted into text information using speech recognition technology and sent to the server.
[0110] The server processes the received conversation data using sentiment analysis technology to assess the user's psychological well-being. Based on this assessment, the server generates notifications and sends them to parents as needed. These notifications may include an assessment of the current situation and specific behavioral advice. The device also utilizes sentiment analysis technology to provide necessary support to the user via voice or text.
[0111] The software used includes Google® Speech-to-Text API and IBM Watson® Tone Analyzer for speech recognition and sentiment analysis. Firebase is used for the database to store user dialogue data and evaluation results long-term, enabling continuous monitoring.
[0112] For example, if a child is asked "What happened at school today?" while looking at their device and replies "It was fun," that data is sent to the server in real time. If the sentiment analysis evaluates it as "positive," no special notification is generated; only normal feedback is provided.
[0113] An example of a prompt for a generative AI model would be: "Listen to the user about how they're feeling today and what happened at school. If a negative tone is detected, continue asking questions to determine what to inform the parents about."
[0114] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0115] Step 1:
[0116] The user initiates access to the device. The device obtains the user's authentication information using methods such as facial recognition or fingerprint authentication. The input is the user's biometric information, and the output is whether authentication was successful or not. The device uses authentication software and proceeds to the next step if successful.
[0117] Step 2:
[0118] The device uses an artificial intelligence agent to initiate a natural language dialogue with the user. The input is the user's voice, and the output is text data. Speech recognition technology (such as Google Speech-to-Text API) is used to convert the voice data into text.
[0119] Step 3:
[0120] The server receives text data sent from the terminal and evaluates the user's psychological health using sentiment analysis technology. The input is the user's text data, and the output is the sentiment evaluation result. Data analysis is performed using tools such as IBM Watson Tone Analyzer.
[0121] Step 4:
[0122] The server generates notifications based on the results of sentiment analysis. The input is the sentiment evaluation result, and the output is the notification message. If the evaluation reaches a specific trigger, a notification is created for the caregiver.
[0123] Step 5:
[0124] The device provides advice to the user based on the generated notifications. The input is the notification message, and the output is feedback to the user in the form of voice or text. The device uses an AI assistant to read out specific advice.
[0125] Step 6:
[0126] The server stores user text data and sentiment rating results for long-term monitoring. Input is user text data and rating results, and output is storage in a database. A database service such as Firebase is used to record the data.
[0127] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the identification processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform identification processing using the user's emotions.
[0128] This invention provides a system for monitoring a child's mental health through interaction with the user, by equipping a communication terminal with an artificial intelligence character and an emotion engine. This system performs multiple processes, including user authentication, natural language processing, stress assessment, emotion recognition, and providing appropriate notifications and advice.
[0129] User Authentication
[0130] The device receives authentication information from the user upon startup and grants access to the device after verifying that the entered information is correct. The user can then use the device after clearing authentication using biometric authentication or a password.
[0131] Natural language conversation and emotion recognition
[0132] The device engages in everyday conversations with the user through an artificial intelligence character. This character uses child-friendly voices and animations to ask the user questions such as, "How are you feeling today?" When the user responds, an emotion engine analyzes the response and recognizes the user's emotional state.
[0133] Stress assessment and emotion analysis
[0134] The server receives conversation and emotion data sent from the terminal and analyzes it using natural language processing technology. In addition to assessing stress levels, it analyzes the user's emotions in detail to understand short-term and long-term trends. For example, if emotions such as sadness or anger appear frequently, it will be determined that the user is under high stress.
[0135] Providing notifications and advice
[0136] The server generates notifications for parents based on stress assessment and emotion analysis. These notifications include information about the child frequently exhibiting certain emotions and situation-specific advice. The device then provides the user with adaptive advice such as, "Let's try something to change your mood today."
[0137] Data monitoring
[0138] The server collects data over long periods and uses recorded conversation and sentiment analysis data to meticulously track the user's growth. It also sends regular feedback to parents to help monitor the user's mental well-being.
[0139] This system is designed to comprehensively support children's psychological and emotional health by gaining a three-dimensional understanding of the user's emotional state, enabling parents to intervene appropriately.
[0140] The following describes the processing flow.
[0141] Step 1:
[0142] The device displays a user authentication screen when it starts up and the user logs in. The user enters authentication information using a password or biometric authentication. The device verifies the entered information and grants access to the device if it is correct.
[0143] Step 2:
[0144] The device activates an artificial intelligence character and begins a conversation with the user. This character engages in dialogue through questions such as "How was your day?", facilitating a natural exchange.
[0145] Step 3:
[0146] The user responds to questions from the device using voice. The device receives the voice input and converts it into text data using its automatic speech recognition function. This text data is stored for sentiment analysis.
[0147] Step 4:
[0148] The device sends text data to the server. The server analyzes the received data using an emotion engine to recognize the user's emotional state (e.g., joy, sadness, anger, etc.).
[0149] Step 5:
[0150] The server assesses stress levels based on the emotion analysis results. These assessment results are used as foundational data for continuous notifications to parents. If stress levels are high, additional information may be required.
[0151] Step 6:
[0152] The server generates notifications for parents based on their emotional state and stress levels. These notifications include messages such as, "The user appears to be experiencing significant stress recently," and specific advice.
[0153] Step 7:
[0154] The server sends back notifications and advice generated by the server to the device. The device then displays advice to the user, such as, "Why not go outside for a walk to relax?", and advises appropriate actions based on the situation.
[0155] Step 8:
[0156] The server records and stores conversation data and analysis results of emotional states over long periods. Through this, it monitors the user's emotional tendencies and stress fluctuations, and provides regular feedback to parents as needed.
[0157] (Example 2)
[0158] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0159] In modern society, there is a growing need to appropriately monitor the mental and emotional health of users, especially during childhood, and provide the necessary support. However, conventional technologies have made it difficult to accurately assess users' stress levels and emotional states during this process, and to provide appropriate feedback and advice to parents and the users themselves.
[0160] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0161] In this invention, the server includes means for receiving user authentication information and identifying the user, means for configuring an artificial intelligence character that engages in natural language conversation with the user, and means for analyzing information obtained from the conversation to recognize the emotional state and evaluate the stress level. This makes it possible to comprehensively understand the user's mental and emotional health and enable appropriate intervention and support.
[0162] "User authentication information" refers to the information necessary to identify a user and authorize them to use a device. This includes biometric information and passwords.
[0163] An "artificial intelligence character" refers to a digital character programmed to interact with users through natural language conversation. It uses voice and animation to create a friendly and engaging conversation.
[0164] "Natural language" refers to the language that humans use on a daily basis, expressed in written or conversational forms. Technologies that process this language enable effective communication with users.
[0165] "Recognizing emotional states" refers to the process of analyzing linguistic information emanating from a user's words and actions to identify their emotions.
[0166] "Assessing stress levels" refers to a method of quantifying or evaluating the psychological burden on a user based on the information they provide.
[0167] "Notifications" refer to information and warning messages provided to users or their guardians. This includes information about the user's emotions and behavior.
[0168] "Providing advice" refers to the act of giving guidelines and suggestions tailored to the user's current situation. This encourages improvements in the user's daily behavior.
[0169] "Monitoring" refers to the act of observing specific data or situations over a long period of time to understand their changes and trends. This allows for tracking the user's growth and health status.
[0170] This invention relates to a system that incorporates artificial intelligence into a child-friendly information terminal to monitor and support the user's mental health. This system consistently performs user authentication, emotion recognition, stress assessment, notification generation, and data monitoring.
[0171] The device receives authentication information from the user via biometric authentication or password entry upon startup, and uses this information to identify the user. Once authentication is complete, an artificial intelligence character begins interacting with the user using voice and animation. For example, the character might ask, "How are you doing today?" and prompt the user for a natural language response.
[0172] The user responds to this question with words expressing their feelings, such as "I'm feeling great today." The device converts this voice information into text data and sends it to the emotion engine. The emotion engine analyzes this text using natural language processing techniques to recognize the emotional state. Common natural language processing APIs are used for the analysis, achieving highly accurate emotion recognition.
[0173] The server evaluates stress levels based on data received from the terminal. In this process, it quantifies the type and intensity of emotions, giving concrete details to the user's stress state. Over the long term, as data is collected, the server analyzes this data to understand the user's emotional tendencies and changes in stress levels.
[0174] Furthermore, the server generates notifications for parents, reporting on the effects of specific emotional states and stress. These notifications also include specific advice, such as, "Your child needs a little more rest." The device also offers suggestions to the user that are beneficial to their daily life and promote healthy behaviors.
[0175] As a concrete example, consider a scenario where a user tells an AI character about what happened at school, saying, "I made a new friend today." From this statement, the device recognizes a positive emotion, and the server collects the data and sends a notification to the parent that reinforces the positive trend. In this process, the generative AI model can use a prompt like this: "You are an AI assistant for children. Recognize the user's emotional state through conversation and provide constructive feedback as needed."
[0176] Thus, this system technically realizes health support through emotional assessment and communication.
[0177] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0178] Step 1: User Authentication
[0179] The device prompts the user to enter authentication information when the device starts up. This can be done through facial recognition, fingerprint recognition, or password entry.
[0180] Input: User's biometric data or password.
[0181] Data processing: This process involves comparing the entered authentication data with the information in the internal database.
[0182] Output: Authentication success / failure result. If successful, device use is permitted; if failed, authentication will be requested again.
[0183] Step 2: Initiating a conversation and recognizing emotions
[0184] The device initiates a conversation with a user who has successfully authenticated, using an artificial intelligence character. The character might ask questions such as, "How are you feeling today?"
[0185] Input: Voice response from the user.
[0186] Data processing: Speech recognition is used to convert audio data into text data, and that text is then passed to the emotion engine.
[0187] Output: Text data obtained from the user.
[0188] Step 3: Analysis of emotional state
[0189] The server receives text data sent from the terminal and analyzes it using natural language processing (NLP) techniques.
[0190] Input: Text data.
[0191] Data processing: This involves performing sentiment analysis based on text data to evaluate the type and intensity of emotions.
[0192] Output: Evaluation results of emotional state and stress level.
[0193] Step 4: Stress Assessment and Trend Identification
[0194] The server analyzes short-term and long-term emotional trends based on emotional state and stress levels.
[0195] Input: Assessed emotional state and stress level.
[0196] Data processing: This involves comparing current data with past data to understand current emotional trends.
[0197] Output: Latest emotional trend data and stress assessment results.
[0198] Step 5: Providing notifications and advice
[0199] The server generates notifications for parents based on emotional trend data.
[0200] Input: Latest emotional trend data and stress assessment results.
[0201] Data processing: This process involves designing advice for the user based on this data. For example, it might generate a message like, "Take more rest."
[0202] Output: Notifications for parents and on-device advice displays.
[0203] Step 6: Long-term data monitoring
[0204] The server records and analyzes data on users' emotions and stress levels over a long period of time.
[0205] Input: Accumulated emotional state and stress data.
[0206] Data processing: Perform time-series analysis to evaluate the user's psychological growth.
[0207] Output: Regular report generation and feedback submission.
[0208] (Application Example 2)
[0209] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0210] To understand a child's mental health, it is crucial to continuously monitor their emotions in their daily lives. However, currently, parents can only infer their child's state through communication, which can lead to oversights and misunderstandings. Therefore, the challenge is to create a system that naturally observes a child's emotions within the home environment and provides the necessary support.
[0211] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0212] In this invention, the server includes means for receiving user authentication information and identifying the user, means for configuring an artificial intelligence character to engage in natural language conversation with the user and evaluate the stress level, and means for generating notifications to the user or guardian based on the evaluation results and providing guidance for daily life. This makes it possible to continuously monitor the user's emotional state within the home and provide necessary support.
[0213] "User authentication information" refers to information used to identify users who access the system.
[0214] An "artificial intelligence character" is a computer program designed to interact with users and is capable of conversing in natural language.
[0215] "Stress level" is an indicator that assesses the user's emotional state and shows the degree of tension and fatigue.
[0216] A "notification" is a message or alert sent to the user or parent / guardian based on the evaluated information.
[0217] "Household appliances" refer to electrical and mechanical devices used within a user's living environment and designed to support an individual's daily life.
[0218] A "speech recognition receiving device" is a device used to acquire speech as input data and process it.
[0219] An "image acquisition device" is hardware or software used to capture visual data and to visually capture information.
[0220] "Emotional state" refers to the psychological or emotional state that the user is currently experiencing.
[0221] "Guidelines for conduct" are advice and instructions that indicate appropriate behaviors that users or their guardians should take in their daily lives.
[0222] An "interface" refers to the points of contact or means that enable interaction between a user and a system.
[0223] This invention is a system for monitoring a child's mental health through a home device and providing appropriate support. The system includes the following elements:
[0224] First, the device receives user authentication information and identifies the user. This authentication is performed using biometric authentication or a password. Once authentication is complete, the user can begin a natural language conversation through an artificial intelligence character. This conversation is conducted using a speech recognition receiver and an image acquisition device to accurately capture the user's statements and actions.
[0225] Next, the server uses natural language processing techniques and sentiment analysis engines to analyze the data obtained from the conversation. It determines the emotional state from the content of the conversation and evaluates the stress level. For example, it uses software such as IBM Watson or Google Cloud Natural Language API to analyze emotions in detail.
[0226] Based on the analyzed results, the server generates necessary notifications and advice for parents and sends them through home devices. This is done using email and application notifications. Furthermore, it is used as a guideline to show what actions families should take.
[0227] For example, if a child tells a robot, "I'm feeling a little lonely today," the system can identify that emotion and send a notification to the parent saying, "Your child may be feeling lonely. Try talking to them more."
[0228] An example of a prompt for a generative AI model is the question, "What information did the robot report about the child's mood today?" Using this prompt allows the system to report the child's emotional state more accurately, enabling appropriate follow-up.
[0229] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0230] Step 1:
[0231] The device receives user authentication information to identify the user. Inputs include biometric data and passwords. The device uses these inputs to verify the user's identity and grant access. Specific actions include biometric authentication using fingerprint sensors or cameras, and password matching.
[0232] Step 2:
[0233] After authentication is complete, the device begins a natural language conversation with the user. The input is the user's voice, and the output is that voice data. The device uses a speech recognition receiver to convert the voice into text data. This converted text data is used in the next step. The specific operations include voice input via the microphone and text conversion by speech recognition software.
[0234] Step 3:
[0235] The server receives text data sent from the terminal and performs sentiment analysis using natural language processing techniques. The input is text data, and the output is data representing the emotional state. The server uses a generative AI model to detect the user's emotional state and evaluate the stress level. Specific operations include utilizing an sentiment analysis engine and assigning sentiment labels through pattern recognition.
[0236] Step 4:
[0237] Based on the evaluation results, the server generates a notification for parents. The input is emotion data, and the output is a notification message for parents. The server creates a message that includes actionable guidelines tailored to the emotional state and stress level, and sends it as an email or application notification. Specific operations include template-based message generation and notification transmission via communication protocols.
[0238] Step 5:
[0239] The server provides advice to the user through home devices based on the generated notifications. The input is the notification message, and the output is audio or visual advice to the user. Home devices use speakers and displays to convey information visually or audibly. Specific actions include audio output using speakers and visual display on screens.
[0240] Step 6:
[0241] The server monitors user emotional data over the long term and analyzes changes and trends in behavior. The input is historical emotional data, and the output is a long-term report of the user's psychological state. The server uses the information stored in the database to analyze fluctuations over time and generate feedback. Specific operations include the use of analytical algorithms and the automatic generation of periodic reports.
[0242] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0243] Data generation model 58 is a so-called generative AI (Artificial Intelligence). An example of data generation model 58 is ChatGPT (registered trademark) (Internet search).<URL: https: / / openai.com / blog / chatgpt> ), Gemini (registered trademark) (Internet search) <url: https: gemini.google.com ?hl="ja">Examples of generative AI include the following. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.
[0244] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.
[0245] [Second Embodiment]
[0246] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.
[0247] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.
[0248] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0249] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.
[0250] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0251] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0252] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0253] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0254] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0255] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the identification processing unit 290.
[0256] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0257] Next, the identification processing performed by the identification processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0258] This invention provides a system for monitoring a child's mental health through interaction with an artificial intelligence character embedded in a communication device for children. This system performs multiple processes, including user authentication, natural language processing, stress assessment, and providing appropriate notifications and advice.
[0259] User Authentication
[0260] The device first requests user authentication information and grants access only if the user is correctly identified. This ensures data security and privacy. For example, biometric authentication can be used to easily verify that the device belongs to the child.
[0261] Natural language conversation
[0262] The device engages in everyday conversations with the user through an artificial intelligence character. These conversations utilize a standard voice interaction interface, making them user-friendly for children. For example, the device might ask the user, "What happened at school today?" and the user might reply, "I had fun today." This response is then used for data analysis in the next step.
[0263] Stress assessment
[0264] The server takes in conversational data sent from the terminal and uses natural language processing techniques to analyze signs of stress. For example, a prolonged use of negative words or tone may indicate a high stress level. Based on this assessment, if a specific trigger is identified, additional questions can be generated and asked to the user.
[0265] Providing notifications and advice
[0266] Based on the stress assessment results, the server generates notifications for parents, providing important information and advice. These notifications are privacy-conscious and may be sent in a format such as, "Your child may have been experiencing stress recently." Furthermore, the device provides users with specific actionable advice via voice or text. For example, it might suggest, "Why not take a short break and play some games today?"
[0267] Data monitoring
[0268] The server records and analyzes user conversation data and stress assessment results over a long period to understand changes in the user's stress levels. Through long-term monitoring, the server provides regular feedback to parents and helps them keep track of their child's growth and development.
[0269] This system aims to comprehensively understand a child's mental health, including their psychological state, and to enable integrated care in collaboration with parents.
[0270] The following describes the processing flow.
[0271] Step 1:
[0272] When the device is powered on, it displays a login screen and prompts the user to enter authentication information. This includes a password or biometric authentication (fingerprint or facial recognition). The user provides the login information, and the device verifies the authentication.
[0273] Step 2:
[0274] After authentication, the device activates an artificial intelligence character. The character greets the child and begins a casual conversation, asking natural questions such as, "How was school today?"
[0275] Step 3:
[0276] The user responds to questions from the device using voice. The user's voice is converted to text by the device, and this data is saved as the content of the conversation.
[0277] Step 4:
[0278] The terminal sends the conversation content to the server, which analyzes the text data using a natural language processing engine. It then infers the stress level from the tone and keywords.
[0279] Step 5:
[0280] Based on the stress estimation result, the server generates additional questions as needed. These questions are sent to the terminal, and the terminal asks the user the additional questions.
[0281] Step 6:
[0282] The terminal sends the additional responses from the user to the server again, and the server completes the stress analysis. The evaluation results are notified to the guardians in a form that protects privacy.
[0283] Step 7:
[0284] The server generates advice based on the causes of the child's stress and sends it to the terminal. The terminal presents the advice to the user in the form of "try taking a little break".
[0285] Step 8:
[0286] The server records the conversation data and stress evaluation results in the long term and provides feedback based on the monitoring results to the guardians regularly. This enables corresponding measures according to the growth of the user.
[0287] (Example 1)
[0288] Next, Example 1 will be described. In the following description, the data processing device 12 is referred to as the "server", and the smart glasses 214 are referred to as the "terminal".
[0289] In modern information society, the importance of continuously grasping the mental health status of children is increasing. However, with conventional technologies, it is difficult to appropriately evaluate children's daily emotions and stress levels and provide effective advice and notifications based on them. Therefore, an integrated system for promoting mental health and monitoring children's growth is required.
[0290] The specific processing by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following respective means.
[0291] In this invention, the server includes means for receiving user authentication information and identifying the user, means for configuring an information processing device that performs natural language conversation with the user, means for analyzing the record obtained from the conversation and evaluating the mental state, means for generating a notification to the supervisor based on the evaluation result, means for providing the user with behavioral guidelines based on the notification, and means for monitoring the record over the long term. This makes it possible to comprehensively evaluate the mental health state of a child and take appropriate action in cooperation with the supervisor.
[0292] "User authentication information" refers to unique identification information entered to identify a user, and is used to verify their permission to access the system.
[0293] An "information processing device" is a device or program that engages in conversation with a user and analyzes the content of that conversation, and is capable of natural language processing using artificial intelligence.
[0294] "Records" refer to data including the content of conversations with users and the results of their analysis, and are used to evaluate their mental state.
[0295] "Mental state" refers to the user's psychological health and emotional changes, including stress levels and emotional tendencies.
[0296] A "supervisor" is an individual or organization that has the authority to monitor the user's psychological health and intervene or provide support as needed.
[0297] "Action guidelines" are specific actions and advice that users should take to improve or maintain their mental state.
[0298] "Long-term monitoring" refers to the act of collecting and analyzing data over time to understand changes and trends in user behavior.
[0299] This invention is a system for monitoring a user's mental health and taking appropriate action. The system mainly consists of a server and terminals.
[0300] The device is a personal information terminal (PDI) used by users on a daily basis and is equipped with natural language processing technology utilizing artificial intelligence. Users first enter authentication information on the device screen, which includes biometric authentication such as fingerprint and facial recognition. The device also engages in natural language conversations with the user through an AI-powered voice dialogue interface and records the conversation. Specific software applications include speech recognition engines and natural language processing libraries.
[0301] The server receives conversation data sent from the terminal and analyzes its content. This analysis utilizes a generative AI model to assess the user's stress level based on their statements. For example, if the conversation contains many negative expressions such as "tired" or "sad," the server will determine that the user is experiencing high stress. Based on the assessment, the server generates a privacy-conscious notification for the supervisor, stating something like, "Your child may be experiencing stress." It also provides the user with specific actionable advice, such as, "Why not take some time to relax today?" This process allows for the early detection and resolution of problems.
[0302] As a concrete example, by inputting a prompt such as "Please provide examples of positive responses to questions asked by children" into a generative AI model, natural conversations can be generated. In this way, a system that comprehensively manages the user's mental health can be realized.
[0303] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0304] Step 1:
[0305] The terminal receives user authentication information as input from the user. As specific operations, fingerprint authentication and face authentication are performed. Thereby, the identity of the user is identified. When this authentication information is processed by the terminal and the authentication is successful, access permission is output from the terminal to the server.
[0306] Step 2:
[0307] The user starts a conversation using the voice input interface of the terminal. The terminal receives this voice as input data and converts it into text data via a voice recognition engine. This text data is sent from the terminal to the server and serves as a basis for performing natural language processing.
[0308] Step 3:
[0309] The server receives the text data transferred from the terminal as input and performs data processing using a generation AI model. Specifically, using natural language processing technology, it analyzes the user's emotions and stress level. The analysis result is output as numerical data indicating the presence or absence and degree of stress. This output is used for subsequent notification generation.
[0310] Step 4:
[0311] The server generates a notification to the supervisor based on the result of the stress analysis. In this process, a notification message such as "Your child may be feeling stressed" is generated in a privacy - conscious manner. The generated notification message is sent from the server to the supervisor's terminal.
[0312] Step 5:
[0313] The server creates action guidelines for the user and sends them to the terminal. In this process, based on the stress evaluation obtained from the generation AI model, specific advice such as "Why don't you take some time to relax today?" is provided. This information is output to the terminal and presented to the user visually or audibly.
[0314] Step 6:
[0315] The server stores all conversation data and stress analysis results long-term, preparing them for time-series analysis. This step involves accumulating data in a database and performing trend analysis to understand future changes. This creates a data foundation for understanding the evolution of users' mental health.
[0316] (Application Example 1)
[0317] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0318] In recent years, there has been a growing need to continuously monitor children's mental health and provide early detection and appropriate advice. However, traditional methods have presented challenges in real-time assessment of psychological state and providing prompt feedback to parents. Furthermore, the lack of systems that integrate children's conditions into their daily lives has made natural communication with children and stress assessment difficult.
[0319] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0320] In this invention, the server includes means for receiving authentication information and identifying the user; means for configuring an artificial intelligence agent that conducts natural language dialogue with the user; means for analyzing information collected from the dialogue and evaluating the psychological health status; means for creating a notification for the user or caregiver based on the evaluation results; means for providing advice to the user based on the notification; means for continuously monitoring the information; means for analyzing the tone during the dialogue using speech recognition and emotion analysis technology and generating a response; and means for providing feedback to the caregiver via a terminal. This enables real-time evaluation of the child's psychological health status in daily life and the provision of quick and appropriate information to parents.
[0321] "Authentication information" refers to information used by a device to identify a specific user, and includes biometric information and passwords.
[0322] "User" refers to an individual who operates and interacts with the system, and in this invention, it primarily refers to a child.
[0323] "Natural language dialogue" refers to a form of communication that takes place between a user and an artificial intelligence agent through voice or text.
[0324] An "artificial intelligence agent" is a program that performs natural language processing and collects and analyzes information through dialogue with the user.
[0325] "Psychological health" refers to the state of the user's emotions, stress levels, and mental stability.
[0326] A "notification" is a message sent to the user or caregiver based on the system's evaluation.
[0327] "Advice" refers to information that includes specific actions or points to consider, suggested to the user based on the notification.
[0328] "Speech recognition technology" is a technology that converts a user's voice into text data.
[0329] "Emotional analysis technology" is an analytical technique used to identify a user's emotions and tone from collected dialogue data.
[0330] "Feedback" refers to information provided to caregivers through a device, including evaluation results and advice.
[0331] The system in this invention is primarily realized through the interaction of a server, a terminal, and a user. The server receives the user's authentication information and ensures that the user is correctly identified. It is desirable to use biometric authentication technology for this purpose. The terminal engages in natural language dialogue with the user, specifically a child, and an artificial intelligence agent. This dialogue is converted into text information using speech recognition technology and sent to the server.
[0332] The server processes the received conversation data using sentiment analysis technology to assess the user's psychological well-being. Based on this assessment, the server generates notifications and sends them to parents as needed. These notifications may include an assessment of the current situation and specific behavioral advice. The device also utilizes sentiment analysis technology to provide necessary support to the user via voice or text.
[0333] The software used includes Google Speech-to-Text API and IBM Watson Tone Analyzer for speech recognition and sentiment analysis. Firebase is used for the database to store user dialogue data and evaluation results long-term, enabling continuous monitoring.
[0334] For example, if a child is asked "What happened at school today?" while looking at their device and replies "It was fun," that data is sent to the server in real time. If the sentiment analysis evaluates it as "positive," no special notification is generated; only normal feedback is provided.
[0335] An example of a prompt for a generative AI model would be: "Listen to the user about how they're feeling today and what happened at school. If a negative tone is detected, continue asking questions to determine what to inform the parents about."
[0336] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0337] Step 1:
[0338] The user initiates access to the device. The device obtains the user's authentication information using methods such as facial recognition or fingerprint authentication. The input is the user's biometric information, and the output is whether authentication was successful or not. The device uses authentication software and proceeds to the next step if successful.
[0339] Step 2:
[0340] The device uses an artificial intelligence agent to initiate a natural language dialogue with the user. The input is the user's voice, and the output is text data. Speech recognition technology (such as Google Speech-to-Text API) is used to convert the voice data into text.
[0341] Step 3:
[0342] The server receives text data sent from the terminal and evaluates the user's psychological health using sentiment analysis technology. The input is the user's text data, and the output is the sentiment evaluation result. Data analysis is performed using tools such as IBM Watson Tone Analyzer.
[0343] Step 4:
[0344] The server generates notifications based on the results of sentiment analysis. The input is the sentiment evaluation result, and the output is the notification message. If the evaluation reaches a specific trigger, a notification is created for the caregiver.
[0345] Step 5:
[0346] The device provides advice to the user based on the generated notifications. The input is the notification message, and the output is feedback to the user in the form of voice or text. The device uses an AI assistant to read out specific advice.
[0347] Step 6:
[0348] The server stores user text data and sentiment rating results for long-term monitoring. Input is user text data and rating results, and output is storage in a database. A database service such as Firebase is used to record the data.
[0349] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the identification processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform identification processing using the user's emotions.
[0350] This invention provides a system for monitoring a child's mental health through interaction with the user, by equipping a communication terminal with an artificial intelligence character and an emotion engine. This system performs multiple processes, including user authentication, natural language processing, stress assessment, emotion recognition, and providing appropriate notifications and advice.
[0351] User Authentication
[0352] The device receives authentication information from the user upon startup and grants access to the device after verifying that the entered information is correct. The user can then use the device after clearing authentication using biometric authentication or a password.
[0353] Natural language conversation and emotion recognition
[0354] The device engages in everyday conversations with the user through an artificial intelligence character. This character uses child-friendly voices and animations to ask the user questions such as, "How are you feeling today?" When the user responds, an emotion engine analyzes the response and recognizes the user's emotional state.
[0355] Stress assessment and emotion analysis
[0356] The server receives conversation and emotion data sent from the terminal and analyzes it using natural language processing technology. In addition to assessing stress levels, it analyzes the user's emotions in detail to understand short-term and long-term trends. For example, if emotions such as sadness or anger appear frequently, it will be determined that the user is under high stress.
[0357] Providing notifications and advice
[0358] The server generates notifications for parents based on stress assessment and emotion analysis. These notifications include information about the child frequently exhibiting certain emotions and situation-specific advice. The device then provides the user with adaptive advice such as, "Let's try something to change your mood today."
[0359] Data monitoring
[0360] The server collects data over long periods and uses recorded conversation and sentiment analysis data to meticulously track the user's growth. It also sends regular feedback to parents to help monitor the user's mental well-being.
[0361] This system is designed to comprehensively support children's psychological and emotional health by gaining a three-dimensional understanding of the user's emotional state, enabling parents to intervene appropriately.
[0362] The following describes the processing flow.
[0363] Step 1:
[0364] The device displays a user authentication screen when it starts up and the user logs in. The user enters authentication information using a password or biometric authentication. The device verifies the entered information and grants access to the device if it is correct.
[0365] Step 2:
[0366] The device activates an artificial intelligence character and begins a conversation with the user. This character engages in dialogue through questions such as "How was your day?", facilitating a natural exchange.
[0367] Step 3:
[0368] The user responds to questions from the device using voice. The device receives the voice input and converts it into text data using its automatic speech recognition function. This text data is stored for sentiment analysis.
[0369] Step 4:
[0370] The device sends text data to the server. The server analyzes the received data using an emotion engine to recognize the user's emotional state (e.g., joy, sadness, anger, etc.).
[0371] Step 5:
[0372] The server assesses stress levels based on the emotion analysis results. These assessment results are used as foundational data for continuous notifications to parents. If stress levels are high, additional information may be required.
[0373] Step 6:
[0374] The server generates notifications for parents based on their emotional state and stress levels. These notifications include messages such as, "The user appears to be experiencing significant stress recently," and specific advice.
[0375] Step 7:
[0376] The server sends back notifications and advice generated by the server to the device. The device then displays advice to the user, such as, "Why not go outside for a walk to relax?", and advises appropriate actions based on the situation.
[0377] Step 8:
[0378] The server records and stores conversation data and analysis results of emotional states over long periods. Through this, it monitors the user's emotional tendencies and stress fluctuations, and provides regular feedback to parents as needed.
[0379] (Example 2)
[0380] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0381] In modern society, there is a growing need to appropriately monitor the mental and emotional health of users, especially during childhood, and provide the necessary support. However, conventional technologies have made it difficult to accurately assess users' stress levels and emotional states during this process, and to provide appropriate feedback and advice to parents and the users themselves.
[0382] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0383] In this invention, the server includes means for receiving user authentication information and identifying the user, means for configuring an artificial intelligence character that engages in natural language conversation with the user, and means for analyzing information obtained from the conversation to recognize the emotional state and evaluate the stress level. This makes it possible to comprehensively understand the user's mental and emotional health and enable appropriate intervention and support.
[0384] "User authentication information" refers to the information necessary to identify a user and authorize them to use a device. This includes biometric information and passwords.
[0385] An "artificial intelligence character" refers to a digital character programmed to interact with users through natural language conversation. It uses voice and animation to create a friendly and engaging conversation.
[0386] "Natural language" refers to the language that humans use on a daily basis, expressed in written or conversational forms. Technologies that process this language enable effective communication with users.
[0387] "Recognizing emotional states" refers to the process of analyzing linguistic information emanating from a user's words and actions to identify their emotions.
[0388] "Assessing stress levels" refers to a method of quantifying or evaluating the psychological burden on a user based on the information they provide.
[0389] "Notifications" refer to information and warning messages provided to users or their guardians. This includes information about the user's emotions and behavior.
[0390] "Providing advice" refers to the act of giving guidelines and suggestions tailored to the user's current situation. This encourages improvements in the user's daily behavior.
[0391] "Monitoring" refers to the act of observing specific data or situations over a long period of time to understand their changes and trends. This allows for tracking the user's growth and health status.
[0392] This invention relates to a system that incorporates artificial intelligence into a child-friendly information terminal to monitor and support the user's mental health. This system consistently performs user authentication, emotion recognition, stress assessment, notification generation, and data monitoring.
[0393] The device receives authentication information from the user via biometric authentication or password entry upon startup, and uses this information to identify the user. Once authentication is complete, an artificial intelligence character begins interacting with the user using voice and animation. For example, the character might ask, "How are you doing today?" and prompt the user for a natural language response.
[0394] The user responds to this question with words expressing their feelings, such as "I'm feeling great today." The device converts this voice information into text data and sends it to the emotion engine. The emotion engine analyzes this text using natural language processing techniques to recognize the emotional state. Common natural language processing APIs are used for the analysis, achieving highly accurate emotion recognition.
[0395] The server evaluates stress levels based on data received from the terminal. In this process, it quantifies the type and intensity of emotions, giving concrete details to the user's stress state. Over the long term, as data is collected, the server analyzes this data to understand the user's emotional tendencies and changes in stress levels.
[0396] Furthermore, the server generates notifications for parents, reporting on the effects of specific emotional states and stress. These notifications also include specific advice, such as, "Your child needs a little more rest." The device also offers suggestions to the user that are beneficial to their daily life and promote healthy behaviors.
[0397] As a concrete example, consider a scenario where a user tells an AI character about what happened at school, saying, "I made a new friend today." From this statement, the device recognizes a positive emotion, and the server collects the data and sends a notification to the parent that reinforces the positive trend. In this process, the generative AI model can use a prompt like this: "You are an AI assistant for children. Recognize the user's emotional state through conversation and provide constructive feedback as needed."
[0398] Thus, this system technically realizes health support through emotional assessment and communication.
[0399] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0400] Step 1: User Authentication
[0401] The device prompts the user to enter authentication information when the device starts up. This can be done through facial recognition, fingerprint recognition, or password entry.
[0402] Input: User's biometric data or password.
[0403] Data processing: This process involves comparing the entered authentication data with the information in the internal database.
[0404] Output: Authentication success / failure result. If successful, device use is permitted; if failed, authentication will be requested again.
[0405] Step 2: Initiating a conversation and recognizing emotions
[0406] The device initiates a conversation with a user who has successfully authenticated, using an artificial intelligence character. The character might ask questions such as, "How are you feeling today?"
[0407] Input: Voice response from the user.
[0408] Data processing: Speech recognition is used to convert audio data into text data, and that text is then passed to the emotion engine.
[0409] Output: Text data obtained from the user.
[0410] Step 3: Analysis of emotional state
[0411] The server receives text data sent from the terminal and analyzes it using natural language processing (NLP) techniques.
[0412] Input: Text data.
[0413] Data processing: This involves performing sentiment analysis based on text data to evaluate the type and intensity of emotions.
[0414] Output: Evaluation results of emotional state and stress level.
[0415] Step 4: Stress Assessment and Trend Identification
[0416] The server analyzes short-term and long-term emotional trends based on emotional state and stress levels.
[0417] Input: Assessed emotional state and stress level.
[0418] Data processing: This involves comparing current data with past data to understand current emotional trends.
[0419] Output: Latest emotional trend data and stress assessment results.
[0420] Step 5: Providing notifications and advice
[0421] The server generates notifications for parents based on emotional trend data.
[0422] Input: Latest emotional trend data and stress assessment results.
[0423] Data processing: This process involves designing advice for the user based on this data. For example, it might generate a message like, "Take more rest."
[0424] Output: Notifications for parents and on-device advice displays.
[0425] Step 6: Long-term data monitoring
[0426] The server records and analyzes data on users' emotions and stress levels over a long period of time.
[0427] Input: Accumulated emotional state and stress data.
[0428] Data processing: Perform time-series analysis to evaluate the user's psychological growth.
[0429] Output: Regular report generation and feedback submission.
[0430] (Application Example 2)
[0431] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart glasses 214 as the "terminal".
[0432] To understand a child's mental health, it is crucial to continuously monitor their emotions in their daily lives. However, currently, parents can only infer their child's state through communication, which can lead to oversights and misunderstandings. Therefore, the challenge is to create a system that naturally observes a child's emotions within the home environment and provides the necessary support.
[0433] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0434] In this invention, the server includes means for receiving user authentication information and identifying the user, means for configuring an artificial intelligence character to engage in natural language conversation with the user and evaluate the stress level, and means for generating notifications to the user or guardian based on the evaluation results and providing guidance for daily life. This makes it possible to continuously monitor the user's emotional state within the home and provide necessary support.
[0435] "User authentication information" refers to information used to identify users who access the system.
[0436] An "artificial intelligence character" is a computer program designed to interact with users and is capable of conversing in natural language.
[0437] "Stress level" is an indicator that assesses the user's emotional state and shows the degree of tension and fatigue.
[0438] A "notification" is a message or alert sent to the user or parent / guardian based on the evaluated information.
[0439] "Household appliances" refer to electrical and mechanical devices used within a user's living environment and designed to support an individual's daily life.
[0440] A "speech recognition receiving device" is a device used to acquire speech as input data and process it.
[0441] An "image acquisition device" is hardware or software used to capture visual data and to visually capture information.
[0442] "Emotional state" refers to the psychological or emotional state that the user is currently experiencing.
[0443] "Guidelines for conduct" are advice and instructions that indicate appropriate behaviors that users or their guardians should take in their daily lives.
[0444] An "interface" refers to the points of contact or means that enable interaction between a user and a system.
[0445] This invention is a system for monitoring a child's mental health through a home device and providing appropriate support. The system includes the following elements:
[0446] First, the device receives user authentication information and identifies the user. This authentication is performed using biometric authentication or a password. Once authentication is complete, the user can begin a natural language conversation through an artificial intelligence character. This conversation is conducted using a speech recognition receiver and an image acquisition device to accurately capture the user's statements and actions.
[0447] Next, the server uses natural language processing techniques and sentiment analysis engines to analyze the data obtained from the conversation. It determines the emotional state from the content of the conversation and evaluates the stress level. For example, it uses software such as IBM Watson or Google Cloud Natural Language API to analyze emotions in detail.
[0448] Based on the analyzed results, the server generates necessary notifications and advice for parents and sends them through home devices. This is done using email and application notifications. Furthermore, it is used as a guideline to show what actions families should take.
[0449] For example, if a child tells a robot, "I'm feeling a little lonely today," the system can identify that emotion and send a notification to the parent saying, "Your child may be feeling lonely. Try talking to them more."
[0450] An example of a prompt for a generative AI model is the question, "What information did the robot report about the child's mood today?" Using this prompt allows the system to report the child's emotional state more accurately, enabling appropriate follow-up.
[0451] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0452] Step 1:
[0453] The device receives user authentication information to identify the user. Inputs include biometric data and passwords. The device uses these inputs to verify the user's identity and grant access. Specific actions include biometric authentication using fingerprint sensors or cameras, and password matching.
[0454] Step 2:
[0455] After authentication is complete, the device begins a natural language conversation with the user. The input is the user's voice, and the output is that voice data. The device uses a speech recognition receiver to convert the voice into text data. This converted text data is used in the next step. The specific operations include voice input via the microphone and text conversion by speech recognition software.
[0456] Step 3:
[0457] The server receives text data sent from the terminal and performs sentiment analysis using natural language processing techniques. The input is text data, and the output is data representing the emotional state. The server uses a generative AI model to detect the user's emotional state and evaluate the stress level. Specific operations include utilizing an sentiment analysis engine and assigning sentiment labels through pattern recognition.
[0458] Step 4:
[0459] Based on the evaluation results, the server generates a notification for parents. The input is emotion data, and the output is a notification message for parents. The server creates a message that includes actionable guidelines tailored to the emotional state and stress level, and sends it as an email or application notification. Specific operations include template-based message generation and notification transmission via communication protocols.
[0460] Step 5:
[0461] The server provides advice to the user through home devices based on the generated notifications. The input is the notification message, and the output is audio or visual advice to the user. Home devices use speakers and displays to convey information visually or audibly. Specific actions include audio output using speakers and visual display on screens.
[0462] Step 6:
[0463] The server monitors user emotional data over the long term and analyzes changes and trends in behavior. The input is historical emotional data, and the output is a long-term report of the user's psychological state. The server uses the information stored in the database to analyze fluctuations over time and generate feedback. Specific operations include the use of analytical algorithms and the automatic generation of periodic reports.
[0464] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing unit 12. In the data processing unit 12, the specific processing unit 290 acquires the audio data.
[0465] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). An example of data generation model 58 is ChatGPT (Internet Search<URL: https: / / openai.com / blog / chatgpt> ), Gemini (Internet search) <url: https: gemini.google.com ?hl="ja">Examples of generative AI include the following. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.
[0466] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.
[0467] [Third Embodiment]
[0468] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.
[0469] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.
[0470] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0471] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.
[0472] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0473] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0474] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0475] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0476] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0477] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the identification processing unit 290.
[0478] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0479] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".
[0480] This invention provides a system for monitoring a child's mental health through interaction with an artificial intelligence character embedded in a communication device for children. This system performs multiple processes, including user authentication, natural language processing, stress assessment, and providing appropriate notifications and advice.
[0481] User Authentication
[0482] The device first requests user authentication information and grants access only if the user is correctly identified. This ensures data security and privacy. For example, biometric authentication can be used to easily verify that the device belongs to the child.
[0483] Natural language conversation
[0484] The device engages in everyday conversations with the user through an artificial intelligence character. These conversations utilize a standard voice interaction interface, making them user-friendly for children. For example, the device might ask the user, "What happened at school today?" and the user might reply, "I had fun today." This response is then used for data analysis in the next step.
[0485] Stress assessment
[0486] The server takes in conversational data sent from the terminal and uses natural language processing techniques to analyze signs of stress. For example, a prolonged use of negative words or tone may indicate a high stress level. Based on this assessment, if a specific trigger is identified, additional questions can be generated and asked to the user.
[0487] Providing notifications and advice
[0488] Based on the stress assessment results, the server generates notifications for parents, providing important information and advice. These notifications are privacy-conscious and may be sent in a format such as, "Your child may have been experiencing stress recently." Furthermore, the device provides users with specific actionable advice via voice or text. For example, it might suggest, "Why not take a short break and play some games today?"
[0489] Data monitoring
[0490] The server records and analyzes user conversation data and stress assessment results over a long period to understand changes in the user's stress levels. Through long-term monitoring, the server provides regular feedback to parents and helps them keep track of their child's growth and development.
[0491] This system aims to comprehensively understand a child's mental health, including their psychological state, and to enable integrated care in collaboration with parents.
[0492] The following describes the processing flow.
[0493] Step 1:
[0494] When the device is powered on, it displays a login screen and prompts the user to enter authentication information. This includes a password or biometric authentication (fingerprint or facial recognition). The user provides the login information, and the device verifies the authentication.
[0495] Step 2:
[0496] After authentication, the device activates an artificial intelligence character. The character greets the child and begins a casual conversation, asking natural questions such as, "How was school today?"
[0497] Step 3:
[0498] The user responds to questions from the device using voice. The user's voice is converted to text by the device, and this data is saved as the content of the conversation.
[0499] Step 4:
[0500] The terminal sends the conversation content to the server, which analyzes the text data using a natural language processing engine. It then infers the stress level from the tone and keywords.
[0501] Step 5:
[0502] Based on the stress level estimation, the server generates additional questions as needed. These questions are sent to the terminal, which then asks the user further questions.
[0503] Step 6:
[0504] The device sends additional responses from the user back to the server, and the server completes the stress analysis. The evaluation results are notified to the parent in a privacy-protected format.
[0505] Step 7:
[0506] The server generates advice based on the child's stressors and sends it to the device. The device then presents the advice to the user, such as "Try taking a short break."
[0507] Step 8:
[0508] The server records conversation data and stress assessment results over the long term and provides regular feedback to parents based on the monitoring results. This allows for responses tailored to the user's growth.
[0509] (Example 1)
[0510] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0511] In today's information society, the importance of continuously monitoring children's mental health is increasing. However, conventional technologies struggle to adequately assess children's daily emotions and stress levels, and to provide effective advice and notifications based on that assessment. Therefore, an integrated system is needed to promote mental health and monitor children's development.
[0512] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0513] In this invention, the server includes means for receiving user authentication information and identifying the user, means for configuring an information processing device that performs natural language conversation with the user, means for analyzing the record obtained from the conversation and evaluating the mental state, means for generating a notification to the supervisor based on the evaluation result, means for providing the user with behavioral guidelines based on the notification, and means for monitoring the record over the long term. This makes it possible to comprehensively evaluate the mental health state of a child and take appropriate action in cooperation with the supervisor.
[0514] "User authentication information" refers to unique identification information entered to identify a user, and is used to verify their permission to access the system.
[0515] An "information processing device" is a device or program that engages in conversation with a user and analyzes the content of that conversation, and is capable of natural language processing using artificial intelligence.
[0516] "Records" refer to data including the content of conversations with users and the results of their analysis, and are used to evaluate their mental state.
[0517] "Mental state" refers to the user's psychological health and emotional changes, including stress levels and emotional tendencies.
[0518] A "supervisor" is an individual or organization that has the authority to monitor the user's psychological health and intervene or provide support as needed.
[0519] "Action guidelines" are specific actions and advice that users should take to improve or maintain their mental state.
[0520] "Long-term monitoring" refers to the act of collecting and analyzing data over time to understand changes and trends in user behavior.
[0521] This invention is a system for monitoring a user's mental health and taking appropriate action. The system mainly consists of a server and terminals.
[0522] The device is a personal information terminal (PDI) used by users on a daily basis and is equipped with natural language processing technology utilizing artificial intelligence. Users first enter authentication information on the device screen, which includes biometric authentication such as fingerprint and facial recognition. The device also engages in natural language conversations with the user through an AI-powered voice dialogue interface and records the conversation. Specific software applications include speech recognition engines and natural language processing libraries.
[0523] The server receives conversation data sent from the terminal and analyzes its content. This analysis utilizes a generative AI model to assess the user's stress level based on their statements. For example, if the conversation contains many negative expressions such as "tired" or "sad," the server will determine that the user is experiencing high stress. Based on the assessment, the server generates a privacy-conscious notification for the supervisor, stating something like, "Your child may be experiencing stress." It also provides the user with specific actionable advice, such as, "Why not take some time to relax today?" This process allows for the early detection and resolution of problems.
[0524] As a concrete example, by inputting a prompt such as "Please provide examples of positive responses to questions asked by children" into a generative AI model, natural conversations can be generated. In this way, a system that comprehensively manages the user's mental health can be realized.
[0525] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0526] Step 1:
[0527] The terminal receives user authentication information from the user as input. Specifically, fingerprint authentication or facial recognition is performed to identify the user's identity. This authentication information is processed by the terminal, and if authentication is successful, the terminal outputs permission to the server.
[0528] Step 2:
[0529] The user initiates a conversation using the device's voice input interface. The device receives this voice as input data and converts it into text data via a speech recognition engine. This text data is sent from the device to the server, forming the basis for natural language processing.
[0530] Step 3:
[0531] The server receives text data transferred from the terminal as input and processes it using a generative AI model. Specifically, it uses natural language processing techniques to analyze the user's emotions and stress levels. The analysis results are output as numerical data indicating the presence and degree of stress. This output is used to generate subsequent notifications.
[0532] Step 4:
[0533] The server generates a notification for the supervisor based on the stress analysis results. This process generates a notification message, such as "Your child may be experiencing stress," in a privacy-conscious manner. The generated notification message is sent from the server to the supervisor's device.
[0534] Step 5:
[0535] The server creates actionable guidelines for the user and sends them to the device. This process provides specific advice, such as "Why not take some time to relax today?", based on stress assessments obtained from a generated AI model. This information is output to the device and presented to the user visually or audibly.
[0536] Step 6:
[0537] The server stores all conversation data and stress analysis results long-term, preparing them for time-series analysis. This step involves accumulating data in a database and performing trend analysis to understand future changes. This creates a data foundation for understanding the evolution of users' mental health.
[0538] (Application Example 1)
[0539] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0540] In recent years, there has been a growing need to continuously monitor children's mental health and provide early detection and appropriate advice. However, traditional methods have presented challenges in real-time assessment of psychological state and providing prompt feedback to parents. Furthermore, the lack of systems that integrate children's conditions into their daily lives has made natural communication with children and stress assessment difficult.
[0541] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0542] In this invention, the server includes means for receiving authentication information and identifying the user; means for configuring an artificial intelligence agent that conducts natural language dialogue with the user; means for analyzing information collected from the dialogue and evaluating the psychological health status; means for creating a notification for the user or caregiver based on the evaluation results; means for providing advice to the user based on the notification; means for continuously monitoring the information; means for analyzing the tone during the dialogue using speech recognition and emotion analysis technology and generating a response; and means for providing feedback to the caregiver via a terminal. This enables real-time evaluation of the child's psychological health status in daily life and the provision of quick and appropriate information to parents.
[0543] "Authentication information" refers to information used by a device to identify a specific user, and includes biometric information and passwords.
[0544] "User" refers to an individual who operates and interacts with the system, and in this invention, it primarily refers to a child.
[0545] "Natural language dialogue" refers to a form of communication that takes place between a user and an artificial intelligence agent through voice or text.
[0546] An "artificial intelligence agent" is a program that performs natural language processing and collects and analyzes information through dialogue with the user.
[0547] "Psychological health" refers to the state of the user's emotions, stress levels, and mental stability.
[0548] A "notification" is a message sent to the user or caregiver based on the system's evaluation.
[0549] "Advice" refers to information that includes specific actions or points to consider, suggested to the user based on the notification.
[0550] "Speech recognition technology" is a technology that converts a user's voice into text data.
[0551] "Emotional analysis technology" is an analytical technique used to identify a user's emotions and tone from collected dialogue data.
[0552] "Feedback" refers to information provided to caregivers through a device, including evaluation results and advice.
[0553] The system in this invention is primarily realized through the interaction of a server, a terminal, and a user. The server receives the user's authentication information and ensures that the user is correctly identified. It is desirable to use biometric authentication technology for this purpose. The terminal engages in natural language dialogue with the user, specifically a child, and an artificial intelligence agent. This dialogue is converted into text information using speech recognition technology and sent to the server.
[0554] The server processes the received conversation data using sentiment analysis technology to assess the user's psychological well-being. Based on this assessment, the server generates notifications and sends them to parents as needed. These notifications may include an assessment of the current situation and specific behavioral advice. The device also utilizes sentiment analysis technology to provide necessary support to the user via voice or text.
[0555] The software used includes Google Speech-to-Text API and IBM Watson Tone Analyzer for speech recognition and sentiment analysis. Firebase is used for the database to store user dialogue data and evaluation results long-term, enabling continuous monitoring.
[0556] For example, if a child is asked "What happened at school today?" while looking at their device and replies "It was fun," that data is sent to the server in real time. If the sentiment analysis evaluates it as "positive," no special notification is generated; only normal feedback is provided.
[0557] An example of a prompt for a generative AI model would be: "Listen to the user about how they're feeling today and what happened at school. If a negative tone is detected, continue asking questions to determine what to inform the parents about."
[0558] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0559] Step 1:
[0560] The user initiates access to the device. The device obtains the user's authentication information using methods such as facial recognition or fingerprint authentication. The input is the user's biometric information, and the output is whether authentication was successful or not. The device uses authentication software and proceeds to the next step if successful.
[0561] Step 2:
[0562] The device uses an artificial intelligence agent to initiate a natural language dialogue with the user. The input is the user's voice, and the output is text data. Speech recognition technology (such as Google Speech-to-Text API) is used to convert the voice data into text.
[0563] Step 3:
[0564] The server receives text data sent from the terminal and evaluates the user's psychological health using sentiment analysis technology. The input is the user's text data, and the output is the sentiment evaluation result. Data analysis is performed using tools such as IBM Watson Tone Analyzer.
[0565] Step 4:
[0566] The server generates notifications based on the results of sentiment analysis. The input is the sentiment evaluation result, and the output is the notification message. If the evaluation reaches a specific trigger, a notification is created for the caregiver.
[0567] Step 5:
[0568] The device provides advice to the user based on the generated notifications. The input is the notification message, and the output is feedback to the user in the form of voice or text. The device uses an AI assistant to read out specific advice.
[0569] Step 6:
[0570] The server stores user text data and sentiment rating results for long-term monitoring. Input is user text data and rating results, and output is storage in a database. A database service such as Firebase is used to record the data.
[0571] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the identification processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform identification processing using the user's emotions.
[0572] This invention provides a system for monitoring a child's mental health through interaction with the user, by equipping a communication terminal with an artificial intelligence character and an emotion engine. This system performs multiple processes, including user authentication, natural language processing, stress assessment, emotion recognition, and providing appropriate notifications and advice.
[0573] User Authentication
[0574] The device receives authentication information from the user upon startup and grants access to the device after verifying that the entered information is correct. The user can then use the device after clearing authentication using biometric authentication or a password.
[0575] Natural language conversation and emotion recognition
[0576] The device engages in everyday conversations with the user through an artificial intelligence character. This character uses child-friendly voices and animations to ask the user questions such as, "How are you feeling today?" When the user responds, an emotion engine analyzes the response and recognizes the user's emotional state.
[0577] Stress assessment and emotion analysis
[0578] The server receives conversation and emotion data sent from the terminal and analyzes it using natural language processing technology. In addition to assessing stress levels, it analyzes the user's emotions in detail to understand short-term and long-term trends. For example, if emotions such as sadness or anger appear frequently, it will be determined that the user is under high stress.
[0579] Providing notifications and advice
[0580] The server generates notifications for parents based on stress assessment and emotion analysis. These notifications include information about the child frequently exhibiting certain emotions and situation-specific advice. The device then provides the user with adaptive advice such as, "Let's try something to change your mood today."
[0581] Data monitoring
[0582] The server collects data over long periods and uses recorded conversation and sentiment analysis data to meticulously track the user's growth. It also sends regular feedback to parents to help monitor the user's mental well-being.
[0583] This system is designed to comprehensively support children's psychological and emotional health by gaining a three-dimensional understanding of the user's emotional state, enabling parents to intervene appropriately.
[0584] The following describes the processing flow.
[0585] Step 1:
[0586] The device displays a user authentication screen when it starts up and the user logs in. The user enters authentication information using a password or biometric authentication. The device verifies the entered information and grants access to the device if it is correct.
[0587] Step 2:
[0588] The device activates an artificial intelligence character and begins a conversation with the user. This character engages in dialogue through questions such as "How was your day?", facilitating a natural exchange.
[0589] Step 3:
[0590] The user responds to questions from the device using voice. The device receives the voice input and converts it into text data using its automatic speech recognition function. This text data is stored for sentiment analysis.
[0591] Step 4:
[0592] The device sends text data to the server. The server analyzes the received data using an emotion engine to recognize the user's emotional state (e.g., joy, sadness, anger, etc.).
[0593] Step 5:
[0594] The server assesses stress levels based on the emotion analysis results. These assessment results are used as foundational data for continuous notifications to parents. If stress levels are high, additional information may be required.
[0595] Step 6:
[0596] The server generates notifications for parents based on their emotional state and stress levels. These notifications include messages such as, "The user appears to be experiencing significant stress recently," and specific advice.
[0597] Step 7:
[0598] The server sends back notifications and advice generated by the server to the device. The device then displays advice to the user, such as, "Why not go outside for a walk to relax?", and advises appropriate actions based on the situation.
[0599] Step 8:
[0600] The server records and stores conversation data and analysis results of emotional states over long periods. Through this, it monitors the user's emotional tendencies and stress fluctuations, and provides regular feedback to parents as needed.
[0601] (Example 2)
[0602] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0603] In modern society, there is a growing need to appropriately monitor the mental and emotional health of users, especially during childhood, and provide the necessary support. However, conventional technologies have made it difficult to accurately assess users' stress levels and emotional states during this process, and to provide appropriate feedback and advice to parents and the users themselves.
[0604] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0605] In this invention, the server includes means for receiving user authentication information and identifying the user, means for configuring an artificial intelligence character that engages in natural language conversation with the user, and means for analyzing information obtained from the conversation to recognize the emotional state and evaluate the stress level. This makes it possible to comprehensively understand the user's mental and emotional health and enable appropriate intervention and support.
[0606] "User authentication information" refers to the information necessary to identify a user and authorize them to use a device. This includes biometric information and passwords.
[0607] An "artificial intelligence character" refers to a digital character programmed to interact with users through natural language conversation. It uses voice and animation to create a friendly and engaging conversation.
[0608] "Natural language" refers to the language that humans use on a daily basis, expressed in written or conversational forms. Technologies that process this language enable effective communication with users.
[0609] "Recognizing emotional states" refers to the process of analyzing linguistic information emanating from a user's words and actions to identify their emotions.
[0610] "Assessing stress levels" refers to a method of quantifying or evaluating the psychological burden on a user based on the information they provide.
[0611] "Notifications" refer to information and warning messages provided to users or their guardians. This includes information about the user's emotions and behavior.
[0612] "Providing advice" refers to the act of giving guidelines and suggestions tailored to the user's current situation. This encourages improvements in the user's daily behavior.
[0613] "Monitoring" refers to the act of observing specific data or situations over a long period of time to understand their changes and trends. This allows for tracking the user's growth and health status.
[0614] This invention relates to a system that incorporates artificial intelligence into a child-friendly information terminal to monitor and support the user's mental health. This system consistently performs user authentication, emotion recognition, stress assessment, notification generation, and data monitoring.
[0615] The device receives authentication information from the user via biometric authentication or password entry upon startup, and uses this information to identify the user. Once authentication is complete, an artificial intelligence character begins interacting with the user using voice and animation. For example, the character might ask, "How are you doing today?" and prompt the user for a natural language response.
[0616] The user responds to this question with words expressing their feelings, such as "I'm feeling great today." The device converts this voice information into text data and sends it to the emotion engine. The emotion engine analyzes this text using natural language processing techniques to recognize the emotional state. Common natural language processing APIs are used for the analysis, achieving highly accurate emotion recognition.
[0617] The server evaluates stress levels based on data received from the terminal. In this process, it quantifies the type and intensity of emotions, giving concrete details to the user's stress state. Over the long term, as data is collected, the server analyzes this data to understand the user's emotional tendencies and changes in stress levels.
[0618] Furthermore, the server generates notifications for parents, reporting on the effects of specific emotional states and stress. These notifications also include specific advice, such as, "Your child needs a little more rest." The device also offers suggestions to the user that are beneficial to their daily life and promote healthy behaviors.
[0619] As a concrete example, consider a scenario where a user tells an AI character about what happened at school, saying, "I made a new friend today." From this statement, the device recognizes a positive emotion, and the server collects the data and sends a notification to the parent that reinforces the positive trend. In this process, the generative AI model can use a prompt like this: "You are an AI assistant for children. Recognize the user's emotional state through conversation and provide constructive feedback as needed."
[0620] Thus, this system technically realizes health support through emotional assessment and communication.
[0621] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0622] Step 1: User Authentication
[0623] The device prompts the user to enter authentication information when the device starts up. This can be done through facial recognition, fingerprint recognition, or password entry.
[0624] Input: User's biometric data or password.
[0625] Data processing: This process involves comparing the entered authentication data with the information in the internal database.
[0626] Output: Authentication success / failure result. If successful, device use is permitted; if failed, authentication will be requested again.
[0627] Step 2: Initiating a conversation and recognizing emotions
[0628] The device initiates a conversation with a user who has successfully authenticated, using an artificial intelligence character. The character might ask questions such as, "How are you feeling today?"
[0629] Input: Voice response from the user.
[0630] Data processing: Speech recognition is used to convert audio data into text data, and that text is then passed to the emotion engine.
[0631] Output: Text data obtained from the user.
[0632] Step 3: Analysis of emotional state
[0633] The server receives text data sent from the terminal and analyzes it using natural language processing (NLP) techniques.
[0634] Input: Text data.
[0635] Data processing: This involves performing sentiment analysis based on text data to evaluate the type and intensity of emotions.
[0636] Output: Evaluation results of emotional state and stress level.
[0637] Step 4: Stress Assessment and Trend Identification
[0638] The server analyzes short-term and long-term emotional trends based on emotional state and stress levels.
[0639] Input: Assessed emotional state and stress level.
[0640] Data processing: This involves comparing current data with past data to understand current emotional trends.
[0641] Output: Latest emotional trend data and stress assessment results.
[0642] Step 5: Providing notifications and advice
[0643] The server generates notifications for parents based on emotional trend data.
[0644] Input: Latest emotional trend data and stress assessment results.
[0645] Data processing: This process involves designing advice for the user based on this data. For example, it might generate a message like, "Take more rest."
[0646] Output: Notifications for parents and on-device advice displays.
[0647] Step 6: Long-term data monitoring
[0648] The server records and analyzes data on users' emotions and stress levels over a long period of time.
[0649] Input: Accumulated emotional state and stress data.
[0650] Data processing: Perform time-series analysis to evaluate the user's psychological growth.
[0651] Output: Regular report generation and feedback submission.
[0652] (Application Example 2)
[0653] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0654] To understand a child's mental health, it is crucial to continuously monitor their emotions in their daily lives. However, currently, parents can only infer their child's state through communication, which can lead to oversights and misunderstandings. Therefore, the challenge is to create a system that naturally observes a child's emotions within the home environment and provides the necessary support.
[0655] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0656] In this invention, the server includes means for receiving user authentication information and identifying the user, means for configuring an artificial intelligence character to engage in natural language conversation with the user and evaluate the stress level, and means for generating notifications to the user or guardian based on the evaluation results and providing guidance for daily life. This makes it possible to continuously monitor the user's emotional state within the home and provide necessary support.
[0657] "User authentication information" refers to information used to identify users who access the system.
[0658] An "artificial intelligence character" is a computer program designed to interact with users and is capable of conversing in natural language.
[0659] "Stress level" is an indicator that assesses the user's emotional state and shows the degree of tension and fatigue.
[0660] A "notification" is a message or alert sent to the user or parent / guardian based on the evaluated information.
[0661] "Household appliances" refer to electrical and mechanical devices used within a user's living environment and designed to support an individual's daily life.
[0662] A "speech recognition receiving device" is a device used to acquire speech as input data and process it.
[0663] An "image acquisition device" is hardware or software used to capture visual data and to visually capture information.
[0664] "Emotional state" refers to the psychological or emotional state that the user is currently experiencing.
[0665] "Guidelines for conduct" are advice and instructions that indicate appropriate behaviors that users or their guardians should take in their daily lives.
[0666] An "interface" refers to the points of contact or means that enable interaction between a user and a system.
[0667] This invention is a system for monitoring a child's mental health through a home device and providing appropriate support. The system includes the following elements:
[0668] First, the device receives user authentication information and identifies the user. This authentication is performed using biometric authentication or a password. Once authentication is complete, the user can begin a natural language conversation through an artificial intelligence character. This conversation is conducted using a speech recognition receiver and an image acquisition device to accurately capture the user's statements and actions.
[0669] Next, the server uses natural language processing techniques and sentiment analysis engines to analyze the data obtained from the conversation. It determines the emotional state from the content of the conversation and evaluates the stress level. For example, it uses software such as IBM Watson or Google Cloud Natural Language API to analyze emotions in detail.
[0670] Based on the analyzed results, the server generates necessary notifications and advice for parents and sends them through home devices. This is done using email and application notifications. Furthermore, it is used as a guideline to show what actions families should take.
[0671] For example, if a child tells a robot, "I'm feeling a little lonely today," the system can identify that emotion and send a notification to the parent saying, "Your child may be feeling lonely. Try talking to them more."
[0672] An example of a prompt for a generative AI model is the question, "What information did the robot report about the child's mood today?" Using this prompt allows the system to report the child's emotional state more accurately, enabling appropriate follow-up.
[0673] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0674] Step 1:
[0675] The device receives user authentication information to identify the user. Inputs include biometric data and passwords. The device uses these inputs to verify the user's identity and grant access. Specific actions include biometric authentication using fingerprint sensors or cameras, and password matching.
[0676] Step 2:
[0677] After authentication is complete, the device begins a natural language conversation with the user. The input is the user's voice, and the output is that voice data. The device uses a speech recognition receiver to convert the voice into text data. This converted text data is used in the next step. The specific operations include voice input via the microphone and text conversion by speech recognition software.
[0678] Step 3:
[0679] The server receives text data sent from the terminal and performs sentiment analysis using natural language processing techniques. The input is text data, and the output is data representing the emotional state. The server uses a generative AI model to detect the user's emotional state and evaluate the stress level. Specific operations include utilizing an sentiment analysis engine and assigning sentiment labels through pattern recognition.
[0680] Step 4:
[0681] Based on the evaluation results, the server generates a notification for parents. The input is emotion data, and the output is a notification message for parents. The server creates a message that includes actionable guidelines tailored to the emotional state and stress level, and sends it as an email or application notification. Specific operations include template-based message generation and notification transmission via communication protocols.
[0682] Step 5:
[0683] The server provides advice to the user through home devices based on the generated notifications. The input is the notification message, and the output is audio or visual advice to the user. Home devices use speakers and displays to convey information visually or audibly. Specific actions include audio output using speakers and visual display on screens.
[0684] Step 6:
[0685] The server monitors user emotional data over the long term and analyzes changes and trends in behavior. The input is historical emotional data, and the output is a long-term report of the user's psychological state. The server uses the information stored in the database to analyze fluctuations over time and generate feedback. Specific operations include the use of analytical algorithms and the automatic generation of periodic reports.
[0686] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing unit 12. In the data processing unit 12, the specific processing unit 290 acquires the audio data.
[0687] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). An example of data generation model 58 is ChatGPT (Internet Search<URL: https: / / openai.com / blog / chatgpt> ), Gemini (Internet search) <url: https: gemini.google.com ?hl="ja">Examples of generative AI include the following. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.
[0688] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.
[0689] [Fourth Embodiment]
[0690] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.
[0691] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.
[0692] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0693] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.
[0694] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0695] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0696] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0697] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
[0698] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0699] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0700] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the identification processing unit 290.
[0701] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0702] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0703] This invention provides a system for monitoring a child's mental health through interaction with an artificial intelligence character embedded in a communication device for children. This system performs multiple processes, including user authentication, natural language processing, stress assessment, and providing appropriate notifications and advice.
[0704] User Authentication
[0705] The device first requests user authentication information and grants access only if the user is correctly identified. This ensures data security and privacy. For example, biometric authentication can be used to easily verify that the device belongs to the child.
[0706] Natural language conversation
[0707] The device engages in everyday conversations with the user through an artificial intelligence character. These conversations utilize a standard voice interaction interface, making them user-friendly for children. For example, the device might ask the user, "What happened at school today?" and the user might reply, "I had fun today." This response is then used for data analysis in the next step.
[0708] Stress assessment
[0709] The server takes in conversational data sent from the terminal and uses natural language processing techniques to analyze signs of stress. For example, a prolonged use of negative words or tone may indicate a high stress level. Based on this assessment, if a specific trigger is identified, additional questions can be generated and asked to the user.
[0710] Providing notifications and advice
[0711] Based on the stress assessment results, the server generates notifications for parents, providing important information and advice. These notifications are privacy-conscious and may be sent in a format such as, "Your child may have been experiencing stress recently." Furthermore, the device provides users with specific actionable advice via voice or text. For example, it might suggest, "Why not take a short break and play some games today?"
[0712] Data monitoring
[0713] The server records and analyzes user conversation data and stress assessment results over a long period to understand changes in the user's stress levels. Through long-term monitoring, the server provides regular feedback to parents and helps them keep track of their child's growth and development.
[0714] This system aims to comprehensively understand a child's mental health, including their psychological state, and to enable integrated care in collaboration with parents.
[0715] The following describes the processing flow.
[0716] Step 1:
[0717] When the device is powered on, it displays a login screen and prompts the user to enter authentication information. This includes a password or biometric authentication (fingerprint or facial recognition). The user provides the login information, and the device verifies the authentication.
[0718] Step 2:
[0719] After authentication, the device activates an artificial intelligence character. The character greets the child and begins a casual conversation, asking natural questions such as, "How was school today?"
[0720] Step 3:
[0721] The user responds to questions from the device using voice. The user's voice is converted to text by the device, and this data is saved as the content of the conversation.
[0722] Step 4:
[0723] The terminal sends the conversation content to the server, which analyzes the text data using a natural language processing engine. It then infers the stress level from the tone and keywords.
[0724] Step 5:
[0725] Based on the stress level estimation, the server generates additional questions as needed. These questions are sent to the terminal, which then asks the user further questions.
[0726] Step 6:
[0727] The device sends additional responses from the user back to the server, and the server completes the stress analysis. The evaluation results are notified to the parent in a privacy-protected format.
[0728] Step 7:
[0729] The server generates advice based on the child's stressors and sends it to the device. The device then presents the advice to the user, such as "Try taking a short break."
[0730] Step 8:
[0731] The server records conversation data and stress assessment results over the long term and provides regular feedback to parents based on the monitoring results. This allows for responses tailored to the user's growth.
[0732] (Example 1)
[0733] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0734] In today's information society, the importance of continuously monitoring children's mental health is increasing. However, conventional technologies struggle to adequately assess children's daily emotions and stress levels, and to provide effective advice and notifications based on that assessment. Therefore, an integrated system is needed to promote mental health and monitor children's development.
[0735] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0736] In this invention, the server includes means for receiving user authentication information and identifying the user, means for configuring an information processing device that performs natural language conversation with the user, means for analyzing the record obtained from the conversation and evaluating the mental state, means for generating a notification to the supervisor based on the evaluation result, means for providing the user with behavioral guidelines based on the notification, and means for monitoring the record over the long term. This makes it possible to comprehensively evaluate the mental health state of a child and take appropriate action in cooperation with the supervisor.
[0737] "User authentication information" refers to unique identification information entered to identify a user, and is used to verify their permission to access the system.
[0738] An "information processing device" is a device or program that engages in conversation with a user and analyzes the content of that conversation, and is capable of natural language processing using artificial intelligence.
[0739] "Records" refer to data including the content of conversations with users and the results of their analysis, and are used to evaluate their mental state.
[0740] "Mental state" refers to the user's psychological health and emotional changes, including stress levels and emotional tendencies.
[0741] A "supervisor" is an individual or organization that has the authority to monitor the user's psychological health and intervene or provide support as needed.
[0742] "Action guidelines" are specific actions and advice that users should take to improve or maintain their mental state.
[0743] "Long-term monitoring" refers to the act of collecting and analyzing data over time to understand changes and trends in user behavior.
[0744] This invention is a system for monitoring a user's mental health and taking appropriate action. The system mainly consists of a server and terminals.
[0745] The device is a personal information terminal (PDI) used by users on a daily basis and is equipped with natural language processing technology utilizing artificial intelligence. Users first enter authentication information on the device screen, which includes biometric authentication such as fingerprint and facial recognition. The device also engages in natural language conversations with the user through an AI-powered voice dialogue interface and records the conversation. Specific software applications include speech recognition engines and natural language processing libraries.
[0746] The server receives conversation data sent from the terminal and analyzes its content. This analysis utilizes a generative AI model to assess the user's stress level based on their statements. For example, if the conversation contains many negative expressions such as "tired" or "sad," the server will determine that the user is experiencing high stress. Based on the assessment, the server generates a privacy-conscious notification for the supervisor, stating something like, "Your child may be experiencing stress." It also provides the user with specific actionable advice, such as, "Why not take some time to relax today?" This process allows for the early detection and resolution of problems.
[0747] As a concrete example, by inputting a prompt such as "Please provide examples of positive responses to questions asked by children" into a generative AI model, natural conversations can be generated. In this way, a system that comprehensively manages the user's mental health can be realized.
[0748] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0749] Step 1:
[0750] The terminal receives user authentication information from the user as input. Specifically, fingerprint authentication or facial recognition is performed to identify the user's identity. This authentication information is processed by the terminal, and if authentication is successful, the terminal outputs permission to the server.
[0751] Step 2:
[0752] The user initiates a conversation using the device's voice input interface. The device receives this voice as input data and converts it into text data via a speech recognition engine. This text data is sent from the device to the server, forming the basis for natural language processing.
[0753] Step 3:
[0754] The server receives text data transferred from the terminal as input and processes it using a generative AI model. Specifically, it uses natural language processing techniques to analyze the user's emotions and stress levels. The analysis results are output as numerical data indicating the presence and degree of stress. This output is used to generate subsequent notifications.
[0755] Step 4:
[0756] The server generates a notification for the supervisor based on the stress analysis results. This process generates a notification message, such as "Your child may be experiencing stress," in a privacy-conscious manner. The generated notification message is sent from the server to the supervisor's device.
[0757] Step 5:
[0758] The server creates actionable guidelines for the user and sends them to the device. This process provides specific advice, such as "Why not take some time to relax today?", based on stress assessments obtained from a generated AI model. This information is output to the device and presented to the user visually or audibly.
[0759] Step 6:
[0760] The server stores all conversation data and stress analysis results long-term, preparing them for time-series analysis. This step involves accumulating data in a database and performing trend analysis to understand future changes. This creates a data foundation for understanding the evolution of users' mental health.
[0761] (Application Example 1)
[0762] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0763] In recent years, there has been a growing need to continuously monitor children's mental health and provide early detection and appropriate advice. However, traditional methods have presented challenges in real-time assessment of psychological state and providing prompt feedback to parents. Furthermore, the lack of systems that integrate children's conditions into their daily lives has made natural communication with children and stress assessment difficult.
[0764] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0765] In this invention, the server includes means for receiving authentication information and identifying the user; means for configuring an artificial intelligence agent that conducts natural language dialogue with the user; means for analyzing information collected from the dialogue and evaluating the psychological health status; means for creating a notification for the user or caregiver based on the evaluation results; means for providing advice to the user based on the notification; means for continuously monitoring the information; means for analyzing the tone during the dialogue using speech recognition and emotion analysis technology and generating a response; and means for providing feedback to the caregiver via a terminal. This enables real-time evaluation of the child's psychological health status in daily life and the provision of quick and appropriate information to parents.
[0766] "Authentication information" refers to information used by a device to identify a specific user, and includes biometric information and passwords.
[0767] "User" refers to an individual who operates and interacts with the system, and in this invention, it primarily refers to a child.
[0768] "Natural language dialogue" refers to a form of communication that takes place between a user and an artificial intelligence agent through voice or text.
[0769] An "artificial intelligence agent" is a program that performs natural language processing and collects and analyzes information through dialogue with the user.
[0770] "Psychological health" refers to the state of the user's emotions, stress levels, and mental stability.
[0771] A "notification" is a message sent to the user or caregiver based on the system's evaluation.
[0772] "Advice" refers to information that includes specific actions or points to consider, suggested to the user based on the notification.
[0773] "Speech recognition technology" is a technology that converts a user's voice into text data.
[0774] "Emotional analysis technology" is an analytical technique used to identify a user's emotions and tone from collected dialogue data.
[0775] "Feedback" refers to information provided to caregivers through a device, including evaluation results and advice.
[0776] The system in this invention is primarily realized through the interaction of a server, a terminal, and a user. The server receives the user's authentication information and ensures that the user is correctly identified. It is desirable to use biometric authentication technology for this purpose. The terminal engages in natural language dialogue with the user, specifically a child, and an artificial intelligence agent. This dialogue is converted into text information using speech recognition technology and sent to the server.
[0777] The server processes the received conversation data using sentiment analysis technology to assess the user's psychological well-being. Based on this assessment, the server generates notifications and sends them to parents as needed. These notifications may include an assessment of the current situation and specific behavioral advice. The device also utilizes sentiment analysis technology to provide necessary support to the user via voice or text.
[0778] The software used includes Google Speech-to-Text API and IBM Watson Tone Analyzer for speech recognition and sentiment analysis. Firebase is used for the database to store user dialogue data and evaluation results long-term, enabling continuous monitoring.
[0779] For example, if a child is asked "What happened at school today?" while looking at their device and replies "It was fun," that data is sent to the server in real time. If the sentiment analysis evaluates it as "positive," no special notification is generated; only normal feedback is provided.
[0780] An example of a prompt for a generative AI model would be: "Listen to the user about how they're feeling today and what happened at school. If a negative tone is detected, continue asking questions to determine what to inform the parents about."
[0781] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0782] Step 1:
[0783] The user initiates access to the device. The device obtains the user's authentication information using methods such as facial recognition or fingerprint authentication. The input is the user's biometric information, and the output is whether authentication was successful or not. The device uses authentication software and proceeds to the next step if successful.
[0784] Step 2:
[0785] The device uses an artificial intelligence agent to initiate a natural language dialogue with the user. The input is the user's voice, and the output is text data. Speech recognition technology (such as Google Speech-to-Text API) is used to convert the voice data into text.
[0786] Step 3:
[0787] The server receives text data sent from the terminal and evaluates the user's psychological health using sentiment analysis technology. The input is the user's text data, and the output is the sentiment evaluation result. Data analysis is performed using tools such as IBM Watson Tone Analyzer.
[0788] Step 4:
[0789] The server generates notifications based on the results of sentiment analysis. The input is the sentiment evaluation result, and the output is the notification message. If the evaluation reaches a specific trigger, a notification is created for the caregiver.
[0790] Step 5:
[0791] The device provides advice to the user based on the generated notifications. The input is the notification message, and the output is feedback to the user in the form of voice or text. The device uses an AI assistant to read out specific advice.
[0792] Step 6:
[0793] The server stores user text data and sentiment rating results for long-term monitoring. Input is user text data and rating results, and output is storage in a database. A database service such as Firebase is used to record the data.
[0794] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the identification processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform identification processing using the user's emotions.
[0795] This invention provides a system for monitoring a child's mental health through interaction with the user, by equipping a communication terminal with an artificial intelligence character and an emotion engine. This system performs multiple processes, including user authentication, natural language processing, stress assessment, emotion recognition, and providing appropriate notifications and advice.
[0796] User Authentication
[0797] The device receives authentication information from the user upon startup and grants access to the device after verifying that the entered information is correct. The user can then use the device after clearing authentication using biometric authentication or a password.
[0798] Natural language conversation and emotion recognition
[0799] The device engages in everyday conversations with the user through an artificial intelligence character. This character uses child-friendly voices and animations to ask the user questions such as, "How are you feeling today?" When the user responds, an emotion engine analyzes the response and recognizes the user's emotional state.
[0800] Stress assessment and emotion analysis
[0801] The server receives conversation and emotion data sent from the terminal and analyzes it using natural language processing technology. In addition to assessing stress levels, it analyzes the user's emotions in detail to understand short-term and long-term trends. For example, if emotions such as sadness or anger appear frequently, it will be determined that the user is under high stress.
[0802] Providing notifications and advice
[0803] The server generates notifications for parents based on stress assessment and emotion analysis. These notifications include information about the child frequently exhibiting certain emotions and situation-specific advice. The device then provides the user with adaptive advice such as, "Let's try something to change your mood today."
[0804] Data monitoring
[0805] The server collects data over long periods and uses recorded conversation and sentiment analysis data to meticulously track the user's growth. It also sends regular feedback to parents to help monitor the user's mental well-being.
[0806] This system is designed to comprehensively support children's psychological and emotional health by gaining a three-dimensional understanding of the user's emotional state, enabling parents to intervene appropriately.
[0807] The following describes the processing flow.
[0808] Step 1:
[0809] The device displays a user authentication screen when it starts up and the user logs in. The user enters authentication information using a password or biometric authentication. The device verifies the entered information and grants access to the device if it is correct.
[0810] Step 2:
[0811] The device activates an artificial intelligence character and begins a conversation with the user. This character engages in dialogue through questions such as "How was your day?", facilitating a natural exchange.
[0812] Step 3:
[0813] The user responds to questions from the device using voice. The device receives the voice input and converts it into text data using its automatic speech recognition function. This text data is stored for sentiment analysis.
[0814] Step 4:
[0815] The device sends text data to the server. The server analyzes the received data using an emotion engine to recognize the user's emotional state (e.g., joy, sadness, anger, etc.).
[0816] Step 5:
[0817] The server assesses stress levels based on the emotion analysis results. These assessment results are used as foundational data for continuous notifications to parents. If stress levels are high, additional information may be required.
[0818] Step 6:
[0819] The server generates notifications for parents based on their emotional state and stress levels. These notifications include messages such as, "The user appears to be experiencing significant stress recently," and specific advice.
[0820] Step 7:
[0821] The server sends back notifications and advice generated by the server to the device. The device then displays advice to the user, such as, "Why not go outside for a walk to relax?", and advises appropriate actions based on the situation.
[0822] Step 8:
[0823] The server records and stores conversation data and analysis results of emotional states over long periods. Through this, it monitors the user's emotional tendencies and stress fluctuations, and provides regular feedback to parents as needed.
[0824] (Example 2)
[0825] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0826] In modern society, there is a growing need to appropriately monitor the mental and emotional health of users, especially during childhood, and provide the necessary support. However, conventional technologies have made it difficult to accurately assess users' stress levels and emotional states during this process, and to provide appropriate feedback and advice to parents and the users themselves.
[0827] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0828] In this invention, the server includes means for receiving user authentication information and identifying the user, means for configuring an artificial intelligence character that engages in natural language conversation with the user, and means for analyzing information obtained from the conversation to recognize the emotional state and evaluate the stress level. This makes it possible to comprehensively understand the user's mental and emotional health and enable appropriate intervention and support.
[0829] "User authentication information" refers to the information necessary to identify a user and authorize them to use a device. This includes biometric information and passwords.
[0830] An "artificial intelligence character" refers to a digital character programmed to interact with users through natural language conversation. It uses voice and animation to create a friendly and engaging conversation.
[0831] "Natural language" refers to the language that humans use on a daily basis, expressed in written or conversational forms. Technologies that process this language enable effective communication with users.
[0832] "Recognizing emotional states" refers to the process of analyzing linguistic information emanating from a user's words and actions to identify their emotions.
[0833] "Assessing stress levels" refers to a method of quantifying or evaluating the psychological burden on a user based on the information they provide.
[0834] "Notifications" refer to information and warning messages provided to users or their guardians. This includes information about the user's emotions and behavior.
[0835] "Providing advice" refers to the act of giving guidelines and suggestions tailored to the user's current situation. This encourages improvements in the user's daily behavior.
[0836] "Monitoring" refers to the act of observing specific data or situations over a long period of time to understand their changes and trends. This allows for tracking the user's growth and health status.
[0837] This invention relates to a system that incorporates artificial intelligence into a child-friendly information terminal to monitor and support the user's mental health. This system consistently performs user authentication, emotion recognition, stress assessment, notification generation, and data monitoring.
[0838] The device receives authentication information from the user via biometric authentication or password entry upon startup, and uses this information to identify the user. Once authentication is complete, an artificial intelligence character begins interacting with the user using voice and animation. For example, the character might ask, "How are you doing today?" and prompt the user for a natural language response.
[0839] The user responds to this question with words expressing their feelings, such as "I'm feeling great today." The device converts this voice information into text data and sends it to the emotion engine. The emotion engine analyzes this text using natural language processing techniques to recognize the emotional state. Common natural language processing APIs are used for the analysis, achieving highly accurate emotion recognition.
[0840] The server evaluates stress levels based on data received from the terminal. In this process, it quantifies the type and intensity of emotions, giving concrete details to the user's stress state. Over the long term, as data is collected, the server analyzes this data to understand the user's emotional tendencies and changes in stress levels.
[0841] Furthermore, the server generates notifications for parents, reporting on the effects of specific emotional states and stress. These notifications also include specific advice, such as, "Your child needs a little more rest." The device also offers suggestions to the user that are beneficial to their daily life and promote healthy behaviors.
[0842] As a concrete example, consider a scenario where a user tells an AI character about what happened at school, saying, "I made a new friend today." From this statement, the device recognizes a positive emotion, and the server collects the data and sends a notification to the parent that reinforces the positive trend. In this process, the generative AI model can use a prompt like this: "You are an AI assistant for children. Recognize the user's emotional state through conversation and provide constructive feedback as needed."
[0843] Thus, this system technically realizes health support through emotional assessment and communication.
[0844] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0845] Step 1: User Authentication
[0846] The device prompts the user to enter authentication information when the device starts up. This can be done through facial recognition, fingerprint recognition, or password entry.
[0847] Input: User's biometric data or password.
[0848] Data processing: This process involves comparing the entered authentication data with the information in the internal database.
[0849] Output: Authentication success / failure result. If successful, device use is permitted; if failed, authentication will be requested again.
[0850] Step 2: Initiating a conversation and recognizing emotions
[0851] The device initiates a conversation with a user who has successfully authenticated, using an artificial intelligence character. The character might ask questions such as, "How are you feeling today?"
[0852] Input: Voice response from the user.
[0853] Data processing: Speech recognition is used to convert audio data into text data, and that text is then passed to the emotion engine.
[0854] Output: Text data obtained from the user.
[0855] Step 3: Analysis of emotional state
[0856] The server receives text data sent from the terminal and analyzes it using natural language processing (NLP) techniques.
[0857] Input: Text data.
[0858] Data processing: This involves performing sentiment analysis based on text data to evaluate the type and intensity of emotions.
[0859] Output: Evaluation results of emotional state and stress level.
[0860] Step 4: Stress Assessment and Trend Identification
[0861] The server analyzes short-term and long-term emotional trends based on emotional state and stress levels.
[0862] Input: Assessed emotional state and stress level.
[0863] Data processing: This involves comparing current data with past data to understand current emotional trends.
[0864] Output: Latest emotional trend data and stress assessment results.
[0865] Step 5: Providing notifications and advice
[0866] The server generates notifications for parents based on emotional trend data.
[0867] Input: Latest emotional trend data and stress assessment results.
[0868] Data processing: This process involves designing advice for the user based on this data. For example, it might generate a message like, "Take more rest."
[0869] Output: Notifications for parents and on-device advice displays.
[0870] Step 6: Long-term data monitoring
[0871] The server records and analyzes data on users' emotions and stress levels over a long period of time.
[0872] Input: Accumulated emotional state and stress data.
[0873] Data processing: Perform time-series analysis to evaluate the user's psychological growth.
[0874] Output: Regular report generation and feedback submission.
[0875] (Application Example 2)
[0876] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0877] To understand a child's mental health, it is crucial to continuously monitor their emotions in their daily lives. However, currently, parents can only infer their child's state through communication, which can lead to oversights and misunderstandings. Therefore, the challenge is to create a system that naturally observes a child's emotions within the home environment and provides the necessary support.
[0878] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0879] In this invention, the server includes means for receiving user authentication information and identifying the user, means for configuring an artificial intelligence character to engage in natural language conversation with the user and evaluate the stress level, and means for generating notifications to the user or guardian based on the evaluation results and providing guidance for daily life. This makes it possible to continuously monitor the user's emotional state within the home and provide necessary support.
[0880] "User authentication information" refers to information used to identify users who access the system.
[0881] An "artificial intelligence character" is a computer program designed to interact with users and is capable of conversing in natural language.
[0882] "Stress level" is an indicator that assesses the user's emotional state and shows the degree of tension and fatigue.
[0883] A "notification" is a message or alert sent to the user or parent / guardian based on the evaluated information.
[0884] "Household appliances" refer to electrical and mechanical devices used within a user's living environment and designed to support an individual's daily life.
[0885] A "speech recognition receiving device" is a device used to acquire speech as input data and process it.
[0886] An "image acquisition device" is hardware or software used to capture visual data and to visually capture information.
[0887] "Emotional state" refers to the psychological or emotional state that the user is currently experiencing.
[0888] "Guidelines for conduct" are advice and instructions that indicate appropriate behaviors that users or their guardians should take in their daily lives.
[0889] An "interface" refers to the points of contact or means that enable interaction between a user and a system.
[0890] This invention is a system for monitoring a child's mental health through a home device and providing appropriate support. The system includes the following elements:
[0891] First, the device receives user authentication information and identifies the user. This authentication is performed using biometric authentication or a password. Once authentication is complete, the user can begin a natural language conversation through an artificial intelligence character. This conversation is conducted using a speech recognition receiver and an image acquisition device to accurately capture the user's statements and actions.
[0892] Next, the server uses natural language processing techniques and sentiment analysis engines to analyze the data obtained from the conversation. It determines the emotional state from the content of the conversation and evaluates the stress level. For example, it uses software such as IBM Watson or Google Cloud Natural Language API to analyze emotions in detail.
[0893] Based on the analyzed results, the server generates necessary notifications and advice for parents and sends them through home devices. This is done using email and application notifications. Furthermore, it is used as a guideline to show what actions families should take.
[0894] For example, if a child tells a robot, "I'm feeling a little lonely today," the system can identify that emotion and send a notification to the parent saying, "Your child may be feeling lonely. Try talking to them more."
[0895] An example of a prompt for a generative AI model is the question, "What information did the robot report about the child's mood today?" Using this prompt allows the system to report the child's emotional state more accurately, enabling appropriate follow-up.
[0896] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0897] Step 1:
[0898] The device receives user authentication information to identify the user. Inputs include biometric data and passwords. The device uses these inputs to verify the user's identity and grant access. Specific actions include biometric authentication using fingerprint sensors or cameras, and password matching.
[0899] Step 2:
[0900] After authentication is complete, the device begins a natural language conversation with the user. The input is the user's voice, and the output is that voice data. The device uses a speech recognition receiver to convert the voice into text data. This converted text data is used in the next step. The specific operations include voice input via the microphone and text conversion by speech recognition software.
[0901] Step 3:
[0902] The server receives text data sent from the terminal and performs sentiment analysis using natural language processing techniques. The input is text data, and the output is data representing the emotional state. The server uses a generative AI model to detect the user's emotional state and evaluate the stress level. Specific operations include utilizing an sentiment analysis engine and assigning sentiment labels through pattern recognition.
[0903] Step 4:
[0904] Based on the evaluation results, the server generates a notification for parents. The input is emotion data, and the output is a notification message for parents. The server creates a message that includes actionable guidelines tailored to the emotional state and stress level, and sends it as an email or application notification. Specific operations include template-based message generation and notification transmission via communication protocols.
[0905] Step 5:
[0906] The server provides advice to the user through home devices based on the generated notifications. The input is the notification message, and the output is audio or visual advice to the user. Home devices use speakers and displays to convey information visually or audibly. Specific actions include audio output using speakers and visual display on screens.
[0907] Step 6:
[0908] The server monitors user emotional data over the long term and analyzes changes and trends in behavior. The input is historical emotional data, and the output is a long-term report of the user's psychological state. The server uses the information stored in the database to analyze fluctuations over time and generate feedback. Specific operations include the use of analytical algorithms and the automatic generation of periodic reports.
[0909] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing unit 12. In the data processing unit 12, the specific processing unit 290 acquires the audio data.
[0910] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). An example of data generation model 58 is ChatGPT (Internet Search<URL: https: / / openai.com / blog / chatgpt> ), Gemini (Internet search) <url: https: gemini.google.com ?hl="ja">Examples of generative AI include the following. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.
[0911] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.
[0912] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping. Specifically, the emotion identification model 59 may determine the user's emotion according to a specific mapping, which is an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the identification processing unit 290 may perform identification processing using the robot's emotion.
[0913] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
[0914] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.
[0915] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further you go from the outside of the Emotion Map 400, the more visible (expressed in actions) your emotions become.
[0916] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https: / / ci.nii.ac.jp / naid / 500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
[0917] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."
[0918] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
[0919] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.
[0920] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.
[0921] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-temporary storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-temporary storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.
[0922] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.
[0923] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.
[0924] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.
[0925] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.
[0926] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.
[0927] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.
[0928] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.
[0929] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.
[0930] The following is further disclosed regarding the embodiments described above.
[0931] (Claim 1)
[0932] A means of receiving user authentication information and identifying the user,
[0933] A means for constructing an artificial intelligence character that can engage in natural language conversation with a user,
[0934] A means for analyzing data obtained from the aforementioned conversation to evaluate the stress level,
[0935] A means for generating a notification to the user or guardian based on the aforementioned evaluation results,
[0936] Means for providing advice to the user based on the aforementioned notification,
[0937] A means for monitoring the aforementioned data over the long term,
[0938] A system that includes this.
[0939] (Claim 2)
[0940] The system according to claim 1, further comprising means for generating additional questions and further analyzing the user's responses if it is determined that the user's stress level is high during the conversation with the user.
[0941] (Claim 3)
[0942] The system according to claim 1, further comprising means for generating and transmitting specific behavioral advice to parents based on the user's stress factors.
[0943] "Example 1"
[0944] (Claim 1)
[0945] A means of receiving user authentication information and identifying the user,
[0946] A means for configuring an information processing device that performs natural language conversation with a user,
[0947] A means for analyzing the record obtained from the aforementioned conversation and evaluating the mental state,
[0948] A means for generating a notification to the supervisor based on the aforementioned evaluation results,
[0949] Means for providing users with guidelines for action based on the aforementioned notification,
[0950] Means for monitoring the aforementioned records over the long term,
[0951] A system that includes this.
[0952] (Claim 2)
[0953] The system according to claim 1, further comprising means for generating additional information and further analyzing the user's response if it is determined that the user is in a high mental state during the conversation with the user.
[0954] (Claim 3)
[0955] The system according to claim 1, further comprising means for generating and transmitting specific action guidelines to a supervisor based on the mental state of the user.
[0956] "Application Example 1"
[0957] (Claim 1)
[0958] A means of receiving authentication information and identifying the user,
[0959] A means for constructing an artificial intelligence agent that engages in natural language dialogue with a user,
[0960] A means of analyzing information collected from the aforementioned dialogue to evaluate the psychological health status,
[0961] A means of creating a notice to the user or caregiver based on the aforementioned evaluation results,
[0962] Means for providing advice to the user based on the aforementioned notice,
[0963] Means for continuously monitoring the aforementioned information,
[0964] A means for analyzing the tone of a conversation and generating a response using speech recognition and emotion analysis technologies,
[0965] A means of providing feedback to caregivers via a terminal,
[0966] A system that includes this.
[0967] (Claim 2)
[0968] The system according to claim 1, further comprising means for generating additional questions and further analyzing the user's responses if it is determined that the user's psychological health has deteriorated during the conversation with the user.
[0969] (Claim 3)
[0970] The system according to claim 1, further comprising means for generating and transmitting specific behavioral advice to a caregiver based on the user's psychological health status.
[0971] "Example 2 of combining an emotion engine"
[0972] (Claim 1)
[0973] A means of receiving user authentication information and identifying the user,
[0974] A means for constructing an artificial intelligence character that engages in natural language conversation with a user,
[0975] A means for analyzing information obtained from the aforementioned conversation to recognize the emotional state and evaluate the stress level,
[0976] A means for generating a notification to the guardian based on the aforementioned evaluation results and recognition results,
[0977] Means for providing advice to users based on the aforementioned notice,
[0978] A means of monitoring the aforementioned information over a long period and tracking the growth of users,
[0979] A system that includes this.
[0980] (Claim 2)
[0981] The system according to claim 1, further comprising means for evaluating the stress level and emotional state during a conversation with the user, and generating additional questions based on this to further analyze the user's responses.
[0982] (Claim 3)
[0983] The system according to claim 1, further comprising means for generating specific behavioral advice for the guardian and sending appropriate notifications based on the user's emotional state and stress factors.
[0984] "Application example 2 when combining with an emotional engine"
[0985] (Claim 1)
[0986] A means of receiving user authentication information and identifying the user,
[0987] A means for constructing an artificial intelligence character that can engage in natural language conversation with a user,
[0988] A means for analyzing data obtained from the aforementioned conversation to evaluate the stress level,
[0989] A means for generating a notification to the user or guardian based on the aforementioned evaluation results,
[0990] Means for providing advice to the user based on the aforementioned notification,
[0991] A means for monitoring the aforementioned data over the long term,
[0992] The aforementioned system is implemented in a home appliance and includes means for analyzing the user's emotional state using a voice recognition receiver and an image acquisition device,
[0993] The aforementioned home appliance sends notifications to guardians that serve as guidelines for action, and provides a means of providing appropriate support in daily life.
[0994] A system that includes this.
[0995] (Claim 2)
[0996] The system according to claim 1, comprising means for generating additional questions and further analyzing the user's responses if it is determined that the user's stress level is high during a conversation with the user, wherein the home device routinely observes the user's emotional state on their behalf.
[0997] (Claim 3)
[0998] The system according to claim 1, comprising means for generating and transmitting specific behavioral advice to a guardian based on the user's stress factors, wherein a home appliance functions as an interface with the user. [Explanation of Symbols]
[0999] 10, 210, 310, 410 Data Processing Systems 12 Data Processing Devices 14 Smart Devices 214 Smart Glasses 314 Headset-type terminal 414 Robots< / url:> < / url:> < / url:> < / url:>
Claims
1. A means of receiving authentication information and identifying the user, A means for constructing an artificial intelligence agent that engages in natural language dialogue with a user, A means of analyzing information collected from the aforementioned dialogue to evaluate the psychological health status, A means of creating a notice to the user or caregiver based on the aforementioned evaluation results, Means for providing advice to the user based on the aforementioned notice, Means for continuously monitoring the aforementioned information, A means for analyzing the tone of a conversation and generating a response using speech recognition and emotion analysis technologies, A means of providing feedback to caregivers via a terminal, A system that includes this.
2. The system according to claim 1, further comprising means for generating additional questions and further analyzing the user's responses if it is determined that the user's psychological health has deteriorated during the conversation with the user.
3. The system according to claim 1, further comprising means for generating and transmitting specific behavioral advice to a caregiver based on the user's psychological health status.