system

A system that periodically contacts elderly individuals, analyzes conversational data, and provides personalized cognitive training addresses the challenge of monitoring health and maintaining social connections, ensuring timely support and effective cognitive function maintenance.

JP2026096469A · Pending · Publication Date: 2026-06-15 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: SOFTBANK GROUP CORP
Filing Date: 2024-12-03
Publication Date: 2026-06-15

AI Technical Summary

Technical Problem

In an aging society with increasing elderly populations, there is a lack of effective means to regularly monitor health status and maintain social connections, leading to issues such as solitary death and dementia, with limited systems for efficient cognitive function assessment and training.

Method used

A system equipped with a control device that periodically contacts individuals using communication means, analyzes conversational data to evaluate cognitive function and mental health, and provides personalized cognitive function training based on the analysis results, maintaining social connections and monitoring health status.

Benefits of technology

The system effectively maintains social connections and continuously monitors the health status of elderly individuals by providing personalized cognitive function training and timely notifications to medical institutions and family members.

Abstract

[Problem] To provide a system. [Solution] A system including a control device that comprises: communication means for periodically contacting a subject; data processing means for analyzing conversation data acquired by the communication means and evaluating the subject's cognitive function and mental health status; and output means for providing cognitive function training to the subject based on the evaluation.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance as a response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] As the number of elderly people living alone increases in an aging society, solitary death and dementia have become serious social problems. Social isolation and the difficulty of keeping track of a person's health status are cited as underlying problems. In particular, regular health checks and the maintenance of social connections are required, but effective means for achieving these efficiently have not yet been established.

Means for Solving the Problems

[0005] The present invention provides a system equipped with a control device that acquires conversational data by using a communication means to periodically contact a subject. It also includes data processing means to analyze the acquired conversational data and evaluate the subject's cognitive function and mental health status. Furthermore, it has output means to provide the subject with appropriate cognitive function training based on the analysis results, thereby solving the above problem. This makes it possible to maintain social connections while continuously monitoring the subject's health status.

[0006] A "control device" is a device that comprehensively manages the various means necessary for contacting the subject and performing data analysis.

[0007] "Communication means" refers to means, including communication technologies such as telephone and the internet, used to contact a subject on a regular basis.

[0008] "Conversation data" refers to digitized information, including audio and its content, obtained from a subject using communication methods.

[0009] "Data processing means" refers to a function that analyzes acquired conversation data and performs processing to evaluate cognitive function and mental health status.

[0010] "Output means" refers to means for providing cognitive function training and rehabilitation to the subject based on the results of data processing means.

[0011] "Target users" refers to users of this system, including elderly individuals, who are subject to health monitoring and training provision.

[0012] "Cognitive function training" refers to specific training programs or activities provided to maintain or improve an individual's cognitive abilities.

Brief Description of the Drawings

[0013]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map to which a plurality of emotions are mapped.
[Figure 10] An emotion map to which a plurality of emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2 when an emotion engine is combined.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2 when an emotion engine is combined.

MODE FOR CARRYING OUT THE INVENTION

[0014] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0015] First, the terms used in the following description will be explained.

[0016] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as the "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).

[0017] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which is used as a work memory by the processor.

[0018] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0019] In the following embodiments, a communication I / F (Interface) denoted by a reference numeral is an interface including a communication processor, an antenna, and the like. The communication I / F controls communication between multiple computers. Examples of communication standards applied to the communication I / F include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0020] In the following embodiments, "A and / or B" is synonymous with "at least one of A and B." That is, "A and / or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and / or B" applies when expressing three or more things linked by "and / or."

[0021] [First Embodiment]

[0022] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0023] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0024] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0025] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0026] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0027] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0028] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0029] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0030] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0031] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0032] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0033] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0034] To implement this invention, the server, the terminal, and the user need to work in coordination. The specific operation of each is described below.

[0035] Server Embodiment

[0036] The server functions as the central hub of this system and has multiple roles. First, the server connects to the database and manages the basic information and contact history of the target individuals. This allows the system to automatically generate periodic phone schedules and notify terminals.

[0037] Next, the server analyzes the conversation data received from the terminal. This data analysis is performed using natural language processing technology to assess the cognitive function of the subject from their voice. Based on the analysis results, notifications are automatically sent to medical institutions and family members as needed.

[0038] Furthermore, the server selects cognitive function training appropriate for the individual and outputs the content to the terminal. This provides personalized feedback.
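As a rough illustration of paragraphs [0036] to [0038], the following Python sketch shows one way a server could hold basic information and contact history in a database and work out which subjects are due for a call. The table names, the column names, the per-subject `interval_days` field, and the `due_calls` helper are all assumptions made for this sketch; the specification does not disclose a schema or a scheduling rule.

```python
import sqlite3
from datetime import datetime, timedelta
from typing import List, Tuple


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical subject and history tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE subject (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            phone TEXT NOT NULL,
            interval_days INTEGER NOT NULL  -- assumed: desired days between calls
        );
        CREATE TABLE contact_history (
            subject_id INTEGER NOT NULL REFERENCES subject(id),
            called_at TEXT NOT NULL,        -- ISO 8601 timestamp
            connected INTEGER NOT NULL      -- 1 if the call was answered
        );
        """
    )
    return conn


def due_calls(conn: sqlite3.Connection, now: datetime) -> List[Tuple[int, str]]:
    """Return (subject_id, phone) for subjects never reached or past their interval."""
    rows = conn.execute(
        """
        SELECT s.id, s.phone, s.interval_days, MAX(h.called_at)
        FROM subject s
        LEFT JOIN contact_history h
               ON h.subject_id = s.id AND h.connected = 1
        GROUP BY s.id
        ORDER BY s.id
        """
    ).fetchall()
    due = []
    for subject_id, phone, interval_days, last_call in rows:
        if last_call is None:
            due.append((subject_id, phone))  # never reached so far
        elif datetime.fromisoformat(last_call) + timedelta(days=interval_days) <= now:
            due.append((subject_id, phone))
    return due
```

In this sketch the list returned by `due_calls` plays the role of the schedule that the server would pass to the terminal; a real system would also need to store the generated schedule itself.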

[0039] Terminal embodiment

[0040] The terminal functions as an interface with the target individual. Upon receiving a notification from the server, it places a phone call to the target individual according to the specified schedule. After the call is connected, the terminal activates an AI agent to engage in everyday conversation with the target individual.

[0041] The terminal collects audio data during conversations and prepares to send it to the server. Furthermore, based on analysis instructions from the server, it executes a cognitive function training program for the subject and provides real-time feedback of the results to the server.

[0042] User Embodiment

[0043] Users can use their devices to converse with an AI agent, discussing everyday events and their health status. They can also try out the provided cognitive training and review the results. Based on this feedback, the training content for the next session is adjusted, thus providing a system that supports continuous health maintenance.

[0044] Specific example

[0045] For example, suppose the server schedules a phone call for elderly person A on Monday morning. At 10:00 AM on Tuesday, the terminal automatically initiates the call, and the AI agent asks elderly person A, "Good morning, how have you been lately?" When elderly person A talks about taking walks, the terminal transcribes the conversation into text and sends it to the server, and the server analyzes it.

[0046] As a result of the analysis, memory training tasks that elderly person A should work on in the future are generated and suggested via the terminal. This allows the user, through everyday conversation, to have their health status checked and to receive training that helps maintain cognitive function.

[0047] The following describes the processing flow.

[0048] Step 1:

[0049] The server checks the subject's profile data and generates the next call schedule. This profile data includes the subject's health status and past call history, and is used to determine the optimal call frequency and timing. The schedule is saved in the database, and the terminal is notified of it.

[0050] Step 2:

[0051] The terminal receives a schedule notification from the server and begins preparing for the call. At the scheduled call time, the necessary speech synthesis model and conversation scenario are loaded. This enables a smooth start to the conversation.

[0052] Step 3:

[0053] The terminal automatically dials the target person at the specified time. The automatic redial function operates until the call is connected. Once connected, the AI agent begins the conversation.

[0054] Step 4:

[0055] The AI agent on the terminal asks the user questions about their daily life to facilitate conversation. Specifically, it asks questions such as, "What are your plans for today?" to elicit responses from the user.

[0056] Step 5:

[0057] Users answer questions from an AI agent, providing information about their daily activities and health status. Users can freely talk about their usual lifestyle and recent events.

[0058] Step 6:

[0059] The terminal collects the conversation with the user as audio data and sends it to the server in real time. The audio data is simultaneously transcribed into text, and the content of the conversation is recorded.

[0060] Step 7:

[0061] The server analyzes the received conversation data to evaluate the cognitive function and mental health status of the subjects. Natural language processing technology is used to scrutinize the emotions and content of the topics discussed.

[0062] Step 8:

[0063] Based on the analysis results, the server automatically sends notifications to medical institutions or family members if necessary. This ensures that risks can be detected early.

[0064] Step 9:

[0065] The server selects a cognitive function training program suitable for the individual and transmits the training content to the terminal. The selection is made considering past training results and current health status.

[0066] Step 10:

[0067] The terminal executes the selected cognitive function training program and presents the user with specific training tasks. The terminal also records the user's results and sends them to the server.

[0068] Step 11:

[0069] The user works on the training presented on the terminal and receives feedback on their results. Based on these results, the content of the next training session is adjusted.
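The automatic redial behavior in Step 3 can be sketched as a simple retry loop. This is a minimal illustration only: the `dial` callback, the `CallResult` type, and the `max_attempts` cap are assumptions made for the sketch. The specification says the redial function operates until the call is connected and does not state an upper limit; the cap is added here so the loop always terminates.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CallResult:
    connected: bool  # True if the subject answered
    attempts: int    # number of dial attempts made


def dial_with_redial(
    dial: Callable[[str], bool],
    phone: str,
    max_attempts: int = 5,  # assumed cap; not stated in the specification
) -> CallResult:
    """Call dial(phone) repeatedly until it returns True or the cap is reached."""
    for attempt in range(1, max_attempts + 1):
        if dial(phone):
            return CallResult(connected=True, attempts=attempt)
    return CallResult(connected=False, attempts=max_attempts)
```

A production terminal would presumably wait between attempts and record each failed attempt in the contact history; both are omitted here to keep the sketch short.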

[0070] (Example 1)

[0071] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0072] In modern society, factors such as an aging population and the rise of nuclear families increase the risk of social isolation for individuals. In this situation, it is crucial to appropriately assess cognitive function and mental health through regular communication and provide prompt responses and support as needed. However, effective systems for efficiently achieving this are limited. To address this problem, a system is needed that enables individualized interventions and training tailored to each person.

[0073] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0074] In this invention, the server includes a schedule generation means, a communication means, a conversion means, a data processing means, a selection means, and an output means. This enables periodic evaluation of the health status of the subject, provision of cognitive function training tailored to individual needs, and communication support to prevent social isolation.

[0075] A "control device" is a central system that manages communication with the target individual and has the function of processing data analysis and providing training.

[0076] "Schedule generation means" refers to a function for scheduling regular contact with the subject.

[0077] "Communication means" refers to the methods and processes for directly contacting a subject.

[0078] "Conversion means" refers to a function that uses technology to convert audio information into text information.

[0079] "Data processing means" refers to technology for analyzing acquired textual information and evaluating the health status of the subject.

[0080] "Selection means" refers to the process of selecting cognitive function training appropriate for each individual based on data analysis.

[0081] "Output means" refers to a function that presents the selected training to the subject and supports the subject in carrying it out.

[0082] "Notification means" refers to the function of communicating information to relevant parties based on analysis results.

[0083] "Communication support means" refers to a function that supports communication with other people in order to prevent the social isolation of the subject.

[0084] For this invention to be implemented, it is essential that the server, terminal, and user work together in coordination. The specific operation of each component is described below.

[0085] Server Role

[0086] The server functions as the central hub of the system. First, the server retrieves the subject's basic information from a database and generates a schedule for regular contact. A database management system is used for this information management. Furthermore, the server converts audio data transmitted from the terminal into text data and analyzes it using natural language processing technology. This analysis evaluates the subject's cognitive function and mental health, and based on the results, selects individualized cognitive training. The generated training program is then output to the terminal.

[0087] Terminal role

[0088] The terminal functions as an interface with the subject, communicating with the server according to a specified schedule. Voice data collected during communication is converted to text in real time by the terminal and sent to the server. The terminal also executes training programs received from the server and provides real-time feedback on the results. This provides personalized feedback to the subject.

[0089] User roles

[0090] Users can converse with an AI agent through their device, discussing everyday events and their health status. They can participate in cognitive training sessions offered during this process and directly see the results. Based on this feedback, the content of subsequent training sessions is adjusted, allowing users to enjoy continuous health maintenance.

[0091] Specific example

[0092] For example, the server schedules a phone call to subject A on Monday. At 10:00 AM on Tuesday, the terminal automatically initiates a call to subject A, and the AI agent asks, "Good morning, how have you been lately?" When subject A talks about their recent activities, the content is collected as audio data, converted into text data, and sent to the server. The server then analyzes the data and suggests, through the terminal, a memory training task suitable for subject A.

[0093] Example of a prompt

[0094] "Based on the phone schedule set by the server, call elderly person A, ask them about their recent life, analyze the information, and propose appropriate cognitive function training."
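To make the example prompt more concrete, the sketch below assembles a prompt of that shape from a schedule entry and a call transcript. The function name `build_prompt`, its parameters, and the exact wording of the template are assumptions made for illustration; the specification does not say how the prompt is constructed or how it is passed to the data generation model 58.

```python
def build_prompt(subject_label: str, scheduled_time: str, transcript: str) -> str:
    """Assemble an instruction prompt from a schedule entry and a call transcript."""
    lines = [
        f"The server scheduled a call to {subject_label} at {scheduled_time}.",
        "The subject was asked about their recent life and answered as follows:",
        # Fall back to a placeholder when nothing was transcribed.
        transcript.strip() or "(no reply recorded)",
        "Analyze this information and propose appropriate cognitive function training.",
    ]
    return "\n".join(lines)
```

For instance, `build_prompt("elderly person A", "Tuesday 10:00", "I take a walk every morning.")` yields a four-line prompt whose third line is the transcribed reply.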

[0095] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0096] Step 1:

[0097] The server retrieves basic information about the target individuals from the database and generates a schedule for regular contact. This step utilizes a database management system to manage the information and plan the next call schedule. The input is the target information from the database, and the output is the call schedule for each individual.

[0098] Step 2:

[0099] The terminal automatically initiates a call to the target person at the specified time, according to the schedule received from the server. At this time, the terminal activates the AI agent and begins a conversation with the target person. The input is the schedule information sent from the server, and the output is the actual phone call to the target person.

[0100] Step 3:

[0101] The terminal collects the subject's speech during the call in real time and converts it into text data using speech recognition technology. The input is the subject's voice data, and the output is text data sent to the server.

[0102] Step 4:

[0103] The server receives text data sent from the terminal and analyzes it using natural language processing techniques. This analysis uses a language model to understand the content of the conversation and evaluates the cognitive function and mental health status of the subject. The input is text data, and the output is the result of the health status evaluation.

[0104] Step 5:

[0105] The server selects cognitive function training appropriate for the subject based on the analysis results. This selection takes into account past data and evaluation results to determine the optimal training content. The input is the results of the health status assessment, and the output is the selected training program.

[0106] Step 6:

[0107] The server sends the selected training program to the terminal, and the terminal conducts the training for the target user. The terminal displays the training content on the screen and supports the user in following along. The input is the training program from the server, and the output is the execution of the training.

[0108] Step 7:

[0109] The terminal feeds back the training results to the server. Based on this feedback, the server adjusts the content and schedule for the next training session. The input is the training results, and the output is the adjusted training plan.
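Steps 4 to 7 can be read as a small pipeline in which each step's output is the next step's input. The sketch below follows that shape under strong assumptions: the specification only says that natural language processing and a language model are used, so the lexical features here (type-token ratio and filler-word ratio), the thresholds, and the training names are hypothetical placeholders and are not validated clinical measures.

```python
from dataclasses import dataclass

FILLERS = {"um", "uh", "er", "hmm"}  # assumed filler words, for illustration only


@dataclass
class Evaluation:
    word_count: int
    type_token_ratio: float  # distinct words / total words
    filler_ratio: float      # filler words / total words


def evaluate(text: str) -> Evaluation:
    """Step 4 stand-in: text data in, evaluation result out."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return Evaluation(0, 0.0, 0.0)
    fillers = sum(1 for w in words if w in FILLERS)
    return Evaluation(len(words), len(set(words)) / len(words), fillers / len(words))


def select_training(ev: Evaluation) -> str:
    """Step 5 stand-in: evaluation result in, training program name out."""
    if ev.word_count == 0:
        return "conversation_prompting"
    if ev.type_token_ratio < 0.5:   # assumed threshold
        return "word_recall"
    if ev.filler_ratio > 0.1:       # assumed threshold
        return "verbal_fluency"
    return "memory_story"


def next_difficulty(current: int, score: float) -> int:
    """Step 7 stand-in: training result in, adjusted difficulty (1 to 5) out."""
    if score >= 0.8:
        return min(current + 1, 5)
    if score < 0.5:
        return max(current - 1, 1)
    return current
```

Keeping each step as a separate function mirrors the input and output listed for each step above, and makes it straightforward to replace the placeholder evaluation with a model-based one.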

[0110] (Application Example 1)

[0111] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0112] It is necessary to prevent cognitive decline and deterioration of psychological state among users, including the elderly, and to support their continuous health maintenance. Furthermore, timely information provision to healthcare providers and relatives, and prevention of social isolation are also important issues.

[0113] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0114] In this invention, the server includes communication means for periodically contacting the user, information processing means for analyzing acquired conversation information and evaluating cognitive function and psychological state, presentation means for proposing individualized cognitive function training, and adaptive learning means for proposing cognitive function training content according to the user's interests and needs. This makes it possible to maintain the user's cognitive function and manage their psychological state, as well as to provide information quickly to medical institutions and relatives, and prevent social isolation.

[0115] A "control device" is a device that manages the entire system and issues commands to ensure that each component operates in coordination.

[0116] "Communication means" refers to methods and technologies for regularly contacting users and obtaining conversational information.

[0117] "Information processing means" refers to technologies and devices used to analyze acquired conversational information and evaluate the cognitive function and psychological state of users.

[0118] "Presentation means" refers to approaches for providing individualized cognitive function training to users based on assessments.

[0119] "Adaptive learning means" refers to techniques and methods that propose cognitive function training content according to the user's interests and needs, and provide training tailored to each individual.

[0120] "Notification means" refers to technologies and methods for notifying medical providers and relatives based on analysis results.

[0121] "Collaboration means" refers to means and methods for preventing users from becoming socially isolated and for promoting contact with other users and supporters.

[0122] To implement this invention, the server, terminal, and user must function in coordination. The server acts as the central hub of the system, connecting to a database to manage the user's basic information and contact history. This allows the server to automatically generate periodic communication schedules and notify the terminals. The terminals function as an interface that periodically contacts the user and engages in everyday conversations using an AI agent.

[0123] Furthermore, the server analyzes the conversational information collected from the terminal using natural language processing technology to evaluate the user's cognitive function and psychological state. Based on the evaluation results, the server selects cognitive function training appropriate for the user and instructs the terminal on its content. The terminal follows these instructions, provides training to the user, and feeds back the results obtained to the server.

[0124] The server automatically sends important information to healthcare providers and relatives as needed through its notification function. Furthermore, the server facilitates interaction with other users and supporters using the same system, using collaborative means to reduce the risk of social isolation.

[0125] As a concrete example, the terminal calls the user every morning, and after the AI agent greets them, it asks about the user's recent daily life and health. An example prompt is: "The AI agent starts a conversation with the user by asking about their health this morning and explains that keeping a diary has a positive effect on cognitive function."

[0126] Examples of the hardware used include smartphones and tablets, and examples of the software include the Google (registered trademark) Cloud Speech-to-Text API and the Google Assistant API. The server is a computer system for data analysis, training program management, and information processing.

[0127] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0128] Step 1:

[0129] The server retrieves the user's basic information and past contact history from a database. Based on this, it generates a periodic communication schedule for the user and notifies the terminal. The input is the user information from the database, and the output is the next communication schedule information. An algorithm processes this information to calculate an efficient and appropriate contact timing.

[0130] Step 2:

[0131] The terminal makes a phone call to the user at the designated time, according to the communication schedule notified by the server. It activates the AI agent and starts a conversation with the user. The input is the communication schedule information, and the output is the user's voice data. At this stage, the AI agent greets the user and asks questions about the user's health.

[0132] Step 3:

[0133] The device transcribes the audio data collected during the conversation into text and sends that data to the server. The input is audio data, and the output is the text data converted from the audio. The Google Cloud Speech-to-Text API is used for the audio-to-text conversion.

[0134] Step 4:

[0135] The server analyzes the received text data using natural language processing (NLP). The purpose of the analysis is to evaluate the user's cognitive function and psychological state. The input is text data, and the output is the evaluation results. The analysis is performed using dedicated algorithms.

[0136] Step 5:

[0137] The server selects an appropriate cognitive function training program based on the evaluation results and sends the details to the terminal. The input is the evaluation results, and the output is instruction information for the cognitive function training program. The selection process includes adaptive learning elements that take into account the user's past data and interests.
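The selection step might be sketched as below, assuming the evaluation results arrive as per-domain scores and the user's interests as a set of topics. The `Program` type, the scoring scheme, and the function name `select_program` are hypothetical and not described in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    name: str
    domain: str  # cognitive domain the program targets, e.g. "memory"
    topic: str   # subject matter, matched against the user's interests


def select_program(scores: dict[str, float], interests: set[str],
                   catalog: list[Program]) -> Program:
    """Pick a program for the weakest-scoring domain, preferring a topic the user likes."""
    weakest = min(scores, key=scores.get)
    # Fall back to the whole catalog if no program targets the weakest domain.
    candidates = [p for p in catalog if p.domain == weakest] or catalog
    for program in candidates:
        if program.topic in interests:
            return program
    return candidates[0]


catalog = [
    Program("Word recall", "memory", "vocabulary"),
    Program("Garden diary recall", "memory", "gardening"),
    Program("Spot the difference", "attention", "pictures"),
]
chosen = select_program({"memory": 0.4, "attention": 0.8}, {"gardening"}, catalog)
```

Here the adaptive element is limited to the lowest score and the interest match; past training results could be added as a further input.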

[0138] Step 6:

[0139] The terminal provides users with personalized cognitive function training based on instructions from the server. Training results are fed back to the server in real time. Input is training program information from the server, and output is training completion reports and progress data. This feedback helps adjust the content of the next training session.

[0140] Step 7:

[0141] The server sends notifications to healthcare providers and relatives as needed based on the analysis results. It also suggests collaboration with other users and supporters to prevent social isolation. Inputs are analysis results and feedback, while outputs are notifications and collaboration suggestions. Notifications are automated, ensuring appropriate information dissemination.

[0142] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0143] This invention relates to a communication system that includes a control device combined with an emotion engine, and is capable of monitoring the health status of a subject and providing cognitive function training through interaction between a server, a terminal, and a user. The detailed operation of each component is described below.

[0144] Server Embodiment

[0145] The server manages the subject's profile information and generates a regular phone call schedule. The server receives conversation data transmitted from the terminal in real time and analyzes it using data processing tools. Speech recognition and natural language processing technologies are used for the analysis to evaluate the subject's cognitive function and emotional state from the conversation content.

[0146] In particular, it incorporates an emotion engine that can recognize the subject's emotions from conversational data. This allows for the generation of personalized feedback based on emotional information and the adjustment of appropriate cognitive function training. The server also has the function to send notifications to medical institutions and family members based on the recognized emotions.

[0147] Terminal embodiment

[0148] The device functions as an interface with the target individual, receiving schedules from the server and making phone calls at the designated times. The device activates an AI agent to engage in natural, everyday conversation with the target individual. During the conversation, it collects voice data, transcribes it into text, and sends it to the server.

[0149] Furthermore, the terminal provides cognitive function training to the user based on instructions from the server. Analysis results from the emotion engine are used to adjust the training content, and feedback is provided according to the emotional state of the subject.

[0150] User Embodiment

[0151] The user (target individual) receives a phone call from an AI agent via their device and engages in conversation about their daily life. The user can receive cognitive training and learn about their own emotional state through the feedback provided.

[0152] Specific example

[0153] For example, suppose the server schedules a call to elderly person B at 10:00 AM on Tuesday. At the scheduled time, the terminal calls elderly person B, and the AI agent asks, "How are you spending your day?" If elderly person B says, "I'm feeling a little lonely," the emotion engine recognizes this emotion and sends it to the server.

[0154] Based on the analysis results, the server provides elderly person B with simple relaxation exercises via a terminal to alleviate feelings of loneliness, and simultaneously sends a notification to family members indicating that attention is needed. In this way, a system is realized that utilizes emotional data to comprehensively support the health and well-being of the target individual.
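The response in this example, an exercise for the subject plus a notification to the family, can be sketched as a small rule table. The emotion labels, the `Response` type, and the contents of `RULES` are hypothetical assumptions; the source only describes the loneliness case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    exercise: str        # what the terminal offers to the subject
    notify_family: bool  # whether the server also alerts family members


# Only the "loneliness" entry reflects the example in the text; the default is an assumption.
RULES = {
    "loneliness": Response("simple relaxation exercise", True),
}
DEFAULT = Response("standard cognitive function training", False)


def plan_response(emotion: str) -> Response:
    """Map a recognized emotion to the exercise to provide and the notification decision."""
    return RULES.get(emotion, DEFAULT)


plan = plan_response("loneliness")
```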

[0155] The following describes the processing flow.

[0156] Step 1:

[0157] The server checks the subject's profile information and generates a new contact schedule. Based on the subject's health history and past conversation data, it sets appropriate call frequency and timing and records it in the database.

[0158] Step 2:

[0159] The server sends schedule information to the terminal. The terminal receives this information and begins preparing for the specified date and time.

[0160] Step 3:

[0161] The terminal automatically calls the target person at the scheduled time. If the call is not answered, it redials automatically until a connection is established.
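A minimal sketch of the redial behavior follows. The text says redialing continues until a connection is established; the attempt cap, the wait interval, and the `place_call` callback are assumptions added so that the loop terminates and can be exercised without a telephony stack.

```python
import time
from typing import Callable


def call_with_redial(place_call: Callable[[], bool], max_attempts: int = 5,
                     wait_seconds: float = 60.0,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Dial repeatedly; return the attempt number that connected, or 0 if none did.

    place_call is a hypothetical callback that returns True once the call connects.
    """
    for attempt in range(1, max_attempts + 1):
        if place_call():
            return attempt
        if attempt < max_attempts:
            sleep(wait_seconds)  # wait before redialing
    return 0


# Usage with a simulated line that connects on the third attempt.
outcomes = iter([False, False, True])
connected_on = call_with_redial(lambda: next(outcomes), sleep=lambda s: None)
```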

[0162] Step 4:

[0163] After the call is connected, the AI agent asks the person questions about their daily life and emotions. Specifically, it might ask questions like, "What's been happening lately?"

[0164] Step 5:

[0165] The user responds to the AI agent's questions, talking about their current feelings and recent events. The emotions and tone of their words are the focus of emotion recognition.

[0166] Step 6:

[0167] The device collects and transcribes audio data during a conversation. This transcribed data and audio signals are sent to an emotion engine for emotion analysis.

[0168] Step 7:

[0169] The server analyzes conversational data provided by the emotion engine and assesses emotional state along with cognitive function. The results are reflected in a profile, and notifications are generated as needed.

[0170] Step 8:

[0171] Based on the analysis results, the server selects cognitive function training and emotional improvement programs suitable for the individual. The selected programs are then sent to the terminal.

[0172] Step 9:

[0173] The terminal executes the selected program and presents the specific training content to the subject. For example, it might suggest, "Let's try a calm and relaxing breathing technique."

[0174] Step 10:

[0175] The user works to improve their cognitive function and emotional state by following the training provided through the terminal. The training results and feedback are recorded on the server and used to prepare the next session.

[0176] (Example 2)

[0177] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0178] For elderly individuals and those with cognitive decline, there is a need to accurately assess their emotional and cognitive states and provide personalized training and support. However, traditional approaches have faced challenges in accurately understanding emotional states and providing individualized support.

[0179] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0180] In this invention, the server includes communication means for periodically contacting the subject, data processing means for recognizing the subject's emotional state from the acquired conversational data using speech recognition and natural language processing technologies, and output means for providing personalized cognitive function training to the subject. This enables real-time understanding of emotional states and individually optimized training and notifications.

[0181] A "control device" is a device that has the function of making regular contact with the subject, analyzing data, and providing training to the subject.

[0182] A "communication method" is a mechanism that allows a control device to exchange information with a target.

[0183] "Conversation data" refers to voice and text information obtained through communication methods from the subject.

[0184] "Speech recognition" is a technology that converts speech data into text data.

[0185] "Natural language processing technology" is a technology that enables computers to understand, interpret, and respond to human language.

[0186] An "emotion engine" is a program that recognizes the emotional state of a subject from conversational data.

[0187] "Data processing means" refers to a device or program that has the function of analyzing acquired conversation data and evaluating the subject's condition.

[0188] "Output means" refers to a device or function that provides training or feedback to the subject based on the analysis results.

[0189] A "notification system" is a mechanism for informing medical institutions and family members of information based on the results of conversation analysis.

[0190] This invention is a communication system that understands the emotional state of a subject and provides appropriate cognitive function training. Its main components consist of a server, a terminal, and a user.

[0191] Server operation

[0192] The server first manages the subject's profile information, including basic information and past conversation history. Next, the server receives conversation data transmitted from the terminal in real time and converts the audio data into text data using speech recognition software. Commercial speech recognition APIs may be used for this process. Subsequently, natural language processing techniques are used to evaluate the subject's cognitive function and emotional state from the text data. By incorporating an emotion engine, it is possible to recognize emotions and generate personalized feedback. Specifically, the Google Cloud Natural Language API and emotion analysis tools may be used.

[0193] Terminal operation

[0194] The device has the ability to automatically make phone calls to target individuals based on a schedule sent from the server. Once the call is connected, an AI agent running on the device engages in a natural conversation with the target individual and collects conversation data. This data is converted into text and sent to the server. The device also plays a role in conveying training content and feedback provided by the server to the target individual.

[0195] User usage

[0196] Users receive phone calls from an AI agent connected via their device. Through these conversations, they can share details of their daily lives and receive necessary cognitive training. This allows users to understand their own emotional state and maintain or improve their health.

[0197] Examples of specific cases and prompt statements

[0198] For example, the server can schedule a phone call to an elderly person at 10:00 AM on a Tuesday. The terminal automatically calls the elderly person at the designated time, and the AI agent asks, "How are you doing today?" If the elderly person replies, "I'm feeling a little lonely," the emotion engine recognizes this emotion and sends it to the server. As a result, the elderly person is provided with specific relaxation exercises to alleviate their feelings of loneliness.

[0199] An example of a prompt given to the generative AI model is, "What kind of cognitive training would be appropriate to provide to an elderly person who is feeling lonely?"
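A prompt of this form could be assembled from the recognized emotion as sketched below. The function name `build_training_prompt` and its parameters are hypothetical; the template reproduces the example prompt from the text, and no particular generative AI API is assumed.

```python
def build_training_prompt(emotion_adjective: str,
                          subject_description: str = "an elderly person") -> str:
    """Fill the example prompt template with the recognized emotion."""
    return (
        "What kind of cognitive training would be appropriate to provide to "
        f"{subject_description} who is feeling {emotion_adjective}?"
    )


prompt = build_training_prompt("lonely")
```

The resulting string would then be passed to whichever generative AI model the server uses, together with any conversation text as inference data.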

[0200] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0201] Step 1:

[0202] The server stores the subject's profile information in a database. This includes basic information, contact details, health status, and past conversation history. The input is the subject's initial registration information, and the output is structured data that is stored in the database.
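One way to store such a profile as structured data is sketched below using SQLite from the Python standard library. The table layout, column names, and function names are assumptions for illustration; the source does not specify a database or schema.

```python
import json
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create a hypothetical profiles table holding the registration fields."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS profiles (
               subject_id    TEXT PRIMARY KEY,
               name          TEXT NOT NULL,
               phone         TEXT NOT NULL,
               health_status TEXT,
               history_json  TEXT NOT NULL DEFAULT '[]'
           )"""
    )


def register_subject(conn: sqlite3.Connection, subject_id: str, name: str,
                     phone: str, health_status: str) -> None:
    """Store the initial registration information (the input of this step)."""
    conn.execute(
        "INSERT INTO profiles (subject_id, name, phone, health_status) VALUES (?, ?, ?, ?)",
        (subject_id, name, phone, health_status),
    )
    conn.commit()


def append_history(conn: sqlite3.Connection, subject_id: str, summary: str) -> list:
    """Append one conversation summary to the subject's stored history and return it."""
    row = conn.execute(
        "SELECT history_json FROM profiles WHERE subject_id = ?", (subject_id,)
    ).fetchone()
    if row is None:
        raise KeyError(subject_id)
    history = json.loads(row[0])
    history.append(summary)
    conn.execute(
        "UPDATE profiles SET history_json = ? WHERE subject_id = ?",
        (json.dumps(history), subject_id),
    )
    conn.commit()
    return history


conn = sqlite3.connect(":memory:")  # in-memory database for demonstration
init_db(conn)
register_subject(conn, "A001", "Elderly person A", "000-0000-0000", "stable")
history = append_history(conn, "A001", "Talked about going for a walk")
```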

[0203] Step 2:

[0204] The server generates a contact schedule for the target person and notifies the terminal. The input is the days and times specified for contact, and the output is the schedule information sent to the terminal.

[0205] Step 3:

[0206] The terminal automatically makes a phone call to the target person based on the received schedule. The input is schedule information from the server, and the output is the establishment of a call connection with the target person.

[0207] Step 4:

[0208] The device activates an AI agent and engages in a natural conversation with the target. The input is voice data from the target, and the output is real-time voice dialogue.

[0209] Step 5:

[0210] The terminal uses speech recognition software to convert the audio data acquired during a conversation into text data and sends it to the server. The input is the subject's audio data, and the output is the transcribed conversation data.

[0211] Step 6:

[0212] The server applies natural language processing techniques to analyze text data. The input is text data sent from the terminal, and the output is the analysis of the conversation content and emotional state.

[0213] Step 7:

[0214] The server uses an emotion engine to recognize the subject's emotional state and generates individually optimized feedback. The input is analyzed emotional data, and the output is personalized training content.

[0215] Step 8:

[0216] The terminal receives training content from the server and provides it to the subject. The input is feedback information from the server, and the output is cognitive function training provided to the subject in audio or visual form.

[0217] Step 9:

[0218] If necessary, the server sends a notification to a healthcare provider or family member based on the analysis of the conversation. The input is the subject's emotional state and training results, and the output is a notification containing information that requires action.

[0219] (Application Example 2)

[0220] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0221] There is a challenge in accurately monitoring the emotional state of the elderly and individuals requiring care, and utilizing that information to prevent social isolation and provide appropriate health management and cognitive training. Furthermore, this necessitates a means of continuously and efficiently monitoring health status without relying on direct caregivers or medical institutions.

[0222] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0223] In this invention, the server includes communication means for periodically contacting the subject, information processing means for analyzing voice data and evaluating emotional state, output means for providing adaptive cognitive training, and means for notifying family or medical institutions of health information based on emotional data. This makes it possible to provide personalized feedback and training tailored to the subject's emotions and health condition.

[0224] A "control device" is a device that uses communication means, information processing means, and output means to analyze the emotional state of a subject and provide adaptive cognitive training or notifications.

[0225] "Communication methods" refer to techniques for regularly contacting the target individual and collecting voice data.

[0226] "Audio data" refers to acoustic information collected during conversations with subjects, and is used as material for analyzing their emotional state.

[0227] "Emotional state" refers to information that indicates an individual's emotional health, and is evaluated through the analysis of voice data.

[0228] An "information processing means" is a mechanism for analyzing acquired audio data and evaluating the emotional state of the subject.

[0229] "Adaptive cognitive training" refers to training that is optimized based on the emotional state of the individual in order to improve cognitive function.

[0230] "Output method" refers to a method for presenting training content to the target individual based on evaluation.

[0231] "Emotional data" refers to information indicating the emotions of a subject, obtained as a result of analyzing audio data.

[0232] "Means of notification" refers to a mechanism for communicating the health information of the subject to family members or medical institutions based on analyzed emotional data.

[0233] This invention is a system that monitors the emotional state of elderly individuals and those requiring care, and provides adaptive cognitive training. The system mainly consists of a server, terminals, and users.

[0234] The server receives conversational data transmitted from the terminal and analyzes it using speech recognition software and natural language processing (NLP) technology. Specifically, it converts the speech data into text using the speech_recognition library and analyzes the content of the text using the NLTK library. During this process, the emotion engine extracts emotional data and evaluates the subject's emotional state based on the results. Based on the evaluation results, the server generates cognitive training content appropriate for the subject and sends it to the terminal as feedback. It also notifies family members and medical institutions of this information as needed.

[0235] The device acts as a means of communication, periodically contacting the target individual based on a schedule received from the server. An AI agent on the device collects voice data through natural dialogue and sends it to the server. Furthermore, it provides the target individual with feedback and training received from the server. This allows the target individual to understand their own emotional state and receive appropriate cognitive function training.

[0236] Through this system, users (target individuals) can receive feedback on their emotional state and participate in adaptive cognitive training through everyday conversations.

[0237] For example, if a 78-year-old person uses this system and says, "I feel a little lonely today," the emotion engine detects this emotion, and the server suggests relaxation exercises to alleviate loneliness. The family is notified of this information as needed. In this way, the system provides healthcare services tailored to an individual's emotional state. Another example of a prompt to the generative AI model is, "Please tell me some effective coping strategies for when an elderly person feels lonely."

[0238] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0239] Step 1:

[0240] The device makes periodic contacts with the target person based on a schedule received from the server. It receives schedule data from the server as input and initiates a voice call to the target person as output. Specifically, the application on the device automatically initiates a voice call according to the date and time and sends a greeting such as, "How are you doing today?"

[0241] Step 2:

[0242] The user talks about their daily life through a voice call via their device. The input is a message from the device to initiate a conversation, and the output is the user speaking about everyday topics. Specifically, the user might say something like, "I was able to do some gardening today," into the device.

[0243] Step 3:

[0244] The device collects user voice and converts it to text using speech recognition technology. It receives user voice data as input and generates transcribed conversation data as output. Specifically, it uses the speech_recognition library to capture voice via the microphone.
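A minimal sketch of this capture step with the speech_recognition library follows. It assumes the third-party SpeechRecognition package (with PyAudio for microphone access) is installed; the function names, the `ja-JP` language code, and the JSON payload layout are illustrative assumptions. Note that `recognize_google` calls the Google Web Speech API, which is distinct from the Cloud Speech-to-Text API mentioned elsewhere in the text.

```python
import json


def transcribe_from_microphone(language: str = "ja-JP") -> str:
    """Capture one utterance from the default microphone and return its transcript."""
    # Imported lazily so the rest of this sketch works without the third-party package.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate to background noise
        audio = recognizer.listen(source, phrase_time_limit=30)
    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return ""  # speech could not be understood


def build_payload(subject_id: str, transcript: str) -> str:
    """Package the transcript as JSON for sending to the server (hypothetical format)."""
    return json.dumps({"subject_id": subject_id, "transcript": transcript},
                      ensure_ascii=False)


payload = build_payload("A001", "I was able to do some gardening today")
```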

[0245] Step 4:

[0246] The server receives transcribed conversation data sent from the terminal and performs natural language processing to analyze the emotional state. It takes text information from the conversation data as input and generates analysis results regarding the user's emotional state as output. Specifically, it uses NLTK to detect emotions within the text, and an emotion engine evaluates them.
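The emotion detection step can be illustrated with a simple keyword lexicon. This is a standard-library stand-in, not the NLTK-based analysis or the emotion engine the text refers to; the lexicon entries, emotion labels, and function names are assumptions, and a real system would use a proper tokenizer and a trained model.

```python
import re
from collections import Counter

# Hypothetical word-to-emotion lexicon for illustration only.
EMOTION_LEXICON = {
    "lonely": "loneliness",
    "alone": "loneliness",
    "sad": "sadness",
    "happy": "joy",
    "glad": "joy",
    "worried": "anxiety",
    "anxious": "anxiety",
}


def detect_emotions(text: str) -> Counter:
    """Count emotion labels for lexicon words found in the transcript."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(EMOTION_LEXICON[t] for t in tokens if t in EMOTION_LEXICON)


def dominant_emotion(text: str, default: str = "neutral") -> str:
    """Return the most frequent emotion label, or the default if none is found."""
    counts = detect_emotions(text)
    return counts.most_common(1)[0][0] if counts else default


emotion = dominant_emotion("I feel a little lonely today")
```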

[0247] Step 5:

[0248] The server generates adaptive cognitive training and feedback based on the analysis results. It receives the analysis results of the emotional state as input and generates training and feedback content tailored to the user as output. Specifically, if the user feels lonely, it will suggest relaxation training.

[0249] Step 6:

[0250] The server sends the generated training and feedback content to the device and notifies family members and medical institutions as needed. It receives the generated training content and notification information as input, and sends information to the device and relevant parties as output. Specifically, it sends an email notification to the family stating something like, "The person seems lonely."
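The email notification in this step could be composed as sketched below with the standard library's `email.message.EmailMessage`. The addresses, subject line, and wording are placeholders, and the function name `build_family_notification` is hypothetical; actual delivery (for example with `smtplib`) is left out because the source does not describe the mail setup.

```python
from email.message import EmailMessage


def build_family_notification(subject_name: str, emotion_adjective: str, to_addr: str,
                              from_addr: str = "noreply@example.com") -> EmailMessage:
    """Compose (but do not send) a notification email to a family member."""
    msg = EmailMessage()
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg["Subject"] = f"Check-in notice for {subject_name}"
    msg.set_content(
        f"{subject_name} seems {emotion_adjective}. "
        "Please consider getting in touch."
    )
    return msg


notice = build_family_notification("The person", "lonely", "family@example.com")
```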

[0251] Step 7:

[0252] The terminal presents the user with training and feedback received from the server. It receives the training and feedback from the server as input, and presents and executes it as output. Specifically, it gives instructions to the user through screen displays and voice assistance.

[0253] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0254] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0255] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0256] [Second Embodiment]

[0257] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0258] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0259] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0260] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0261] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0262] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0263] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0264] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0265] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0266] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0267] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0268] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0269] In order to implement this invention, it is necessary for the server, terminal, and user to work together in cooperation. The specific operation of each component is shown below.

[0270] Server Embodiment

[0271] The server functions as the central hub of this system and has multiple roles. First, the server connects to the database and manages the basic information and contact history of the target individuals. This allows the system to automatically generate periodic phone schedules and notify terminals.

[0272] Next, the server analyzes the conversation data received from the terminal. This data analysis is performed using natural language processing technology to assess the cognitive function of the subject from their voice. Based on the analysis results, notifications are automatically sent to medical institutions and family members as needed.

[0273] Furthermore, the server selects cognitive function training appropriate for the individual and outputs the content to the terminal. This provides personalized feedback.

[0274] Terminal embodiment

[0275] The device functions as an interface with the target individual. Upon receiving notifications from the server, it is responsible for making phone calls to the target individual according to a specified schedule. After the call is connected, the device activates an AI agent to engage in everyday conversation with the target individual.

[0276] The terminal collects audio data during conversations and prepares to send it to the server. Furthermore, based on analysis instructions from the server, it executes a cognitive function training program for the subject and provides real-time feedback of the results to the server.

[0277] User Embodiment

[0278] Users can use their devices to converse with an AI agent, discussing everyday events and their health status. They can also try out the provided cognitive training and review the results. Based on this feedback, the training content for the next session is adjusted, thus providing a system that supports continuous health maintenance.

[0279] Specific example

[0280] For example, assume that the server sets a phone schedule for target person A on Monday morning. At 10 o'clock on Tuesday, the terminal automatically places the call, and the AI agent asks elderly person A, "Good morning. How have you been spending your days lately?" When elderly person A talks about going for a walk, the terminal transcribes the content into text and sends it to the server, where it is analyzed.

[0281] As a result of the analysis, a memory training task for elderly person A to work on going forward is generated and proposed through the terminal. This enables the user to receive health checks and training useful for maintaining cognitive functions through daily conversations.

[0282] The following explains the processing flow.

[0283] Step 1:

[0284] The server checks the profile data of the target person and generates the next phone schedule. The profile data includes the target person's health status and past call history, and based on this, the optimal call frequency and timing are set. The schedule is saved in the database and notified to the terminal.

[0285] Step 2:

[0286] The terminal receives the schedule notification from the server and starts preparing for the call. At the scheduled call time, the necessary voice synthesis model and conversation scenario are loaded. This enables a smooth start to the conversation.

[0287] Step 3:

[0288] The terminal automatically calls the target person at the specified time. The automatic redial function operates until the call is connected. When the connection is complete, the AI agent starts the conversation.

[0289] Step 4:

[0290] The AI agent on the device asks the user questions about their daily life to facilitate conversation. Specifically, it asks questions such as, "What are your plans for today?" to elicit responses from the user.

[0291] Step 5:

[0292] Users answer questions from an AI agent, providing information about their daily activities and health status. Users can freely talk about their usual lifestyle and recent events.

[0293] Step 6:

[0294] The device collects conversations with the user as audio data and sends it to the server in real time. The audio data is simultaneously transcribed into text, and the content of the conversation is recorded.

[0295] Step 7:

[0296] The server analyzes the received conversation data to evaluate the cognitive function and mental health status of the subjects. Natural language processing technology is used to scrutinize the emotions and content of the topics discussed.

[0297] Step 8:

[0298] Based on the analysis results, the server automatically sends notifications to medical institutions or family members if necessary. This ensures that risks can be detected early.

[0299] Step 9:

[0300] The server selects a cognitive function training program suitable for the individual and transmits the training content to the terminal. The selection is made considering past training results and current health status.

[0301] Step 10:

[0302] The terminal executes the selected cognitive function training program and presents specific training tasks to the user. The terminal also records the user's interaction results and transmits them to the server.

[0303] Step 11:

[0304] The user engages in the training presented by the terminal and provides feedback on the results. Based on the results, the content of the next training is adjusted.

[0305] (Example 1)

[0306] Next, Example 1 will be described. In the following description, the data processing device 12 is referred to as the "server", and the smart glasses 214 are referred to as the "terminal".

[0307] Due to aging and the trend towards nuclear families in modern society, the risk of social isolation among individual subjects is increasing. In such situations, it is important to appropriately evaluate cognitive functions and mental health status through regular communication and provide prompt responses and support as needed. However, effective systems for doing this efficiently are limited. To solve this problem, a system that enables individual interventions and training tailored to the subject is required.

[0308] The specific processing by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0309] In this invention, the server includes a schedule generation means, a communication means, a conversion means, a data processing means, a selection means, and an output means. This enables the regular evaluation of the subject's health status, provides cognitive function training tailored to individual needs, and enables communication support to prevent social isolation.

[0310] The "control device" is a central system that manages communication with the subject and has the function of performing data analysis and training provision processing.

[0311] "Schedule generation means" refers to a function for scheduling regular contact with the subject.

[0312] "Means of communication" refers to the methods and processes for directly contacting a subject.

[0313] "Conversion means" refers to a function that uses technology to convert audio information into text information.

[0314] "Data processing means" refers to technology for analyzing acquired textual information and evaluating the health status of the subject.

[0315] "Selection means" refers to the process of selecting cognitive function training appropriate for each individual based on data analysis.

[0316] "Output means" refers to a function that presents the selected training to the subject and supports them in carrying it out.

[0317] "Notification means" refers to the function of communicating information to relevant parties based on analysis results.

[0318] "Communication support means" refers to a function that supports communication with other people in order to prevent the social isolation of the subject.

[0319] For this invention to be implemented, it is essential that the server, terminal, and user work together in coordination. The specific operation of each component is described below.

[0320] Server Role

[0321] The server functions as the central hub of the system. First, the server retrieves the subject's basic information from a database and generates a schedule for regular contact. A database management system is used for this information management. Furthermore, the server converts audio data transmitted from the terminal into text data and analyzes it using natural language processing technology. This analysis evaluates the subject's cognitive function and mental health, and based on the results, selects individualized cognitive training. The generated training program is then output to the terminal.

[0322] Terminal role

[0323] The terminal functions as an interface with the subject, communicating with the server according to a specified schedule. Voice data collected during communication is converted to text in real time by the terminal and sent to the server. The terminal also executes training programs received from the server and gives the subject personalized, real-time feedback on the results.

[0324] User roles

[0325] Users can converse with an AI agent through their device, discussing everyday events and their health status. They can participate in cognitive training sessions offered during this process and directly see the results. Based on this feedback, the content of subsequent training sessions is adjusted, allowing users to enjoy continuous health maintenance.

[0326] Specific example

[0327] For example, the server schedules a phone call to subject A on Monday. At 10:00 AM on Tuesday, the device automatically initiates a call to subject A, and the AI agent asks, "Good morning, how have you been lately?" When subject A talks about their recent activities, the content is collected as audio data, converted into text data, and sent to the server. There, the server analyzes the data and suggests a memory training task suitable for subject A through the device.

[0328] Example of a prompt

[0329] "Based on the phone schedule set by the server, call elderly person A, ask them about their recent life, analyze the information, and propose appropriate cognitive function training."

[0330] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0331] Step 1:

[0332] The server retrieves basic information about the target individuals from the database and generates a schedule for regular contact. This step utilizes a database management system to manage the information and plan the next call schedule. The input is the target information from the database, and the output is the call schedule for each individual.

[0333] Step 2:

[0334] The terminal automatically initiates a call to the target person at the specified time, according to the schedule received from the server. At this time, the terminal activates the AI agent and begins a conversation with the target person. The input is the schedule information sent from the server, and the output is the actual phone call to the target person.

[0335] Step 3:

[0336] The device collects voice data spoken by the subject during a call in real time and converts it into text data. This process uses speech recognition technology to convert speech to text. The input is the subject's voice data, and the output is text data sent to the server.

[0337] Step 4:

[0338] The server receives text data sent from the terminal and analyzes it using natural language processing techniques. This analysis uses a language model to understand the content of the conversation and evaluates the cognitive function and mental health status of the subject. The input is text data, and the output is the result of the health status evaluation.
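
To make the input and output of Step 4 concrete, the sketch below computes a few simple lexical features from a transcript. This is a deliberately simplified stand-in, not the language-model analysis the description refers to: the feature set, the filler-word list, and the function name are assumptions, and none of these values is a validated measure of cognitive function.

```python
import re

# Assumed list of filler words; purely illustrative.
FILLERS = {"um", "uh", "er", "hmm"}


def lexical_features(transcript: str) -> dict:
    """Compute simple lexical features from transcribed speech."""
    # Lowercase and keep alphabetic tokens (apostrophes allowed).
    words = re.findall(r"[a-z']+", transcript.lower())
    if not words:
        return {"word_count": 0, "type_token_ratio": 0.0, "filler_ratio": 0.0}
    return {
        "word_count": len(words),
        # Share of distinct words: a rough proxy for vocabulary variety.
        "type_token_ratio": len(set(words)) / len(words),
        # Share of filler words among all words.
        "filler_ratio": sum(w in FILLERS for w in words) / len(words),
    }
```

Features such as these could be tracked across calls so that changes over time, rather than a single value, feed the evaluation.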

[0339] Step 5:

[0340] The server selects cognitive function training appropriate for the subject based on the analysis results. This selection takes into account past data and evaluation results to determine the optimal training content. The input is the results of the health status assessment, and the output is the selected training program.

[0341] Step 6:

[0342] The server sends the selected training program to the terminal, and the terminal conducts the training for the target user. The terminal displays the training content on the screen and supports the user in following along. The input is the training program from the server, and the output is the execution of the training.

[0343] Step 7:

[0344] The terminal feeds back the training results to the server. Based on this feedback, the server adjusts the content and schedule for the next training session. The input is the training results, and the output is the adjusted training plan.
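
The feedback loop in Step 7 can be illustrated with a small rule that raises or lowers difficulty from the latest result. The `TrainingPlan` type, the 1 to 5 difficulty scale, and the score thresholds are assumptions for illustration; the description does not specify how the adjustment is made.

```python
from dataclasses import dataclass


@dataclass
class TrainingPlan:
    task: str
    difficulty: int  # assumed scale: 1 (easiest) to 5 (hardest)


def adjust_plan(plan: TrainingPlan, score: float) -> TrainingPlan:
    """Return the plan for the next session given a result score in [0, 1]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= 0.8:      # assumed threshold: task was easy, step up
        step = 1
    elif score < 0.5:     # assumed threshold: task was hard, step down
        step = -1
    else:                 # otherwise keep the same difficulty
        step = 0
    # Clamp the new difficulty to the assumed 1..5 range.
    new_difficulty = min(5, max(1, plan.difficulty + step))
    return TrainingPlan(plan.task, new_difficulty)
```

Returning a new plan instead of mutating the old one keeps each session's plan available as part of the training history.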

[0345] (Application Example 1)

[0346] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0347] It is necessary to prevent cognitive decline and deterioration of psychological state among users, including the elderly, and to support their continuous health maintenance. Furthermore, timely information provision to healthcare providers and relatives, and prevention of social isolation are also important issues.

[0348] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0349] In this invention, the server includes communication means for periodically contacting the user, information processing means for analyzing acquired conversation information and evaluating cognitive function and psychological state, presentation means for proposing individualized cognitive function training, and adaptive learning means for proposing cognitive function training content according to the user's interests and needs. This makes it possible to maintain the user's cognitive function and manage their psychological state, as well as to provide information quickly to medical institutions and relatives, and prevent social isolation.

[0350] A "control device" is a device that manages the entire system and issues commands to ensure that each component operates in coordination.

[0351] "Communication means" refers to methods and technologies for regularly contacting users and obtaining conversational information.

[0352] "Information processing means" refers to technologies and devices used to analyze acquired conversational information and evaluate the cognitive function and psychological state of users.

[0353] "Presentation means" refers to approaches for providing individualized cognitive function training to users based on assessments.

[0354] "Adaptive learning means" refers to techniques and methods that propose cognitive function training content according to the user's interests and needs, and provide training tailored to each individual.

[0355] "Notification means" refers to technologies and methods for notifying medical providers and relatives based on analysis results.

[0356] "Means of collaboration" refers to means and methods for preventing users from becoming socially isolated and for promoting contact with other users and supporters.

[0357] To implement this invention, the server, terminal, and user must function in coordination. The server acts as the central hub of the system, connecting to a database to manage the user's basic information and contact history. This allows the server to automatically generate periodic communication schedules and notify the terminals. The terminals function as an interface that periodically contacts the user and engages in everyday conversations using an AI agent.

[0358] Furthermore, the server analyzes the conversational information collected from the terminal using natural language processing technology to evaluate the user's cognitive function and psychological state. Based on the evaluation results, the server selects cognitive function training appropriate for the user and instructs the terminal on its content. The terminal follows these instructions, provides training to the user, and feeds back the results obtained to the server.

[0359] The server automatically sends important information to healthcare providers and relatives as needed through its notification function. Furthermore, the server facilitates interaction with other users and supporters using the same system, using collaborative means to reduce the risk of social isolation.

[0360] As a concrete example, every morning the device calls the user, and after the AI agent greets them, it asks about the user's recent daily life and health. For example, a prompt might include something like, "The AI agent starts a conversation with the user asking about their health this morning and explaining that keeping a diary has a positive effect on cognitive function."

[0361] The hardware to be used will likely include smartphones and tablets, and the software will include the Google Cloud Speech-to-Text API and the Google Assistant API. The server is a computer system for data analysis, training program management, and information processing.

[0362] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0363] Step 1:

[0364] The server retrieves the user's basic information and past contact history from a database. Based on this, it generates a periodic communication schedule for the user and notifies the terminal. The input is the user information from the database, and the output is the next communication schedule information. A scheduling algorithm processes this information to calculate the most efficient and appropriate timing.

[0365] Step 2:

[0366] The terminal makes a phone call to the user at the designated time, according to the communication schedule notified by the server. It activates the AI agent and starts a conversation with the user. The input is the communication schedule information, and the output is the user's voice data. At this stage, the AI agent greets the user and asks about their health.

[0367] Step 3:

[0368] The device transcribes the audio data collected during the conversation into text and sends that data to the server. The input is audio data, and the output is the text data converted from the audio. The Google Cloud Speech-to-Text API is used for the audio-to-text conversion.

[0369] Step 4:

[0370] The server analyzes the received text data using natural language processing (NLP). The purpose of the analysis is to evaluate the user's cognitive function and psychological state. The input is text data, and the output is the analyzed evaluation results. Data manipulation here is performed using specialized algorithms.

[0371] Step 5:

[0372] The server selects an appropriate cognitive function training program based on the evaluation results and sends the details to the terminal. The input is the evaluation results, and the output is instruction information for the cognitive function training program. The selection process includes adaptive learning elements that take into account the user's past data and interests.
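
One way to picture the adaptive selection in Step 5 is as a scoring function over a catalog of programs. The `Program` fields, the weights, and the idea of a single "weakest domain" are assumptions introduced for this sketch; the description says only that the user's past data and interests are taken into account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    name: str
    domain: str  # assumed cognitive domain, e.g. "memory" or "attention"
    topic: str   # assumed theme, e.g. "gardening"


def select_program(catalog, weakest_domain, interests, recent_names):
    """Pick the highest-scoring program from a non-empty catalog.

    Ties are resolved in favor of the earliest catalog entry.
    """
    def score(program):
        points = 0
        if program.domain == weakest_domain:
            points += 2  # assumed weight: target the domain needing support
        if program.topic in interests:
            points += 1  # assumed weight: match the user's interests
        if program.name in recent_names:
            points -= 1  # assumed penalty: avoid repeating recent programs
        return points

    return max(catalog, key=score)
```

For example, with a catalog holding two memory programs, one of them on gardening, a user interested in gardening would receive the gardening program unless it was used recently.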

[0373] Step 6:

[0374] The terminal provides users with personalized cognitive function training based on instructions from the server. Training results are fed back to the server in real time. Input is training program information from the server, and output is training completion reports and progress data. This feedback helps adjust the content of the next training session.

[0375] Step 7:

[0376] The server sends notifications to healthcare providers and relatives as needed based on the analysis results. It also suggests collaboration with other users and supporters to prevent social isolation. Inputs are analysis results and feedback, while outputs are notifications and collaboration suggestions. Notifications are automated, ensuring appropriate information dissemination.
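
The decision logic in Step 7 can be sketched as a mapping from evaluation results to actions. The `Assessment` fields, the 0 to 1 score ranges, the thresholds, and the action names are all hypothetical; the description does not define when a notification or a collaboration suggestion is triggered.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    cognitive_score: float  # assumed range 0..1, higher is better
    mood_score: float       # assumed range 0..1, higher is better
    isolation_flag: bool    # assumed flag set when isolation risk is detected


def plan_actions(
    assessment: Assessment,
    cognitive_threshold: float = 0.4,  # assumed threshold
    mood_threshold: float = 0.3,       # assumed threshold
) -> list:
    """Return the list of follow-up actions for one assessment."""
    actions = []
    if assessment.cognitive_score < cognitive_threshold:
        actions.append("notify_healthcare_provider")
    if assessment.mood_score < mood_threshold:
        actions.append("notify_relatives")
    if assessment.isolation_flag:
        actions.append("suggest_collaboration")
    return actions
```

An empty list means no notification is needed, which matches the "as needed" wording of the step.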

[0377] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0378] This invention relates to a communication system that includes a control device combined with an emotion engine, and is capable of monitoring the health status of a subject and providing cognitive function training through interaction between a server, a terminal, and a user. The detailed operation of each component is described below.

[0379] Server Embodiment

[0380] The server manages the subject's profile information and generates a regular phone call schedule. The server receives conversation data transmitted from the terminal in real time and analyzes it using data processing tools. Speech recognition and natural language processing technologies are used for the analysis to evaluate the subject's cognitive function and emotional state from the conversation content.

[0381] In particular, it incorporates an emotion engine that can recognize the subject's emotions from conversational data. This allows for the generation of personalized feedback based on emotional information and the adjustment of appropriate cognitive function training. The server also has the function to send notifications to medical institutions and family members based on the recognized emotions.

[0382] Terminal embodiment

[0383] The device functions as an interface with the target individual, receiving schedules from the server and making phone calls at the designated times. The device activates an AI agent to engage in natural, everyday conversation with the target individual. During the conversation, it collects voice data, transcribes it into text, and sends it to the server.

[0384] Furthermore, the terminal provides cognitive function training to the user based on instructions from the server. Analysis results from the emotion engine are used to adjust the training content, and feedback is provided according to the emotional state of the subject.

[0385] User Embodiment

[0386] The user (target individual) receives a phone call from an AI agent via their device and engages in conversation about their daily life. The user can receive cognitive training and learn about their own emotional state through the feedback provided.

[0387] Specific example

[0388] For example, suppose the server schedules a call to elderly person B for 10:00 AM on Tuesday. At the scheduled time, the device calls elderly person B, and the AI agent asks, "How are you spending your day?" If elderly person B says, "I'm feeling a little lonely," the emotion engine recognizes this emotion and the result is sent to the server.

[0389] Based on the analysis results, the server provides elderly person B with simple relaxation exercises via a terminal to alleviate feelings of loneliness, and simultaneously sends a notification to family members indicating that attention is needed. In this way, a system is realized that utilizes emotional data to comprehensively support the health and well-being of the target individual.
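
The loneliness example above can be traced end to end with a toy stand-in for the emotion engine. The keyword lists, the `anxious` category and its response, and the function names are assumptions for illustration; an actual emotion engine would rely on a trained model rather than keyword matching.

```python
# Assumed keyword lists; a crude substitute for a real emotion model.
EMOTION_KEYWORDS = {
    "lonely": ("lonely", "alone", "isolated"),
    "anxious": ("worried", "anxious", "nervous"),
}

# The "lonely" response follows the example in the text; the "anxious"
# entry is a hypothetical addition.
RESPONSES = {
    "lonely": {"exercise": "relaxation exercise", "notify_family": True},
    "anxious": {"exercise": "breathing exercise", "notify_family": False},
}


def detect_emotion(utterance: str) -> str:
    """Return the first emotion whose keyword appears in the utterance."""
    text = utterance.lower()
    for emotion, keywords in EMOTION_KEYWORDS.items():
        # Plain substring matching; it ignores negation and context.
        if any(keyword in text for keyword in keywords):
            return emotion
    return "neutral"


def respond(utterance: str) -> dict:
    """Map an utterance to the detected emotion and the planned response."""
    emotion = detect_emotion(utterance)
    plan = RESPONSES.get(emotion, {"exercise": None, "notify_family": False})
    return {"emotion": emotion, **plan}
```

Calling `respond("I'm feeling a little lonely")` yields the `lonely` emotion with a relaxation exercise and a family notification, mirroring the behavior described for elderly person B.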

[0390] The following describes the processing flow.

[0391] Step 1:

[0392] The server checks the subject's profile information and generates a new contact schedule. Based on the subject's health history and past conversation data, it sets appropriate call frequency and timing and records it in the database.

[0393] Step 2:

[0394] The server sends schedule information to the terminal. The terminal receives this information and begins preparing for the specified date and time.

[0395] Step 3:

[0396] The device automatically calls the target person at the scheduled time. The automatic redial function retries until a connection is established.

[0397] Step 4:

[0398] After the call is connected, the AI agent asks the person questions about their daily life and emotions. Specifically, it might ask questions like, "What's been happening lately?"

[0399] Step 5:

[0400] The user responds to the AI agent's questions, talking about their current feelings and recent events. The emotions and tone of their words are the focus of emotion recognition.

[0401] Step 6:

[0402] The device collects and transcribes audio data during a conversation. This transcribed data and audio signals are sent to an emotion engine for emotion analysis.

[0403] Step 7:

[0404] The server analyzes conversational data provided by the emotion engine and assesses emotional state along with cognitive function. The results are reflected in a profile, and notifications are generated as needed.

[0405] Step 8:

[0406] Based on the analysis results, the server selects cognitive function training and emotional improvement programs suitable for the individual. The selected programs are then sent to the terminal.

[0407] Step 9:

[0408] The device executes the selected program and presents specific training content to the participant. For example, it might suggest, "Let's try a calm and relaxing breathing technique."

[0409] Step 10:

[0410] Users strive to improve their cognitive function and emotions by following the training provided through their devices. The training results and feedback are recorded on a server and used to prepare for the next step.

[0411] (Example 2)

[0412] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0413] For elderly individuals and those with cognitive decline, there is a need to accurately assess their emotional and cognitive states and provide personalized training and support. However, traditional approaches have faced challenges in accurately understanding emotional states and providing individualized support.

[0414] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0415] In this invention, the server includes communication means for periodically contacting the subject, data processing means for recognizing the subject's emotional state from acquired conversational data using speech recognition and natural language processing technologies, and output means for providing personalized cognitive function training to the subject. This enables real-time understanding of emotional states and individually optimized training and notifications.

[0416] A "control device" is a device that has the function of making regular contact with the subject, analyzing data, and providing training to the subject.

[0417] A "communication means" is a mechanism that allows the control device to exchange information with the subject.

[0418] "Conversation data" refers to voice and text information obtained from the subject through the communication means.

[0419] "Speech recognition" is a technology that converts speech data into text data.

[0420] "Natural language processing technology" is a technology that enables computers to understand, interpret, and respond to human language.

[0421] An "emotion engine" is a program that recognizes the emotional state of a subject from conversational data.

[0422] "Data processing means" refers to a device or program that has the function of analyzing acquired conversation data and evaluating the subject's condition.

[0423] "Output means" refers to a device or function that provides training or feedback to the subject based on the analysis results.

[0424] A "notification system" is a mechanism for informing medical institutions and family members of information based on the results of conversation analysis.

[0425] This invention is a communication system that understands the emotional state of a subject and provides appropriate cognitive function training. Its main components consist of a server, a terminal, and a user.

[0426] Server operation

[0427] The server first manages the subject's profile information, including basic information and past conversation history. Next, the server receives conversation data transmitted from the terminal in real time and converts the audio data into text data using speech recognition software. Commercial speech recognition APIs may be used for this process. Subsequently, natural language processing techniques are used to evaluate the subject's cognitive function and emotional state from the text data. By incorporating an emotion engine, it is possible to recognize emotions and generate personalized feedback. Specifically, Google Cloud's natural language API and emotion analysis tools may be used.

[0428] Terminal operation

[0429] The device has the ability to automatically make phone calls to target individuals based on a schedule sent from the server. Once the call is connected, an AI agent running on the device engages in a natural conversation with the target individual and collects conversation data. This data is converted into text and sent to the server. The device also plays a role in conveying training content and feedback provided by the server to the target individual.

[0430] User usage

[0431] Users receive phone calls from an AI agent connected via their device. Through these conversations, they can share details of their daily lives and receive necessary cognitive training. This allows users to understand their own emotional state and maintain or improve their health.

[0432] Examples of specific cases and prompt statements

[0433] For example, a server can schedule a phone call to an elderly person at 10:00 AM on a Tuesday. The device automatically calls the elderly person at the designated time, and an AI agent asks, "How are you doing today?" If the elderly person replies, "I'm feeling a little lonely," the emotion engine recognizes this emotion and sends it to the server. As a result, the elderly person is provided with specific relaxation exercises to alleviate their feelings of loneliness.

[0434] An example of a prompt given to the generative AI model is, "What kind of cognitive training would be appropriate to provide to an elderly person who is feeling lonely?"

[0435] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0436] Step 1:

[0437] The server stores the subject's profile information in a database. This includes basic information, contact details, health status, and past conversation history. The input is the subject's initial registration information, and the output is structured data that is stored in the database.
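
The structured data produced in Step 1 might look like the following sketch, which models a profile as a dataclass and serializes it to JSON. The field names and the choice of JSON are assumptions; the description lists the kinds of information stored but not the storage format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SubjectProfile:
    # Hypothetical field names covering the categories named in the text:
    # basic information, contact details, health status, conversation history.
    subject_id: str
    name: str
    phone: str
    health_status: str
    conversation_history: list = field(default_factory=list)


def to_record(profile: SubjectProfile) -> str:
    """Serialize a profile to a JSON string suitable for a database column."""
    return json.dumps(asdict(profile), ensure_ascii=False, sort_keys=True)


def from_record(record: str) -> SubjectProfile:
    """Rebuild a profile from its JSON record."""
    return SubjectProfile(**json.loads(record))
```

Using `ensure_ascii=False` keeps non-ASCII names readable in the stored record, which matters for Japanese-language deployments.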

[0438] Step 2:

[0439] The server generates a schedule for contacting the target person and notifies the terminal. The input is the specified contact days and times, and the output is the schedule information for the terminal.

[0440] Step 3:

[0441] The terminal automatically makes a phone call to the target person based on the received schedule. The input is schedule information from the server, and the output is the establishment of a call connection with the target person.

[0442] Step 4:

[0443] The device activates an AI agent and engages in a natural conversation with the target. The input is voice data from the target, and the output is real-time voice dialogue.

[0444] Step 5:

[0445] The terminal uses speech recognition software to convert the audio data acquired during a conversation into text data and sends it to the server. The input is the subject's audio data, and the output is the transcribed conversation data.

[0446] Step 6:

[0447] The server applies natural language processing techniques to analyze text data. The input is text data sent from the terminal, and the output is the analysis of the conversation content and emotional state.

[0448] Step 7:

[0449] The server uses an emotion engine to recognize the subject's emotional state and generates individually optimized feedback. The input is analyzed emotional data, and the output is personalized training content.

[0450] Step 8:

[0451] The terminal receives training content from the server and provides it to the participant. The input is feedback information from the server, and the output is cognitive function training provided to the participant in audio or visual form.

[0452] Step 9:

[0453] If necessary, the server sends a notification to a healthcare provider or family member based on the analysis of the conversation. The input is the subject's emotional state and training results, and the output is a notification containing information that requires action.

[0454] (Application Example 2)

[0455] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0456] There is a challenge in accurately monitoring the emotional state of the elderly and individuals requiring care, and utilizing that information to prevent social isolation and provide appropriate health management and cognitive training. Furthermore, this necessitates a means of continuously and efficiently monitoring health status without relying on direct caregivers or medical institutions.

[0457] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0458] In this invention, the server includes communication means for periodically contacting the subject, information processing means for analyzing voice data and evaluating emotional state, output means for providing adaptive cognitive training, and means for notifying family or medical institutions of health information based on emotional data. This makes it possible to provide personalized feedback and training tailored to the subject's emotions and health condition.

[0459] A "control device" is a device that uses communication means, information processing means, and output means to analyze the emotional state of a subject and provide adaptive cognitive training or notifications.

[0460] "Communication means" refers to techniques for regularly contacting the target individual and collecting voice data.

[0461] "Audio data" refers to acoustic information collected during conversations with subjects, and is used as material for analyzing their emotional state.

[0462] "Emotional state" refers to information that indicates an individual's emotional health, and is evaluated through the analysis of voice data.

[0463] An "information processing means" is a mechanism for analyzing acquired audio data and evaluating the emotional state of the subject.

[0464] "Adaptive cognitive training" refers to training for improving cognitive function whose content is optimized based on the individual's emotional state.

[0465] "Output means" refers to a method for presenting training content to the target individual based on the evaluation.

[0466] "Emotional data" refers to information indicating the emotions of a subject, obtained as a result of analyzing audio data.

[0467] "Notification means" refers to a mechanism for communicating the health information of the subject to family members or medical institutions based on analyzed emotional data.

[0468] This invention is a system that monitors the emotional state of elderly individuals and those requiring care, and provides adaptive cognitive training. The system mainly consists of a server, terminals, and users.

[0469] The server receives conversational data transmitted from the terminal and analyzes it using speech recognition software and natural language processing (NLP) technology. Specifically, it converts the speech data into text using the speech_recognition library and analyzes the content of the text using the NLTK library. During this process, the emotion engine extracts emotional data and evaluates the subject's emotional state based on the results. Based on the evaluation results, the server generates cognitive training content appropriate for the subject and sends it to the terminal as feedback. It also notifies family members and medical institutions of this information as needed.

[0470] The device acts as a means of communication, periodically contacting the target individual based on a schedule received from the server. An AI agent on the device collects voice data through natural dialogue and sends it to the server. Furthermore, it provides the target individual with feedback and training received from the server. This allows the target individual to understand their own emotional state and receive appropriate cognitive function training.

[0471] Through this system, users (target individuals) can receive feedback on their emotional state and participate in adaptive cognitive training through everyday conversations.

[0472] For example, if a 78-year-old person uses this system and says, "I feel a little lonely today," the emotion engine detects this emotion, and the server suggests relaxation exercises to alleviate the loneliness. The family is notified of this information as needed. In this way, the system provides highly accurate healthcare services tailored to the individual's emotional state. An example of a prompt to the generative AI model is, "Please tell me some effective coping strategies for when an elderly person feels lonely."
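The loneliness example above can be sketched in Python as follows. This is a minimal illustration rather than the emotion engine of this disclosure: the keyword lexicon `EMOTION_KEYWORDS`, the table `TRAINING_BY_EMOTION`, the set `NOTIFY_EMOTIONS`, and the function names are all hypothetical, and simple keyword matching stands in for the NLTK-based analysis. The rule that loneliness triggers a family notification is likewise an assumption.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword lexicon standing in for the emotion engine.
EMOTION_KEYWORDS = {
    "lonely": {"lonely", "alone", "isolated"},
    "anxious": {"worried", "anxious", "nervous"},
    "happy": {"happy", "glad", "enjoyed"},
}

# Hypothetical mapping from a detected emotion to suggested content.
TRAINING_BY_EMOTION = {
    "lonely": "relaxation exercises",
    "anxious": "breathing exercises",
    "happy": "memory training",
    "neutral": "memory training",
}

# Emotions assumed (for illustration only) to warrant notifying the family.
NOTIFY_EMOTIONS = {"lonely", "anxious"}


@dataclass
class ResponsePlan:
    emotion: str
    training: str
    notify_family: bool


def detect_emotion(text: str) -> str:
    """Return the first emotion whose keywords appear in the text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    for emotion, keywords in EMOTION_KEYWORDS.items():
        if words & keywords:
            return emotion
    return "neutral"


def plan_response(text: str) -> ResponsePlan:
    """Map an utterance to an emotion, a suggestion, and a notify flag."""
    emotion = detect_emotion(text)
    return ResponsePlan(emotion, TRAINING_BY_EMOTION[emotion],
                        emotion in NOTIFY_EMOTIONS)


plan = plan_response("I feel a little lonely today")
print(plan)
```

A real deployment would replace `detect_emotion` with a trained model; the point here is only the flow from utterance to emotion, to suggested training, to an optional notification.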

[0473] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0474] Step 1:

[0475] The device makes periodic contacts with the target person based on a schedule received from the server. It receives schedule data from the server as input and initiates a voice call to the target person as output. Specifically, the application on the device automatically initiates a voice call according to the date and time and sends a greeting such as, "How are you doing today?"

[0476] Step 2:

[0477] The user talks about their daily life through a voice call via their device. The input is a message from the device to initiate a conversation, and the output is the user speaking about everyday topics. Specifically, the user might say something like, "I was able to do some gardening today," into the device.

[0478] Step 3:

[0479] The device collects user voice and converts it to text using speech recognition technology. It receives user voice data as input and generates transcribed conversation data as output. Specifically, it uses the speech_recognition library to capture voice via the microphone.

[0480] Step 4:

[0481] The server receives transcribed conversation data sent from the terminal and performs natural language processing to analyze the emotional state. It takes text information from the conversation data as input and generates analysis results regarding the user's emotional state as output. Specifically, it uses NLTK to detect emotions within the text, and an emotion engine evaluates them.

[0482] Step 5:

[0483] The server generates adaptive cognitive training and feedback based on the analysis results. It receives the analysis results of the emotional state as input and generates training and feedback content tailored to the user as output. Specifically, if the user feels lonely, it will suggest relaxation training.

[0484] Step 6:

[0485] The server sends the generated training and feedback content to the device and notifies family members and medical institutions as needed. It receives the generated training content and notification information as input, and sends information to the device and relevant parties as output. Specifically, it sends an email notification to the family stating something like, "The person seems lonely."

[0486] Step 7:

[0487] The terminal presents the user with training and feedback received from the server. It receives the training and feedback from the server as input, and presents and executes it as output. Specifically, it gives instructions to the user through screen displays and voice assistance.
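The email notification in Step 6 above could be composed along the following lines. This sketch uses Python's standard `email.message.EmailMessage` class; the function name `build_family_notification`, the message wording, and the example addresses are assumptions made for illustration. The message is built but not sent.

```python
from email.message import EmailMessage


def build_family_notification(subject_name: str, emotion: str,
                              sender: str, recipient: str) -> EmailMessage:
    """Compose (but do not send) a notification email for a family member."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Check-in notice for {subject_name}"
    msg.set_content(
        f"During today's call, {subject_name} seemed {emotion}.\n"
        "You may want to get in touch."
    )
    return msg


# Placeholder addresses; actual delivery (e.g. via smtplib) is omitted.
notice = build_family_notification(
    "the person", "lonely", "server@example.com", "family@example.com"
)
print(notice["Subject"])
print(notice.get_content())
```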

[0488] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0489] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. A prompt containing instructions is input to the data generation model 58, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0490] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0491] [Third Embodiment]

[0492] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0493] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0494] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0495] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0496] The microphone 238 accepts instructions from the user 20 by receiving the user's voice. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0497] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0498] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0499] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0500] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0501] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0502] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0503] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0504] To implement this invention, the server, terminal, and user must work together. The specific operation of each component is described below.

[0505] Server Embodiment

[0506] The server functions as the central hub of this system and has multiple roles. First, the server connects to the database and manages the basic information and contact history of the target individuals. This allows the system to automatically generate periodic phone schedules and notify terminals.

[0507] Next, the server analyzes the conversation data received from the terminal. This data analysis is performed using natural language processing technology to assess the cognitive function of the subject from their voice. Based on the analysis results, notifications are automatically sent to medical institutions and family members as needed.

[0508] Furthermore, the server selects cognitive function training appropriate for the individual and outputs the content to the terminal. This provides personalized feedback.
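The schedule generation described in this server embodiment might look like the following minimal sketch. The `SubjectProfile` fields, the `risk_level` categories, and the interval values in `INTERVAL_DAYS` are assumptions introduced for illustration; the source states only that a periodic call schedule is generated from basic information and contact history.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical contact intervals per risk level (assumed values).
INTERVAL_DAYS = {"low": 7, "medium": 3, "high": 1}


@dataclass
class SubjectProfile:
    name: str
    risk_level: str            # "low", "medium", or "high" (assumed field)
    last_call: datetime        # time of the most recent call
    preferred_hour: int = 10   # local hour at which to place the call


def next_call_time(profile: SubjectProfile) -> datetime:
    """Return the last call date plus the risk-based interval,
    aligned to the subject's preferred hour."""
    days = INTERVAL_DAYS[profile.risk_level]
    next_day = profile.last_call + timedelta(days=days)
    return next_day.replace(hour=profile.preferred_hour, minute=0,
                            second=0, microsecond=0)


profile = SubjectProfile("A", "low", datetime(2024, 12, 3, 10, 15))
print(next_call_time(profile))
```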

[0509] Terminal embodiment

[0510] The device functions as an interface with the target individual. Upon receiving notifications from the server, it is responsible for making phone calls to the target individual according to a specified schedule. After the call is connected, the device activates an AI agent to engage in everyday conversation with the target individual.

[0511] The terminal collects audio data during conversations and prepares to send it to the server. Furthermore, based on analysis instructions from the server, it executes a cognitive function training program for the subject and provides real-time feedback of the results to the server.

[0512] User Embodiment

[0513] Users can use their devices to converse with an AI agent, discussing everyday events and their health status. They can also try out the provided cognitive training and review the results. Based on this feedback, the training content for the next session is adjusted, thus providing a system that supports continuous health maintenance.

[0514] Specific example

[0515] For example, suppose the server schedules a phone call for elderly person A on Monday morning. At 10:00 AM on Tuesday, the terminal automatically initiates the call, and the AI agent asks elderly person A, "Good morning, how have you been lately?" When elderly person A talks about taking walks, the terminal transcribes the conversation into text and sends it to the server, which analyzes it.

[0516] As a result of the analysis, memory training tasks that elderly person A should work on in the future are generated and suggested via the device. This allows the user to receive training that helps check their health status and maintain cognitive function through everyday conversation.

[0517] The following describes the processing flow.

[0518] Step 1:

[0519] The server checks the subject's profile data and generates the next call schedule. This profile data includes the subject's health status and past call history, and is used to determine the optimal call frequency and timing. The schedule is saved in the database and sent to the terminal.

[0520] Step 2:

[0521] The device receives a schedule notification from the server and begins preparing for the call. At the scheduled call time, the necessary speech synthesis model and conversation scenario are loaded. This enables a smooth start to the conversation.

[0522] Step 3:

[0523] The device automatically dials the target person at the specified time. The automatic redial function operates until the call is connected. Once connected, the AI agent begins the conversation.

[0524] Step 4:

[0525] The AI agent on the device asks the user questions about their daily life to facilitate conversation. Specifically, it asks questions such as, "What are your plans for today?" to elicit responses from the user.

[0526] Step 5:

[0527] Users answer questions from an AI agent, providing information about their daily activities and health status. Users can freely talk about their usual lifestyle and recent events.

[0528] Step 6:

[0529] The device collects conversations with the user as audio data and sends it to the server in real time. The audio data is simultaneously transcribed into text, and the content of the conversation is recorded.

[0530] Step 7:

[0531] The server analyzes the received conversation data to evaluate the cognitive function and mental health status of the subject. Natural language processing technology is used to examine the emotions expressed and the content of the topics discussed.

[0532] Step 8:

[0533] Based on the analysis results, the server automatically sends notifications to medical institutions or family members if necessary. This ensures that risks can be detected early.

[0534] Step 9:

[0535] The server selects a cognitive function training program suitable for the individual and transmits the training content to the terminal. The selection is made considering past training results and current health status.

[0536] Step 10:

[0537] The device executes a selected cognitive function training program and presents the user with specific training tasks. The device also records the user's interaction results and sends them to the server.

[0538] Step 11:

[0539] The user works on the training presented on their device and receives feedback on their results. Based on these results, the content of the next training session is adjusted.
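The automatic redial in Step 3 of the flow above can be sketched as a simple retry loop. The source says the redial function operates until the call is connected; the attempt cap `max_attempts`, the wait between attempts, and the `place_call` callback are assumptions added so that the sketch always terminates and can be run without telephony hardware.

```python
import time
from typing import Callable


def dial_with_redial(place_call: Callable[[], bool], max_attempts: int = 5,
                     wait_seconds: float = 60.0,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Call place_call() until it returns True.

    Returns the number of attempts used, or 0 if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if place_call():
            return attempt
        if attempt < max_attempts:
            sleep(wait_seconds)  # pause before redialing
    return 0


# Simulated line that connects on the third attempt; sleeping is disabled.
outcomes = iter([False, False, True])
attempts_used = dial_with_redial(lambda: next(outcomes), sleep=lambda s: None)
print(attempts_used)
```

Passing `sleep` as a parameter keeps the loop testable; in production the default `time.sleep` would apply.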

[0540] (Example 1)

[0541] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0542] In modern society, factors such as an aging population and the rise of nuclear families increase the risk of social isolation for individuals. In this situation, it is crucial to appropriately assess cognitive function and mental health through regular communication and provide prompt responses and support as needed. However, effective systems for efficiently achieving this are limited. To address this problem, a system is needed that enables individualized interventions and training tailored to each person.

[0543] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0544] In this invention, the server includes a schedule generation means, a communication means, a conversion means, a data processing means, a selection means, and an output means. This enables periodic evaluation of the health status of the subject, provision of cognitive function training tailored to individual needs, and communication support to prevent social isolation.

[0545] A "control device" is a central system that manages communication with the target individual and has the function of processing data analysis and providing training.

[0546] "Schedule generation means" refers to a function for scheduling regular contact with the target person.

[0547] "Means of communication" refers to the methods and processes for directly contacting a subject.

[0548] "Conversion means" refers to a function that uses technology to convert audio information into text information.

[0549] "Data processing means" refers to technology for analyzing acquired textual information and evaluating the health status of the subject.

[0550] "Selection means" refers to a function for selecting cognitive function training appropriate for each individual based on data analysis.

[0551] "Output means" refers to a function that presents the selected training to the target individual and supports its implementation.

[0552] "Notification means" refers to a function for communicating information to relevant parties based on analysis results.

[0553] "Means of communication" refers to a function that supports communication with other people in order to prevent the social isolation of the target individual.

[0554] For this invention to be implemented, it is essential that the server, terminal, and user work together in coordination. The specific operation of each component is described below.

[0555] Server Role

[0556] The server functions as the central hub of the system. First, the server retrieves the subject's basic information from a database and generates a schedule for regular contact. A database management system is used for this information management. Furthermore, the server converts audio data transmitted from the terminal into text data and analyzes it using natural language processing technology. This analysis evaluates the subject's cognitive function and mental health, and based on the results, selects individualized cognitive training. The generated training program is then output to the terminal.

[0557] Terminal role

[0558] The terminal functions as an interface with the subject, communicating with the server according to a specified schedule. Voice data collected during communication is converted to text in real time by the terminal and sent to the server. The terminal also executes training programs received from the server and provides real-time feedback on the results. This provides personalized feedback to the subject.

[0559] User roles

[0560] Users can converse with an AI agent through their device, discussing everyday events and their health status. They can participate in cognitive training sessions offered during this process and directly see the results. Based on this feedback, the content of subsequent training sessions is adjusted, allowing users to enjoy continuous health maintenance.

[0561] Specific example

[0562] For example, the server schedules a phone call to subject A on Monday. At 10:00 AM on Tuesday, the device automatically initiates a call to subject A, and the AI agent asks, "Good morning, how have you been lately?" When subject A talks about their recent activities, the content is collected as audio data, converted into text data, and sent to the server. There, the server analyzes the data and suggests a memory training task suitable for subject A through the device.

[0563] Example of a prompt

[0564] "Based on the phone schedule set by the server, call elderly person A, ask them about their recent life, analyze the information, and propose appropriate cognitive function training."
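As a small illustration, the example prompt above could be filled in programmatically for each scheduled call. The function name `build_agent_prompt` and the inclusion of the scheduled time in the prompt are assumptions; the wording otherwise follows the example prompt.

```python
from datetime import datetime


def build_agent_prompt(subject_name: str, call_time: datetime) -> str:
    """Fill the example prompt with a subject name and the scheduled time."""
    when = call_time.strftime("%Y-%m-%d %H:%M")
    return (
        f"Based on the phone schedule set by the server ({when}), "
        f"call {subject_name}, ask them about their recent life, "
        "analyze the information, and propose appropriate cognitive "
        "function training."
    )


prompt = build_agent_prompt("elderly person A", datetime(2024, 12, 3, 10, 0))
print(prompt)
```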

[0565] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0566] Step 1:

[0567] The server retrieves basic information about the target individuals from the database and generates a schedule for regular contact. This step utilizes a database management system to manage the information and plan the next call schedule. The input is the target information from the database, and the output is the call schedule for each individual.

[0568] Step 2:

[0569] The terminal automatically initiates a call to the target person at the specified time, according to the schedule received from the server. At this time, the terminal activates the AI agent and begins a conversation with the target person. The input is the schedule information sent from the server, and the output is the actual phone call to the target person.

[0570] Step 3:

[0571] The device collects voice data spoken by the subject during a call in real time and converts it into text data. This process uses speech recognition technology to convert speech to text. The input is the subject's voice data, and the output is text data sent to the server.

[0572] Step 4:

[0573] The server receives text data sent from the terminal and analyzes it using natural language processing techniques. This analysis uses a language model to understand the content of the conversation and evaluates the cognitive function and mental health status of the subject. The input is text data, and the output is the result of the health status evaluation.

[0574] Step 5:

[0575] The server selects cognitive function training appropriate for the subject based on the analysis results. This selection takes into account past data and evaluation results to determine the optimal training content. The input is the results of the health status assessment, and the output is the selected training program.

[0576] Step 6:

[0577] The server sends the selected training program to the terminal, and the terminal conducts the training for the target user. The terminal displays the training content on the screen and supports the user in following along. The input is the training program from the server, and the output is the execution of the training.

[0578] Step 7:

[0579] The terminal feeds back the training results to the server. Based on this feedback, the server adjusts the content and schedule for the next training session. The input is the training results, and the output is the adjusted training plan.
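The feedback loop in Step 7 above, in which training results adjust the next session, can be sketched as follows. The 1 to 5 difficulty range, the score thresholds of 0.8 and 0.4, and the `TrainingPlan` structure are illustrative assumptions; the source does not specify how the adjustment is computed.

```python
from dataclasses import dataclass

MIN_LEVEL, MAX_LEVEL = 1, 5   # assumed difficulty range


@dataclass
class TrainingPlan:
    program: str
    level: int


def adjust_plan(plan: TrainingPlan, score: float) -> TrainingPlan:
    """Raise the level after a strong result, lower it after a weak one.

    score is the fraction of tasks completed correctly (0.0 to 1.0).
    The 0.8 and 0.4 thresholds are illustrative assumptions.
    """
    if score >= 0.8:
        level = min(plan.level + 1, MAX_LEVEL)
    elif score < 0.4:
        level = max(plan.level - 1, MIN_LEVEL)
    else:
        level = plan.level
    return TrainingPlan(plan.program, level)


plan = TrainingPlan("memory training", level=2)
plan = adjust_plan(plan, score=0.9)
print(plan)
```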

[0580] (Application Example 1)

[0581] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0582] It is necessary to prevent cognitive decline and deterioration of psychological state among users, including the elderly, and to support their continuous health maintenance. Furthermore, timely information provision to healthcare providers and relatives, and prevention of social isolation are also important issues.

[0583] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0584] In this invention, the server includes communication means for periodically contacting the user, information processing means for analyzing acquired conversation information and evaluating cognitive function and psychological state, presentation means for proposing individualized cognitive function training, and adaptive learning means for proposing cognitive function training content according to the user's interests and needs. This makes it possible to maintain the user's cognitive function and manage their psychological state, as well as to provide information quickly to medical institutions and relatives, and prevent social isolation.

[0585] A "control device" is a device that manages the entire system and issues commands to ensure that each component operates in coordination.

[0586] "Communication means" refers to methods and technologies for regularly contacting users and obtaining conversational information.

[0587] "Information processing means" refers to technologies and devices used to analyze acquired conversational information and evaluate the cognitive function and psychological state of users.

[0588] "Presentation means" refers to approaches for providing individualized cognitive function training to users based on assessments.

[0589] "Adaptive learning means" refers to techniques and methods that propose cognitive function training content according to the user's interests and needs, and provide training tailored to each individual.

[0590] "Notification means" refers to technologies and methods for notifying medical providers and relatives based on analysis results.

[0591] "Means of collaboration" refers to means and methods for preventing users from becoming socially isolated and for promoting contact with other users and supporters.

[0592] To implement this invention, the server, terminal, and user must function in coordination. The server acts as the central hub of the system, connecting to a database to manage the user's basic information and contact history. This allows the server to automatically generate periodic communication schedules and notify the terminals. The terminals function as an interface that periodically contacts the user and engages in everyday conversations using an AI agent.

[0593] Furthermore, the server analyzes the conversational information collected from the terminal using natural language processing technology to evaluate the user's cognitive function and psychological state. Based on the evaluation results, the server selects cognitive function training appropriate for the user and instructs the terminal on its content. The terminal follows these instructions, provides training to the user, and feeds back the results obtained to the server.

[0594] The server automatically sends important information to healthcare providers and relatives as needed through its notification function. Furthermore, the server facilitates interaction with other users and supporters using the same system, using collaborative means to reduce the risk of social isolation.

[0595] As a concrete example, every morning the device calls the user, and after the AI agent greets them, it asks about the user's recent daily life and health. For example, a prompt might include something like, "The AI agent starts a conversation with the user asking about their health this morning and explaining that keeping a diary has a positive effect on cognitive function."

[0596] The hardware used may include smartphones and tablets, and the software may include the Google Cloud Speech-to-Text API and the Google Assistant API. The server is a computer system that performs data analysis, training program management, and information processing.

[0597] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0598] Step 1:

[0599] The server retrieves the user's basic information and past contact history from a database. Based on this, it generates a periodic communication schedule for the user and notifies the terminal. The input is the user information from the database, and the output is the next communication schedule information. An algorithm processes this information to calculate the most efficient and appropriate timing.

[0600] Step 2:

[0601] The terminal makes a phone call to the user at the designated time, according to the communication schedule notified by the server. It activates the AI agent and starts a conversation with the user. The input is the communication schedule information, and the output is the user's voice data. At this stage, the AI agent greets the user and asks questions about the user's health.

[0602] Step 3:

[0603] The device transcribes the audio data collected during the conversation into text and sends that data to the server. The input is audio data, and the output is the text data converted from the audio. The Google Cloud Speech-to-Text API is used for the audio-to-text conversion.

[0604] Step 4:

[0605] The server analyzes the received text data using natural language processing (NLP). The purpose of the analysis is to evaluate the user's cognitive function and psychological state. The input is text data, and the output is the analyzed evaluation results. Data manipulation here is performed using specialized algorithms.

[0606] Step 5:

[0607] The server selects an appropriate cognitive function training program based on the evaluation results and sends the details to the terminal. The input is the evaluation results, and the output is instruction information for the cognitive function training program. The selection process includes adaptive learning elements that take into account the user's past data and interests.

[0608] Step 6:

[0609] The terminal provides users with personalized cognitive function training based on instructions from the server. Training results are fed back to the server in real time. Input is training program information from the server, and output is training completion reports and progress data. This feedback helps adjust the content of the next training session.

[0610] Step 7:

[0611] The server sends notifications to healthcare providers and relatives as needed based on the analysis results. It also suggests collaboration with other users and supporters to prevent social isolation. Inputs are analysis results and feedback, while outputs are notifications and collaboration suggestions. Notifications are automated, ensuring appropriate information dissemination.
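The adaptive selection in Step 5 of the flow above, which takes the user's past data and interests into account, could be sketched as a simple scoring rule. The `Program` structure, the `weak_domain` input, and the weighting (2 points for matching the domain to be trained, 1 point per shared interest) are hypothetical; they illustrate one possible selection rule, not the method of this disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class Program:
    name: str
    domain: str                       # cognitive domain trained, e.g. "memory"
    topics: set[str] = field(default_factory=set)


def select_program(programs: list[Program], weak_domain: str,
                   interests: set[str]) -> Program:
    """Pick the program with the highest score.

    A program gets 2 points for targeting the weak domain and 1 point
    per topic shared with the user's interests (assumed weighting).
    """
    def score(p: Program) -> int:
        return 2 * (p.domain == weak_domain) + len(p.topics & interests)

    return max(programs, key=score)


catalog = [
    Program("word recall", "memory", {"reading"}),
    Program("garden diary", "memory", {"gardening", "diary"}),
    Program("number puzzles", "attention", {"puzzles"}),
]
chosen = select_program(catalog, weak_domain="memory", interests={"gardening"})
print(chosen.name)
```

Here a user who mentioned gardening and whose evaluation points to memory receives the diary-style memory program, since it matches both the domain and an interest.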

[0612] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0613] This invention relates to a communication system that includes a control device combined with an emotion engine, and is capable of monitoring the health status of a subject and providing cognitive function training through interaction between a server, a terminal, and a user. The detailed operation of each component is described below.

[0614] Server Embodiment

[0615] The server manages the subject's profile information and generates a regular phone call schedule. The server receives conversation data transmitted from the terminal in real time and analyzes it using data processing tools. Speech recognition and natural language processing technologies are used for the analysis to evaluate the subject's cognitive function and emotional state from the conversation content.

[0616] In particular, it incorporates an emotion engine that can recognize the subject's emotions from conversational data. This allows for the generation of personalized feedback based on emotional information and the adjustment of appropriate cognitive function training. The server also has the function to send notifications to medical institutions and family members based on the recognized emotions.

[0617] Terminal embodiment

[0618] The device functions as an interface with the target individual, receiving schedules from the server and making phone calls at the designated times. The device activates an AI agent to engage in natural, everyday conversation with the target individual. During the conversation, it collects voice data, transcribes it into text, and sends it to the server.

[0619] Furthermore, the terminal provides cognitive function training to the user based on instructions from the server. Analysis results from the emotion engine are used to adjust the training content, and feedback is provided according to the emotional state of the subject.

[0620] User Embodiment

[0621] The user (target individual) receives a phone call from an AI agent via their device and engages in conversation about their daily life. The user can receive cognitive training and learn about their own emotional state through the feedback provided.

[0622] Specific example

[0623] For example, suppose the server has scheduled a call to elderly person B for 10:00 AM on Tuesday. At the scheduled time, the device calls elderly person B, and the AI agent asks, "How are you spending your day?" If elderly person B says, "I'm feeling a little lonely," this emotion is recognized by the emotion engine and sent to the server.

[0624] Based on the analysis results, the server provides elderly person B with simple relaxation exercises via the terminal to alleviate the feeling of loneliness, and simultaneously sends a notification to family members indicating that attention is needed. In this way, a system is realized that uses emotional data to comprehensively support the health and well-being of the target individual.
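As a non-limiting illustration, the response described in the specific example above might be sketched as a lookup from a recognized emotion to a program and a notification flag. The emotion labels, the program names, and the `ResponsePlan` type are assumptions introduced here for illustration and do not appear in the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponsePlan:
    exercise: str        # program sent to the terminal
    notify_family: bool  # whether a family notification is generated


# Hypothetical mapping from an emotion label to a response plan.
RESPONSES = {
    "lonely": ResponsePlan("relaxation exercise", True),
    "calm": ResponsePlan("memory quiz", False),
}

# Fallback used when the emotion label is not in the table (assumption).
DEFAULT_PLAN = ResponsePlan("everyday conversation", False)


def plan_response(emotion: str) -> ResponsePlan:
    """Return the plan for a recognized emotion, or the neutral default."""
    return RESPONSES.get(emotion.lower(), DEFAULT_PLAN)


plan = plan_response("lonely")
print(plan.exercise, plan.notify_family)
```

A table of this kind keeps the decision auditable; an actual embodiment may instead derive the response from the emotion engine and the generative model.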

[0625] The following describes the processing flow.

[0626] Step 1:

[0627] The server checks the subject's profile information and generates a new contact schedule. Based on the subject's health history and past conversation data, it sets appropriate call frequency and timing and records it in the database.

[0628] Step 2:

[0629] The server sends schedule information to the terminal. The terminal receives this information and begins preparing for the specified date and time.

[0630] Step 3:

[0631] The terminal automatically calls the target person at the scheduled time. An automatic redial function operates until a connection is established.

[0632] Step 4:

[0633] After the call is connected, the AI agent asks the person questions about their daily life and emotions. Specifically, it might ask questions like, "What's been happening lately?"

[0634] Step 5:

[0635] The user responds to the AI agent's questions, talking about their current feelings and recent events. The emotions and tone of their words are the focus of emotion recognition.

[0636] Step 6:

[0637] The device collects and transcribes audio data during a conversation. This transcribed data and audio signals are sent to an emotion engine for emotion analysis.

[0638] Step 7:

[0639] The server analyzes conversational data provided by the emotion engine and assesses emotional state along with cognitive function. The results are reflected in a profile, and notifications are generated as needed.

[0640] Step 8:

[0641] Based on the analysis results, the server selects cognitive function training and emotional improvement programs suitable for the individual. The selected programs are then sent to the terminal.

[0642] Step 9:

[0643] The terminal executes the selected program and presents the specific training content to the subject. For example, it might suggest, "Let's try a calm and relaxing breathing technique."

[0644] Step 10:

[0645] The user works to improve cognitive function and emotional state by following the training provided through the terminal. The training results and feedback are recorded on the server and used to prepare the next session.
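As a non-limiting illustration, the automatic redial of Step 3 above might be sketched as a bounded retry loop. The embodiment states only that redialing continues until a connection is established; the attempt limit, the wait interval, and the `dial` callable are assumptions added here so that the sketch terminates.

```python
import time
from typing import Callable


def call_with_redial(
    dial: Callable[[], bool],
    max_attempts: int = 5,
    wait_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Dial until connected.

    Returns the 1-based attempt number that connected, or 0 if every
    attempt failed. `dial` is a hypothetical callable that places one
    call and returns True when the call is answered.
    """
    for attempt in range(1, max_attempts + 1):
        if dial():
            return attempt
        if attempt < max_attempts:
            sleep(wait_seconds)  # wait before redialing
    return 0


# Demonstration with a simulated line that answers on the third attempt.
outcomes = iter([False, False, True])
print(call_with_redial(lambda: next(outcomes), sleep=lambda s: None))
```

Passing `sleep` as a parameter lets the loop be exercised without real delays.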

[0646] (Example 2)

[0647] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0648] For elderly individuals and those with cognitive decline, there is a need to accurately assess their emotional and cognitive states and provide personalized training and support. However, traditional approaches have faced challenges in accurately understanding emotional states and providing individualized support.

[0649] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0650] In this invention, the server includes communication means for periodically contacting the subject, data processing means for recognizing the subject's emotional state from the acquired conversation data using speech recognition and natural language processing technologies, and output means for providing personalized cognitive function training to the subject. This enables real-time understanding of emotional states and individually optimized training and notifications.

[0651] A "control device" is a device that has the function of making regular contact with the subject, analyzing data, and providing training to the subject.

[0652] A "communication method" is a mechanism that allows a control device to exchange information with a target.

[0653] "Conversation data" refers to voice and text information obtained through communication methods from the subject.

[0654] "Speech recognition" is a technology that converts speech data into text data.

[0655] "Natural language processing technology" is a technology that enables computers to understand, interpret, and respond to human language.

[0656] An "emotion engine" is a program that recognizes the emotional state of a subject from conversational data.

[0657] "Data processing means" refers to a device or program that has the function of analyzing acquired conversation data and evaluating the subject's condition.

[0658] "Output means" refers to a device or function that provides training or feedback to the subject based on the analysis results.

[0659] A "notification system" is a mechanism for informing medical institutions and family members of information based on the results of conversation analysis.

[0660] This invention is a communication system that understands the emotional state of a subject and provides appropriate cognitive function training. Its main components consist of a server, a terminal, and a user.

[0661] Server operation

[0662] The server first manages the subject's profile information, including basic information and past conversation history. Next, the server receives conversation data transmitted from the terminal in real time and converts the audio data into text data using speech recognition software. Commercial speech recognition APIs may be used for this process. Subsequently, natural language processing techniques are used to evaluate the subject's cognitive function and emotional state from the text data. By incorporating an emotion engine, it is possible to recognize emotions and generate personalized feedback. Specifically, Google Cloud's natural language API and emotion analysis tools may be used.
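As a non-limiting illustration, the order of the server operations described above (speech recognition, then emotion recognition, then feedback generation) might be sketched with interchangeable stages. The `Analysis` type and the three callables are assumptions introduced here; the stubs stand in for a speech recognition API and an emotion engine and do not call any real service.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Analysis:
    transcript: str  # text produced by speech recognition
    emotion: str     # label produced by the emotion engine
    feedback: str    # personalized feedback for the subject


def analyze_call(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    recognize_emotion: Callable[[str], str],
    make_feedback: Callable[[str], str],
) -> Analysis:
    """Run the stages in order: audio -> text -> emotion -> feedback."""
    transcript = transcribe(audio)
    emotion = recognize_emotion(transcript)
    return Analysis(transcript, emotion, make_feedback(emotion))


# Stub stages for demonstration only.
result = analyze_call(
    b"\x00\x01",
    transcribe=lambda audio: "I'm feeling a little lonely",
    recognize_emotion=lambda text: "lonely" if "lonely" in text else "neutral",
    make_feedback=lambda emotion: f"Suggested program for feeling {emotion}",
)
print(result.emotion)
```

Keeping each stage behind a callable allows a commercial API to be substituted without changing the surrounding flow.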

[0663] Terminal operation

[0664] The device has the ability to automatically make phone calls to target individuals based on a schedule sent from the server. Once the call is connected, an AI agent running on the device engages in a natural conversation with the target individual and collects conversation data. This data is converted into text and sent to the server. The device also plays a role in conveying training content and feedback provided by the server to the target individual.

[0665] User usage

[0666] Users receive phone calls from an AI agent connected via their device. Through these conversations, they can share details of their daily lives and receive necessary cognitive training. This allows users to understand their own emotional state and maintain or improve their health.

[0667] Examples of specific cases and prompt statements

[0668] For example, the server schedules a phone call to an elderly person at 10:00 AM on a Tuesday. The terminal automatically calls the elderly person at the designated time, and the AI agent asks, "How are you doing today?" If the elderly person replies, "I'm feeling a little lonely," the emotion engine recognizes this emotion and sends it to the server. As a result, the elderly person is provided with specific relaxation exercises to alleviate the feeling of loneliness.

[0669] An example of a prompt given to the generative AI model is, "What kind of cognitive training would be appropriate to provide to an elderly person who is feeling lonely?"
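As a non-limiting illustration, a prompt of the kind quoted above might be assembled from the recognized emotion. The function name and its parameters are assumptions introduced here; the wording of the template follows the example prompt.

```python
def build_training_prompt(emotion: str, subject: str = "an elderly person") -> str:
    """Build a prompt asking the generative model for suitable training."""
    return (
        "What kind of cognitive training would be appropriate to provide "
        f"to {subject} who is feeling {emotion}?"
    )


print(build_training_prompt("lonely"))
```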

[0670] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0671] Step 1:

[0672] The server stores the subject's profile information in a database. This includes basic information, contact details, health status, and past conversation history. The input is the subject's initial registration information, and the output is structured data that is stored in the database.

[0673] Step 2:

[0674] The server generates a contact schedule for the target person and notifies the terminal. The input is the specific days and times at which contact is to be made, and the output is the schedule information for the terminal.

[0675] Step 3:

[0676] The terminal automatically makes a phone call to the target person based on the received schedule. The input is schedule information from the server, and the output is the establishment of a call connection with the target person.

[0677] Step 4:

[0678] The device activates an AI agent and engages in a natural conversation with the target. The input is voice data from the target, and the output is real-time voice dialogue.

[0679] Step 5:

[0680] The terminal uses speech recognition software to convert the audio data acquired during a conversation into text data and sends it to the server. The input is the subject's audio data, and the output is the transcribed conversation data.

[0681] Step 6:

[0682] The server applies natural language processing techniques to analyze text data. The input is text data sent from the terminal, and the output is the analysis of the conversation content and emotional state.

[0683] Step 7:

[0684] The server uses an emotion engine to recognize the subject's emotional state and generates individually optimized feedback. The input is analyzed emotional data, and the output is personalized training content.

[0685] Step 8:

[0686] The terminal receives training content from the server and provides it to the subject. The input is feedback information from the server, and the output is cognitive function training provided to the subject in audio or visual form.

[0687] Step 9:

[0688] If necessary, the server sends a notification to a healthcare provider or family member based on the analysis of the conversation. The input is the subject's emotional state and training results, and the output is a notification containing information that requires action.
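As a non-limiting illustration, the profile storage of Step 1 above (registration information in, structured data out) might be sketched with an embedded database. The table layout, the column names, and the use of SQLite are assumptions introduced here; the embodiment does not specify a database product or schema.

```python
import json
import sqlite3


def open_profile_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) profiles table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS profiles ("
        " subject_id TEXT PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " phone TEXT NOT NULL,"
        " health_status TEXT,"
        " history TEXT NOT NULL DEFAULT '[]')"  # JSON list of call summaries
    )
    return conn


def register_subject(conn, subject_id, name, phone, health_status):
    """Store the initial registration information as one structured row."""
    conn.execute(
        "INSERT INTO profiles (subject_id, name, phone, health_status)"
        " VALUES (?, ?, ?, ?)",
        (subject_id, name, phone, health_status),
    )
    conn.commit()


def append_history(conn, subject_id, summary):
    """Append one conversation summary and return the updated history."""
    row = conn.execute(
        "SELECT history FROM profiles WHERE subject_id = ?", (subject_id,)
    ).fetchone()
    if row is None:
        raise KeyError(subject_id)
    history = json.loads(row[0])
    history.append(summary)
    conn.execute(
        "UPDATE profiles SET history = ? WHERE subject_id = ?",
        (json.dumps(history), subject_id),
    )
    conn.commit()
    return history


conn = open_profile_db()
register_subject(conn, "s1", "Elderly person B", "000-0000", "stable")
print(append_history(conn, "s1", "Reported feeling lonely"))
```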

[0689] (Application Example 2)

[0690] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0691] There is a challenge in accurately monitoring the emotional state of the elderly and individuals requiring care, and utilizing that information to prevent social isolation and provide appropriate health management and cognitive training. Furthermore, this necessitates a means of continuously and efficiently monitoring health status without relying on direct caregivers or medical institutions.

[0692] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0693] In this invention, the server includes communication means for periodically contacting the subject, information processing means for analyzing voice data and evaluating emotional state, output means for providing adaptive cognitive training, and means for notifying family or medical institutions of health information based on emotional data. This makes it possible to provide personalized feedback and training tailored to the subject's emotions and health condition.

[0694] A "control device" is a device that uses communication means, information processing means, and output means to analyze the emotional state of a subject and provide adaptive cognitive training or notifications.

[0695] "Communication methods" refer to techniques for regularly contacting the target individual and collecting voice data.

[0696] "Audio data" refers to acoustic information collected during conversations with subjects, and is used as material for analyzing their emotional state.

[0697] "Emotional state" refers to information that indicates an individual's emotional health, and is evaluated through the analysis of voice data.

[0698] An "information processing means" is a mechanism for analyzing acquired audio data and evaluating the emotional state of the subject.

[0699] "Adaptive cognitive training" refers to training that aims to improve cognitive function and whose content is optimized based on the emotional state of the individual.

[0700] "Output method" refers to a method for presenting training content to the target individual based on evaluation.

[0701] "Emotional data" refers to information indicating the emotions of a subject, obtained as a result of analyzing audio data.

[0702] The "means of notification" refer to a mechanism for communicating the health information of the subject to family members or medical institutions based on analyzed emotional data.

[0703] This invention is a system that monitors the emotional state of elderly individuals and those requiring care, and provides adaptive cognitive training. The system mainly consists of a server, terminals, and users.

[0704] The server receives conversation data transmitted from the terminal and analyzes it using speech recognition software and natural language processing (NLP) technology. Specifically, it converts the speech data into text using the speech_recognition library and analyzes the content of the text using the nltk library. During this process, the emotion engine extracts emotional data and evaluates the subject's emotional state based on the results. Based on the evaluation results, the server generates cognitive training content appropriate for the subject and sends it to the terminal as feedback. It also notifies family members and medical institutions of this information as needed.
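As a non-limiting illustration, the extraction of emotional data from transcribed text might be sketched with a small keyword lexicon. This sketch deliberately does not call the speech_recognition or nltk libraries named above; it uses only the standard library as a stand-in for the emotion engine, and the lexicon entries and emotion labels are assumptions introduced here.

```python
import re
from collections import Counter

# Hypothetical lexicon mapping words to emotion labels.
LEXICON = {
    "lonely": "loneliness",
    "alone": "loneliness",
    "happy": "joy",
    "glad": "joy",
    "worried": "anxiety",
    "anxious": "anxiety",
}


def detect_emotion(text: str) -> str:
    """Return the most frequent emotion label in the text, or 'neutral'."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(LEXICON[t] for t in tokens if t in LEXICON)
    if not counts:
        return "neutral"
    return counts.most_common(1)[0][0]


print(detect_emotion("I feel a little lonely today"))
```

A lexicon lookup ignores negation and tone of voice, so it illustrates only the data flow, not the accuracy that a trained emotion engine would be expected to provide.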

[0705] The device acts as a means of communication, periodically contacting the target individual based on a schedule received from the server. An AI agent on the device collects voice data through natural dialogue and sends it to the server. Furthermore, it provides the target individual with feedback and training received from the server. This allows the target individual to understand their own emotional state and receive appropriate cognitive function training.

[0706] Through this system, users (target individuals) can receive feedback on their emotional state and participate in adaptive cognitive training through everyday conversations.

[0707] For example, if a 78-year-old person uses this system and says, "I feel a little lonely today," the emotion engine detects this emotion, and the server suggests relaxation exercises to alleviate loneliness. The family is notified of this information as needed. In this way, the system provides healthcare services tailored to an individual's emotional state. Another example of a prompt to the generative AI model is, "Please tell me some effective coping strategies for when an elderly person feels lonely."

[0708] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0709] Step 1:

[0710] The device makes periodic contacts with the target person based on a schedule received from the server. It receives schedule data from the server as input and initiates a voice call to the target person as output. Specifically, the application on the device automatically initiates a voice call according to the date and time and sends a greeting such as, "How are you doing today?"

[0711] Step 2:

[0712] The user talks about their daily life through a voice call via their device. The input is a message from the device to initiate a conversation, and the output is the user speaking about everyday topics. Specifically, the user might say something like, "I was able to do some gardening today," into the device.

[0713] Step 3:

[0714] The device collects user voice and converts it to text using speech recognition technology. It receives user voice data as input and generates transcribed conversation data as output. Specifically, it uses the speech_recognition library to capture voice via the microphone.

[0715] Step 4:

[0716] The server receives transcribed conversation data sent from the terminal and performs natural language processing to analyze the emotional state. It takes text information from the conversation data as input and generates analysis results regarding the user's emotional state as output. Specifically, it uses NLTK to detect emotions within the text, and an emotion engine evaluates them.

[0717] Step 5:

[0718] The server generates adaptive cognitive training and feedback based on the analysis results. It receives the analysis results of the emotional state as input and generates training and feedback content tailored to the user as output. Specifically, if the user feels lonely, it will suggest relaxation training.

[0719] Step 6:

[0720] The server sends the generated training and feedback content to the device and notifies family members and medical institutions as needed. It receives the generated training content and notification information as input, and sends information to the device and relevant parties as output. Specifically, it sends an email notification to the family stating something like, "The person seems lonely."

[0721] Step 7:

[0722] The terminal presents the user with training and feedback received from the server. It receives the training and feedback from the server as input, and presents and executes it as output. Specifically, it gives instructions to the user through screen displays and voice assistance.
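As a non-limiting illustration, the email notification of Step 6 above might be composed as follows. Only message construction is shown; no mail is sent. The addresses, the subject line, and the function name are assumptions introduced here.

```python
from email.message import EmailMessage


def build_family_notification(
    subject_name: str,
    emotion: str,
    to_addr: str,
    from_addr: str = "noreply@example.com",  # placeholder sender (assumption)
) -> EmailMessage:
    """Compose, but do not send, a notification to a family member."""
    msg = EmailMessage()
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg["Subject"] = f"Check-in notice for {subject_name}"
    msg.set_content(
        f"{subject_name} seems {emotion}. Please consider getting in touch."
    )
    return msg


note = build_family_notification("Elderly person B", "lonely", "family@example.com")
print(note["Subject"])
```

Delivery (for example, through an SMTP relay) is left out because the embodiment does not specify the transport.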

[0723] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0724] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0725] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0726] [Fourth Embodiment]

[0727] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0728] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0729] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0730] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0731] The microphone 238 receives instructions and other input from the user 20 by picking up the user's voice. The microphone 238 captures the voice of the user 20, converts it into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0732] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0733] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0734] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
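As a non-limiting illustration, the expression of emotion through the controlled object 443 might be sketched as a table of actuator commands. The actuator names, the command values, and the emotion labels are assumptions introduced here; the embodiment states only that the controlled object includes eye LEDs and motors for the arms, hands, and feet.

```python
# Hypothetical (actuator, command) pairs for each emotion to be expressed.
EXPRESSIONS = {
    "joy": [("eye_led", "bright"), ("arm_motor", "raise")],
    "sadness": [("eye_led", "dim"), ("arm_motor", "lower")],
}

# Commands used when no specific expression is defined (assumption).
NEUTRAL = [("eye_led", "steady")]


def commands_for(emotion: str) -> list:
    """Return a copy of the command list for the given emotion."""
    return list(EXPRESSIONS.get(emotion, NEUTRAL))


for actuator, command in commands_for("joy"):
    print(actuator, command)
```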

[0735] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0736] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0737] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0738] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0739] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0740] To implement this invention, the server, terminal, and user must operate in cooperation. The specific operation of each component is described below.

[0741] Server Embodiment

[0742] The server functions as the central hub of this system and has multiple roles. First, the server connects to the database and manages the basic information and contact history of the target individuals. This allows the system to automatically generate periodic phone schedules and notify terminals.
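As a non-limiting illustration, the automatic generation of a periodic call schedule might be sketched as follows. The fixed interval in days, the function name, and the example dates are assumptions introduced here; the embodiment leaves the frequency to be determined from the subject's profile.

```python
from datetime import datetime, timedelta


def generate_schedule(first_call: datetime, interval_days: int, count: int) -> list:
    """Return `count` call times spaced `interval_days` apart."""
    if interval_days <= 0:
        raise ValueError("interval_days must be positive")
    if count < 0:
        raise ValueError("count must not be negative")
    return [first_call + timedelta(days=interval_days * i) for i in range(count)]


# Weekly calls at 10:00 AM starting on a Tuesday (illustrative date).
for call_time in generate_schedule(datetime(2025, 1, 7, 10, 0), 7, 3):
    print(call_time.isoformat())
```

The resulting list corresponds to the schedule that the server would store in the database and send to the terminal.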

[0743] Next, the server analyzes the conversation data received from the terminal. This data analysis is performed using natural language processing technology to assess the cognitive function of the subject from their voice. Based on the analysis results, notifications are automatically sent to medical institutions and family members as needed.

[0744] Furthermore, the server selects cognitive function training appropriate for the individual and outputs the content to the terminal. This provides personalized feedback.

[0745] Terminal embodiment

[0746] The device functions as an interface with the target individual. Upon receiving notifications from the server, it is responsible for making phone calls to the target individual according to a specified schedule. After the call is connected, the device activates an AI agent to engage in everyday conversation with the target individual.

[0747] The terminal collects audio data during conversations and prepares to send it to the server. Furthermore, based on analysis instructions from the server, it executes a cognitive function training program for the subject and provides real-time feedback of the results to the server.

[0748] User Embodiment

[0749] Users can use their devices to converse with an AI agent, discussing everyday events and their health status. They can also try out the provided cognitive training and review the results. Based on this feedback, the training content for the next session is adjusted, thus providing a system that supports continuous health maintenance.

[0750] Specific example

[0751] For example, suppose the server schedules a phone call for elderly person A on Monday morning. At 10:00 AM on Tuesday, the terminal automatically initiates the call, and the AI agent asks elderly person A, "Good morning, how have you been lately?" When elderly person A talks about taking walks, the terminal transcribes the conversation into text and sends it to the server, which analyzes it.

[0752] As a result of the analysis, memory training tasks that elderly person A should work on in the future are generated and suggested via the device. This allows the user to receive training that helps check their health status and maintain cognitive function through everyday conversation.

[0753] The following describes the processing flow.

[0754] Step 1:

[0755] The server checks the subject's profile data and generates the next call schedule. This profile data includes the subject's health status and past call history, and is used to determine the optimal call frequency and timing. The schedule is saved in the database and notified to the terminal.

[0756] Step 2:

[0757] The device receives a schedule notification from the server and begins preparing for the call. At the scheduled call time, the necessary speech synthesis model and conversation scenario are loaded. This enables a smooth start to the conversation.

[0758] Step 3:

[0759] The device automatically dials the target person at the specified time. The automatic redial function operates until the call is connected. Once connected, the AI agent begins the conversation.

[0760] Step 4:

[0761] The AI agent on the device asks the user questions about their daily life to facilitate conversation. Specifically, it asks questions such as, "What are your plans for today?" to elicit responses from the user.

[0762] Step 5:

[0763] Users answer questions from an AI agent, providing information about their daily activities and health status. Users can freely talk about their usual lifestyle and recent events.

[0764] Step 6:

[0765] The device collects conversations with the user as audio data and sends it to the server in real time. The audio data is simultaneously transcribed into text, and the content of the conversation is recorded.

[0766] Step 7:

[0767] The server analyzes the received conversation data to evaluate the cognitive function and mental health status of the subjects. Natural language processing technology is used to scrutinize the emotions and content of the topics discussed.

[0768] Step 8:

[0769] Based on the analysis results, the server automatically sends notifications to medical institutions or family members if necessary. This ensures that risks can be detected early.

[0770] Step 9:

[0771] The server selects a cognitive function training program suitable for the individual and transmits the training content to the terminal. The selection is made considering past training results and current health status.

[0772] Step 10:

[0773] The device executes a selected cognitive function training program and presents the user with specific training tasks. The device also records the user's interaction results and sends them to the server.

[0774] Step 11:

[0775] The user works on the training presented on their device and receives feedback on their results. Based on these results, the content of the next training session is adjusted.
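As a non-limiting illustration, the adjustment of the next training session based on results (Steps 9 to 11 above) might be sketched as a simple difficulty rule. The score range, the thresholds, and the level bounds are assumptions introduced here; the embodiment does not specify how the adjustment is computed.

```python
def next_difficulty(
    current: int,
    score: float,
    low: float = 0.5,
    high: float = 0.8,
    min_level: int = 1,
    max_level: int = 5,
) -> int:
    """Raise, lower, or keep the difficulty level from the last score.

    `score` is the fraction of tasks completed correctly (0.0 to 1.0).
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score >= high:
        step = 1       # performing well: make the next session harder
    elif score < low:
        step = -1      # struggling: make the next session easier
    else:
        step = 0       # keep the current level
    return max(min_level, min(max_level, current + step))


print(next_difficulty(2, 0.9))
```

Clamping to the level bounds keeps the program within the range of content that the server has available.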

[0776] (Example 1)

[0777] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0778] In modern society, factors such as an aging population and the rise of nuclear families increase the risk of social isolation for individuals. In this situation, it is crucial to appropriately assess cognitive function and mental health through regular communication and provide prompt responses and support as needed. However, effective systems for efficiently achieving this are limited. To address this problem, a system is needed that enables individualized interventions and training tailored to each person.

[0779] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0780] In this invention, the server includes a schedule generation means, a communication means, a conversion means, a data processing means, a selection means, and an output means. This enables periodic evaluation of the health status of the subject, provision of cognitive function training tailored to individual needs, and communication support to prevent social isolation.

[0781] A "control device" is a central system that manages communication with the target individual and has the function of processing data analysis and providing training.

[0782] "Schedule generation means" refers to a function for scheduling regular contact with the target person.

[0783] "Means of communication" refers to the methods and processes for directly contacting a subject.

[0784] "Conversion means" refers to a function that uses technology to convert audio information into text information.

[0785] "Data processing means" refers to technology for analyzing acquired textual information and evaluating the health status of the subject.

[0786] "Selection means" refers to a function for selecting cognitive function training appropriate for each individual based on data analysis.

[0787] "Output means" refers to a function that presents selected training to the target audience and supports their implementation.

[0788] "Notification means" refers to a function for communicating information to relevant parties based on analysis results.

[0789] "Communication support means" refers to a function that supports communication with other people in order to prevent the social isolation of the target individual.

[0790] For this invention to be implemented, it is essential that the server, terminal, and user work together in coordination. The specific operation of each component is described below.

[0791] Server Role

[0792] The server functions as the central hub of the system. First, the server retrieves the subject's basic information from a database and generates a schedule for regular contact. A database management system is used for this information management. Furthermore, the server converts audio data transmitted from the terminal into text data and analyzes it using natural language processing technology. This analysis evaluates the subject's cognitive function and mental health, and based on the results, selects individualized cognitive training. The generated training program is then output to the terminal.

[0793] Terminal role

[0794] The terminal functions as an interface with the subject, communicating with the server according to a specified schedule. Voice data collected during communication is converted to text in real time by the terminal and sent to the server. The terminal also executes training programs received from the server and provides real-time feedback on the results. This provides personalized feedback to the subject.

[0795] User roles

[0796] Users can converse with an AI agent through their device, discussing everyday events and their health status. They can participate in cognitive training sessions offered during this process and directly see the results. Based on this feedback, the content of subsequent training sessions is adjusted, allowing users to enjoy continuous health maintenance.

[0797] Specific example

[0798] For example, the server schedules a phone call to subject A on Monday. At 10:00 AM on Tuesday, the device automatically initiates a call to subject A, and the AI agent asks, "Good morning, how have you been lately?" When subject A talks about their recent activities, the content is collected as audio data, converted into text data, and sent to the server. There, the server analyzes the data and suggests a memory training task suitable for subject A through the device.
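The timing in the example above could be computed as in the following minimal sketch. It assumes that the schedule is stored as a weekday and a time of day, and that it is generated on a Monday for a call at 10:00 AM on Tuesday. The function name next_contact and its parameters are hypothetical and do not appear in the disclosure.

```python
from datetime import datetime, timedelta


def next_contact(now: datetime, weekday: int, hour: int, minute: int = 0) -> datetime:
    """Return the first datetime strictly after `now` on `weekday` at hour:minute.

    `weekday` follows datetime.weekday(): Monday is 0, Tuesday is 1, ..., Sunday is 6.
    """
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # Move forward to the requested weekday (0 to 6 days ahead).
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    # If that moment has already passed, use the following week.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


# Assumed scenario: the schedule is generated on Monday 2024-12-02 at 09:00.
monday = datetime(2024, 12, 2, 9, 0)
call_time = next_contact(monday, weekday=1, hour=10)
print(call_time)  # 2024-12-03 10:00:00
```

Calling the function again at the moment of the call returns the same slot one week later, which gives the periodic contact described in the text.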

[0799] Example of a prompt

[0800] "Based on the phone schedule set by the server, call elderly person A, ask them about their recent life, analyze the information, and propose appropriate cognitive function training."

[0801] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0802] Step 1:

[0803] The server retrieves basic information about the target individuals from the database and generates a schedule for regular contact. This step utilizes a database management system to manage the information and plan the next call schedule. The input is the target information from the database, and the output is the call schedule for each individual.

[0804] Step 2:

[0805] The terminal automatically initiates a call to the target person at the specified time, according to the schedule received from the server. At this time, the terminal activates the AI agent and begins a conversation with the target person. The input is the schedule information sent from the server, and the output is the actual phone call to the target person.

[0806] Step 3:

[0807] The device collects voice data spoken by the subject during a call in real time and converts it into text data. This process uses speech recognition technology to convert speech to text. The input is the subject's voice data, and the output is text data sent to the server.

[0808] Step 4:

[0809] The server receives text data sent from the terminal and analyzes it using natural language processing techniques. This analysis uses a language model to understand the content of the conversation and evaluates the cognitive function and mental health status of the subject. The input is text data, and the output is the result of the health status evaluation.

[0810] Step 5:

[0811] The server selects cognitive function training appropriate for the subject based on the analysis results. This selection takes into account past data and evaluation results to determine the optimal training content. The input is the results of the health status assessment, and the output is the selected training program.

[0812] Step 6:

[0813] The server sends the selected training program to the terminal, and the terminal conducts the training for the target user. The terminal displays the training content on the screen and supports the user in following along. The input is the training program from the server, and the output is the execution of the training.

[0814] Step 7:

[0815] The terminal feeds back the training results to the server. Based on this feedback, the server adjusts the content and schedule for the next training session. The input is the training results, and the output is the adjusted training plan.
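The adjustment described in Step 7 could look like the following minimal sketch. The disclosure does not specify how results are scored or how the plan is changed, so the score range, the thresholds 0.8 and 0.5, the difficulty scale, and the names TrainingPlan and adjust_plan are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainingPlan:
    task: str           # e.g. "memory"
    difficulty: int     # hypothetical scale: 1 (easiest) to 5 (hardest)
    interval_days: int  # days until the next session


def adjust_plan(plan: TrainingPlan, score: float) -> TrainingPlan:
    """Return the plan for the next session given a result score in [0.0, 1.0]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score >= 0.8:
        # The task was easy: raise the difficulty, keep the interval.
        return replace(plan, difficulty=min(plan.difficulty + 1, 5))
    if score < 0.5:
        # The task was hard: lower the difficulty and contact the subject sooner.
        return replace(
            plan,
            difficulty=max(plan.difficulty - 1, 1),
            interval_days=max(plan.interval_days - 1, 1),
        )
    # Scores in between leave the plan unchanged.
    return plan


current = TrainingPlan(task="memory", difficulty=3, interval_days=7)
print(adjust_plan(current, 0.3))
```

The plan is kept immutable so that each session's plan can be stored alongside its result as part of the subject's history.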

[0816] (Application Example 1)

[0817] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0818] It is necessary to prevent cognitive decline and deterioration of psychological state among users, including the elderly, and to support their continuous health maintenance. Furthermore, timely information provision to healthcare providers and relatives, and prevention of social isolation are also important issues.

[0819] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0820] In this invention, the server includes communication means for periodically contacting the user, information processing means for analyzing acquired conversation information and evaluating cognitive function and psychological state, presentation means for proposing individualized cognitive function training, and adaptive learning means for proposing cognitive function training content according to the user's interests and needs. This makes it possible to maintain the user's cognitive function and manage their psychological state, as well as to provide information quickly to medical institutions and relatives, and prevent social isolation.

[0821] A "control device" is a device that manages the entire system and issues commands to ensure that each component operates in coordination.

[0822] "Communication methods" refer to methods and technologies for regularly contacting users and obtaining conversational information.

[0823] "Information processing means" refers to technologies and devices used to analyze acquired conversational information and evaluate the cognitive function and psychological state of users.

[0824] "Presentation methods" refer to approaches for providing individualized cognitive function training to users based on assessments.

[0825] "Adaptive learning methods" refer to techniques and methods that propose cognitive function training content according to the user's interests and needs, and provide training tailored to each individual.

[0826] "Notification methods" refer to technologies and methods for notifying medical providers and relatives based on analysis results.

[0827] "Means of collaboration" refers to means and methods for preventing users from becoming socially isolated and for promoting contact with other users and supporters.

[0828] To implement this invention, the server, terminal, and user must function in coordination. The server acts as the central hub of the system, connecting to a database to manage the user's basic information and contact history. This allows the server to automatically generate periodic communication schedules and notify the terminals. The terminals function as an interface that periodically contacts the user and engages in everyday conversations using an AI agent.

[0829] Furthermore, the server analyzes the conversational information collected from the terminal using natural language processing technology to evaluate the user's cognitive function and psychological state. Based on the evaluation results, the server selects cognitive function training appropriate for the user and instructs the terminal on its content. The terminal follows these instructions, provides training to the user, and feeds back the results obtained to the server.
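One way the selection of training according to the evaluation results and the user's interests could work is sketched below. This is an illustration only: the catalog entries, the per-domain scores, and the names Training, CATALOG, and select_training are hypothetical, and the disclosure does not state the actual selection rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Training:
    name: str
    domain: str             # cognitive domain the exercise targets
    topics: frozenset[str]  # themes the exercise is built around


# Hypothetical catalog of exercises.
CATALOG = [
    Training("word recall with plant names", "memory", frozenset({"gardening"})),
    Training("word recall with song titles", "memory", frozenset({"music"})),
    Training("recipe step ordering", "attention", frozenset({"cooking"})),
]


def select_training(scores: dict[str, float], interests: set[str]) -> Training:
    """Pick an exercise for the lowest-scoring domain, preferring shared interests."""
    weakest = min(scores, key=scores.get)
    # Fall back to the whole catalog if no exercise targets the weakest domain.
    candidates = [t for t in CATALOG if t.domain == weakest] or CATALOG
    # max() returns the first maximal item, so catalog order breaks ties.
    return max(candidates, key=lambda t: len(t.topics & interests))


choice = select_training({"memory": 0.4, "attention": 0.7}, {"music", "cooking"})
print(choice.name)  # word recall with song titles
```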

[0830] The server automatically sends important information to healthcare providers and relatives as needed through its notification function. Furthermore, the server facilitates interaction with other users and supporters using the same system, using collaborative means to reduce the risk of social isolation.

[0831] As a concrete example, every morning the device calls the user, and after the AI agent greets them, it asks about the user's recent daily life and health. For example, a prompt might include something like, "The AI agent starts a conversation with the user asking about their health this morning and explaining that keeping a diary has a positive effect on cognitive function."

[0832] The hardware used may include smartphones and tablets, and the software may include the Google Cloud Speech-to-Text API and the Google Assistant API. The server is a computer system that performs data analysis, training program management, and information processing.

[0833] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0834] Step 1:

[0835] The server retrieves the user's basic information and past contact history from a database. Based on this, it generates a periodic communication schedule for the user and notifies the terminal. The input is the user information from the database, and the output is the next communication schedule information. An algorithm processes this information to calculate the most efficient and appropriate contact timing.

[0836] Step 2:

[0837] The terminal makes a phone call to the user at the designated time, according to the communication schedule notified by the server. It activates the AI agent and starts a conversation with the user. The input is the communication schedule information, and the output is the user's voice data. At this stage, the AI agent greets the user and asks questions about the user's health.

[0838] Step 3:

[0839] The device transcribes the audio data collected during the conversation into text and sends that data to the server. The input is audio data, and the output is the text data converted from the audio. The Google Cloud Speech-to-Text API is used for the audio-to-text conversion.

[0840] Step 4:

[0841] The server analyzes the received text data using natural language processing (NLP). The purpose of the analysis is to evaluate the user's cognitive function and psychological state. The input is text data, and the output is the analyzed evaluation results. Data manipulation here is performed using specialized algorithms.

[0842] Step 5:

[0843] The server selects an appropriate cognitive function training program based on the evaluation results and sends the details to the terminal. The input is the evaluation results, and the output is instruction information for the cognitive function training program. The selection process includes adaptive learning elements that take into account the user's past data and interests.

[0844] Step 6:

[0845] The terminal provides users with personalized cognitive function training based on instructions from the server. Training results are fed back to the server in real time. Input is training program information from the server, and output is training completion reports and progress data. This feedback helps adjust the content of the next training session.

[0846] Step 7:

[0847] The server sends notifications to healthcare providers and relatives as needed based on the analysis results. It also suggests collaboration with other users and supporters to prevent social isolation. Inputs are analysis results and feedback, while outputs are notifications and collaboration suggestions. Notifications are automated, ensuring appropriate information dissemination.
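The decision in Step 7 about whom to notify could be expressed as in the following minimal sketch. The disclosure says only that notifications are sent "as needed", so the score range, the single threshold, the rule itself, and the names FollowUp and decide_follow_up are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FollowUp:
    notify: list[str] = field(default_factory=list)  # recipients to notify
    suggest_collaboration: bool = False              # propose contact with other users


def decide_follow_up(cognitive: float, mood: float, threshold: float = 0.4) -> FollowUp:
    """Decide follow-up actions from two scores in [0.0, 1.0]; lower means more concern."""
    result = FollowUp()
    if cognitive < threshold:
        result.notify.append("healthcare_provider")
    if cognitive < threshold or mood < threshold:
        result.notify.append("relatives")
    # Assumed rule: a low mood score triggers a suggestion to connect with others.
    result.suggest_collaboration = mood < threshold
    return result


print(decide_follow_up(cognitive=0.8, mood=0.2))
```

Keeping the decision separate from the sending of notifications allows the thresholds to be reviewed without touching the delivery mechanism.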

[0848] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0849] This invention relates to a communication system that includes a control device combined with an emotion engine, and is capable of monitoring the health status of a subject and providing cognitive function training through interaction between a server, a terminal, and a user. The detailed operation of each component is described below.

[0850] Server Embodiment

[0851] The server manages the subject's profile information and generates a regular phone call schedule. The server receives conversation data transmitted from the terminal in real time and analyzes it using data processing tools. Speech recognition and natural language processing technologies are used for the analysis to evaluate the subject's cognitive function and emotional state from the conversation content.

[0852] In particular, it incorporates an emotion engine that can recognize the subject's emotions from conversational data. This allows for the generation of personalized feedback based on emotional information and the adjustment of appropriate cognitive function training. The server also has the function to send notifications to medical institutions and family members based on the recognized emotions.

[0853] Terminal embodiment

[0854] The device functions as an interface with the target individual, receiving schedules from the server and making phone calls at the designated times. The device activates an AI agent to engage in natural, everyday conversation with the target individual. During the conversation, it collects voice data, transcribes it into text, and sends it to the server.

[0855] Furthermore, the terminal provides cognitive function training to the user based on instructions from the server. Analysis results from the emotion engine are used to adjust the training content, and feedback is provided according to the emotional state of the subject.

[0856] User Embodiment

[0857] The user (target individual) receives a phone call from an AI agent via their device and engages in conversation about their daily life. The user can receive cognitive training and learn about their own emotional state through the feedback provided.

[0858] Specific example

[0859] For example, suppose the server has scheduled a call to elderly person B at 10:00 AM on Tuesday. At the scheduled time, the terminal calls elderly person B, and the AI agent asks, "How are you spending your day?" If elderly person B says, "I'm feeling a little lonely," this emotion is recognized by the emotion engine and sent to the server.

[0860] Based on the analysis results, the server provides elderly person B with simple relaxation exercises via a terminal to alleviate feelings of loneliness, and simultaneously sends a notification to family members indicating that attention is needed. In this way, a system is realized that utilizes emotional data to comprehensively support the health and well-being of the target individual.

[0861] The following describes the processing flow.

[0862] Step 1:

[0863] The server checks the subject's profile information and generates a new contact schedule. Based on the subject's health history and past conversation data, it sets appropriate call frequency and timing and records it in the database.

[0864] Step 2:

[0865] The server sends schedule information to the terminal. The terminal receives this information and begins preparing for the specified date and time.

[0866] Step 3:

[0867] The terminal automatically calls the subject at the scheduled time. If the call is not answered, it automatically redials until a connection is established.

[0868] Step 4:

[0869] After the call is connected, the AI agent asks the person questions about their daily life and emotions. Specifically, it might ask questions like, "What's been happening lately?"

[0870] Step 5:

[0871] The user responds to the AI agent's questions, talking about their current feelings and recent events. The emotions and tone of their words are the focus of emotion recognition.

[0872] Step 6:

[0873] The terminal collects audio data during the conversation and transcribes it. The transcribed data and the audio signal are sent to the emotion engine for emotion analysis.

[0874] Step 7:

[0875] The server analyzes conversational data provided by the emotion engine and assesses emotional state along with cognitive function. The results are reflected in a profile, and notifications are generated as needed.

[0876] Step 8:

[0877] Based on the analysis results, the server selects cognitive function training and emotional improvement programs suitable for the individual. The selected programs are then sent to the terminal.

[0878] Step 9:

[0879] The terminal executes the selected program and presents the specific training content to the subject. For example, it might suggest, "Let's try a calm and relaxing breathing technique."

[0880] Step 10:

[0881] The user works on improving cognitive function and emotional state by following the training provided through the terminal. The training results and feedback are recorded on the server and used to prepare the next session.
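The automatic redial in Step 3 above could be implemented along the lines of the following minimal sketch. The text states that redialing continues until a connection is established. The upper limit on attempts and the waiting time between attempts are assumptions added here so that the loop always terminates, and call_with_redial and the dial callback are hypothetical names.

```python
import time
from typing import Callable


def call_with_redial(
    dial: Callable[[], bool],
    max_attempts: int = 5,
    wait_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Dial until `dial()` reports a connection and return the number of attempts used.

    Raises ConnectionError if no attempt connects within `max_attempts`.
    """
    for attempt in range(1, max_attempts + 1):
        if dial():
            return attempt
        if attempt < max_attempts:
            sleep(wait_seconds)  # wait before redialing
    raise ConnectionError(f"no answer after {max_attempts} attempts")


# Simulated line that is answered on the third attempt; sleeping is disabled.
outcomes = iter([False, False, True])
attempts = call_with_redial(lambda: next(outcomes), sleep=lambda s: None)
print(attempts)  # 3
```

When the limit is reached, the raised error could itself serve as an input to the notification step, since repeated unanswered calls may indicate that the subject needs attention.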

[0882] (Example 2)

[0883] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0884] For elderly individuals and those with cognitive decline, there is a need to accurately assess their emotional and cognitive states and provide personalized training and support. However, traditional approaches have faced challenges in accurately understanding emotional states and providing individualized support.

[0885] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0886] In this invention, the server includes communication means for periodically contacting the subject, data processing means for recognizing the subject's emotional state from acquired conversational data using speech recognition and natural language processing technologies, and output means for providing personalized cognitive function training to the subject. This enables real-time understanding of emotional states and individually optimized training and notifications.

[0887] A "control device" is a device that has the function of making regular contact with the subject, analyzing data, and providing training to the subject.

[0888] A "communication method" is a mechanism that allows a control device to exchange information with a target.

[0889] "Conversation data" refers to voice and text information obtained through communication methods from the subject.

[0890] "Speech recognition" is a technology that converts speech data into text data.

[0891] "Natural language processing technology" is a technology that enables computers to understand, interpret, and respond to human language.

[0892] An "emotion engine" is a program that recognizes the emotional state of a subject from conversational data.

[0893] "Data processing means" refers to a device or program that has the function of analyzing acquired conversation data and evaluating the subject's condition.

[0894] "Output means" refers to a device or function that provides training or feedback to the subject based on the analysis results.

[0895] A "notification system" is a mechanism for informing medical institutions and family members of information based on the results of conversation analysis.

[0896] This invention is a communication system that understands the emotional state of a subject and provides appropriate cognitive function training. Its main components consist of a server, a terminal, and a user.

[0897] Server operation

[0898] The server first manages the subject's profile information, including basic information and past conversation history. Next, the server receives conversation data transmitted from the terminal in real time and converts the audio data into text data using speech recognition software. Commercial speech recognition APIs may be used for this process. Subsequently, natural language processing techniques are used to evaluate the subject's cognitive function and emotional state from the text data. By incorporating an emotion engine, it is possible to recognize emotions and generate personalized feedback. Specifically, Google Cloud's natural language API and emotion analysis tools may be used.

[0899] Terminal operation

[0900] The device has the ability to automatically make phone calls to target individuals based on a schedule sent from the server. Once the call is connected, an AI agent running on the device engages in a natural conversation with the target individual and collects conversation data. This data is converted into text and sent to the server. The device also plays a role in conveying training content and feedback provided by the server to the target individual.

[0901] User usage

[0902] Users receive phone calls from an AI agent connected via their device. Through these conversations, they can share details of their daily lives and receive necessary cognitive training. This allows users to understand their own emotional state and maintain or improve their health.

[0903] Examples of specific cases and prompt statements

[0904] For example, the server can schedule a phone call to an elderly person at 10:00 AM on a Tuesday. The terminal automatically calls the elderly person at the designated time, and the AI agent asks, "How are you doing today?" If the elderly person replies, "I'm feeling a little lonely," the emotion engine recognizes this emotion and sends it to the server. As a result, the elderly person is provided with specific relaxation exercises to alleviate the feeling of loneliness.

[0905] An example of a prompt given to the generative AI model is, "What kind of cognitive training would be appropriate to provide to an elderly person who is feeling lonely?"
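A prompt of this form could be assembled from the recognized emotion as in the following minimal sketch. The wording of the template is taken from the example prompt above. The function name build_training_prompt and the idea of filling the template from an emotion label are assumptions made for illustration.

```python
def build_training_prompt(emotion: str, subject: str = "an elderly person") -> str:
    """Fill the example prompt wording with an emotion label such as "lonely"."""
    emotion = emotion.strip().lower()
    if not emotion:
        raise ValueError("emotion must not be empty")
    return (
        "What kind of cognitive training would be appropriate to provide "
        f"to {subject} who is feeling {emotion}?"
    )


print(build_training_prompt("Lonely"))
```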

[0906] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0907] Step 1:

[0908] The server stores the subject's profile information in a database. This includes basic information, contact details, health status, and past conversation history. The input is the subject's initial registration information, and the output is structured data that is stored in the database.

[0909] Step 2:

[0910] The server generates a schedule for contacting the subject and notifies the terminal. The input is the days and times at which contact is to be made, and the output is the schedule information for the terminal.

[0911] Step 3:

[0912] The terminal automatically makes a phone call to the target person based on the received schedule. The input is schedule information from the server, and the output is the establishment of a call connection with the target person.

[0913] Step 4:

[0914] The device activates an AI agent and engages in a natural conversation with the target. The input is voice data from the target, and the output is real-time voice dialogue.

[0915] Step 5:

[0916] The terminal uses speech recognition software to convert the audio data acquired during a conversation into text data and sends it to the server. The input is the subject's audio data, and the output is the transcribed conversation data.

[0917] Step 6:

[0918] The server applies natural language processing techniques to analyze text data. The input is text data sent from the terminal, and the output is the analysis of the conversation content and emotional state.

[0919] Step 7:

[0920] The server uses an emotion engine to recognize the subject's emotional state and generates individually optimized feedback. The input is analyzed emotional data, and the output is personalized training content.

[0921] Step 8:

[0922] The terminal receives training content from the server and provides it to the subject. The input is feedback information from the server, and the output is cognitive function training provided to the subject in audio or visual form.

[0923] Step 9:

[0924] If necessary, the server sends a notification to a healthcare provider or family member based on the analysis of the conversation. The input is the subject's emotional state and training results, and the output is a notification containing information that requires action.
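The profile storage in Step 1 above could be realized as in the following minimal sketch, which uses the SQLite module of the Python standard library. The disclosure does not name a database product or a schema, so the table layout, the use of a JSON text column for the conversation history, and the function names are assumptions made for illustration.

```python
import json
import sqlite3


def open_profile_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the hypothetical `subjects` table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS subjects (
               id INTEGER PRIMARY KEY,
               name TEXT NOT NULL,
               phone TEXT NOT NULL,
               health_status TEXT,
               history TEXT NOT NULL DEFAULT '[]'
           )"""
    )
    return conn


def register_subject(conn: sqlite3.Connection, name: str, phone: str, health_status: str) -> int:
    """Store the initial registration information and return the new subject id."""
    cur = conn.execute(
        "INSERT INTO subjects (name, phone, health_status) VALUES (?, ?, ?)",
        (name, phone, health_status),
    )
    conn.commit()
    return cur.lastrowid


def append_conversation(conn: sqlite3.Connection, subject_id: int, transcript: str) -> list[str]:
    """Append one transcript to the subject's history (a JSON list) and return it."""
    (raw,) = conn.execute(
        "SELECT history FROM subjects WHERE id = ?", (subject_id,)
    ).fetchone()
    history = json.loads(raw) + [transcript]
    conn.execute(
        "UPDATE subjects SET history = ? WHERE id = ?", (json.dumps(history), subject_id)
    )
    conn.commit()
    return history


conn = open_profile_store()
subject_id = register_subject(conn, "B", "000-0000-0000", "stable")
print(append_conversation(conn, subject_id, "I'm feeling a little lonely"))
```

Parameterized queries are used throughout so that transcript text, which is free-form speech, cannot alter the SQL statements.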

[0925] (Application Example 2)

[0926] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0927] There is a challenge in accurately monitoring the emotional state of the elderly and individuals requiring care, and utilizing that information to prevent social isolation and provide appropriate health management and cognitive training. Furthermore, this necessitates a means of continuously and efficiently monitoring health status without relying on direct caregivers or medical institutions.

[0928] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0929] In this invention, the server includes communication means for periodically contacting the subject, information processing means for analyzing voice data and evaluating emotional state, output means for providing adaptive cognitive training, and means for notifying family or medical institutions of health information based on emotional data. This makes it possible to provide personalized feedback and training tailored to the subject's emotions and health condition.

[0930] A "control device" is a device that uses communication means, information processing means, and output means to analyze the emotional state of a subject and provide adaptive cognitive training or notifications.

[0931] "Communication methods" refer to techniques for regularly contacting the target individual and collecting voice data.

[0932] "Audio data" refers to acoustic information collected during conversations with subjects, and is used as material for analyzing their emotional state.

[0933] "Emotional state" refers to information that indicates an individual's emotional health, and is evaluated through the analysis of voice data.

[0934] An "information processing means" is a mechanism for analyzing acquired audio data and evaluating the emotional state of the subject.

[0935] "Adaptive cognitive training" refers to training that is optimized according to the subject's emotional state in order to improve cognitive function.

[0936] "Output method" refers to a method for presenting training content to the target individual based on evaluation.

[0937] "Emotional data" refers to information indicating the emotions of a subject, obtained as a result of analyzing audio data.

[0938] "Means of notification" refers to a mechanism for communicating the subject's health information to family members or medical institutions based on the analyzed emotional data.

[0939] This invention is a system that monitors the emotional state of elderly individuals and those requiring care, and provides adaptive cognitive training. The system mainly consists of a server, terminals, and users.

[0940] The server uses speech recognition software and natural language processing (NLP) technology to receive and analyze conversational data transmitted from the terminal. Specifically, it converts the speech data into text using the speech_recognition library and analyzes the content of the text using the nltk tool. During this process, the emotion engine extracts emotional data and evaluates the subject's emotional state based on the results. Based on the evaluation results, the server generates cognitive training content appropriate for the subject and sends it to the terminal as feedback. It also notifies family members and medical institutions of this information as needed.
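The extraction of emotional data from transcribed text could proceed along the lines of the following minimal sketch. The paragraph above names the speech_recognition library and nltk. To stay self-contained, this sketch uses only the Python standard library as a stand-in: a regular expression replaces the tokenizer, and a small hand-written lexicon replaces the emotion engine. The lexicon entries, the emotion labels, and the function name detect_emotion are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical lexicon mapping words to emotion labels; a real system would
# rely on a much larger resource or a trained model.
EMOTION_LEXICON = {
    "lonely": "loneliness",
    "alone": "loneliness",
    "worried": "anxiety",
    "anxious": "anxiety",
    "happy": "joy",
    "glad": "joy",
}


def detect_emotion(text: str) -> str:
    """Return the most frequent lexicon emotion in `text`, or "neutral" if none occurs."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(EMOTION_LEXICON[t] for t in tokens if t in EMOTION_LEXICON)
    if not counts:
        return "neutral"
    return counts.most_common(1)[0][0]


print(detect_emotion("I feel a little lonely today"))  # loneliness
```

A word-count approach like this ignores negation and tone of voice, which is one reason the text also sends the audio signal itself to the emotion engine.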

[0941] The device acts as a means of communication, periodically contacting the target individual based on a schedule received from the server. An AI agent on the device collects voice data through natural dialogue and sends it to the server. Furthermore, it provides the target individual with feedback and training received from the server. This allows the target individual to understand their own emotional state and receive appropriate cognitive function training.

[0942] Through this system, users (target individuals) can receive feedback on their emotional state and participate in adaptive cognitive training through everyday conversations.

[0943] For example, if a 78-year-old person uses this system and says, "I feel a little lonely today," the emotion engine detects this emotion, and the server suggests relaxation exercises to alleviate loneliness. The family is notified of this information as needed. In this way, the system provides highly accurate healthcare services tailored to an individual's emotional state. Another example of a prompt to the generative AI model could be, "Please tell me some effective coping strategies for when an elderly person feels lonely."

[0944] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0945] Step 1:

[0946] The device makes periodic contacts with the target person based on a schedule received from the server. It receives schedule data from the server as input and initiates a voice call to the target person as output. Specifically, the application on the device automatically initiates a voice call according to the date and time and sends a greeting such as, "How are you doing today?"

[0947] Step 2:

[0948] The user talks about their daily life through a voice call via their device. The input is a message from the device to initiate a conversation, and the output is the user speaking about everyday topics. Specifically, the user might say something like, "I was able to do some gardening today," into the device.

[0949] Step 3:

[0950] The device collects user voice and converts it to text using speech recognition technology. It receives user voice data as input and generates transcribed conversation data as output. Specifically, it uses the speech_recognition library to capture voice via the microphone.

[0951] Step 4:

[0952] The server receives transcribed conversation data sent from the terminal and performs natural language processing to analyze the emotional state. It takes text information from the conversation data as input and generates analysis results regarding the user's emotional state as output. Specifically, it uses NLTK to detect emotions within the text, and an emotion engine evaluates them.

[0953] Step 5:

[0954] The server generates adaptive cognitive training and feedback based on the analysis results. It receives the analysis results of the emotional state as input and generates training and feedback content tailored to the user as output. Specifically, if the user feels lonely, it will suggest relaxation training.

[0955] Step 6:

[0956] The server sends the generated training and feedback content to the device and notifies family members and medical institutions as needed. It receives the generated training content and notification information as input, and sends information to the device and relevant parties as output. Specifically, it sends an email notification to the family stating something like, "The person seems lonely."

[0957] Step 7:

[0958] The terminal presents the user with training and feedback received from the server. It receives the training and feedback from the server as input, and presents and executes it as output. Specifically, it gives instructions to the user through screen displays and voice assistance.
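The email notification mentioned in Step 6 above could be composed as in the following minimal sketch, which uses the email package of the Python standard library. The addresses, the subject line, the body wording beyond the quoted example, and the function name build_family_notice are assumptions made for illustration. Sending the message is outside the scope of the sketch.

```python
from email.message import EmailMessage


def build_family_notice(sender: str, recipient: str, subject_name: str, emotion: str) -> EmailMessage:
    """Compose (but do not send) a notice telling the family how the subject seems."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Check-in notice for {subject_name}"
    msg.set_content(
        f"{subject_name} seems {emotion} after today's call. "
        "Please consider getting in touch."
    )
    return msg


notice = build_family_notice(
    "monitor@example.com", "family@example.com", "The person", "lonely"
)
print(notice["Subject"])  # Check-in notice for The person
```

The composed message could then be passed to smtplib.SMTP.send_message or to any other delivery channel that the operator of the server has configured.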

[0959] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0960] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0961] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0962] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0963] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0964] These emotions are distributed around the 3 o'clock position on the emotion map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, giving a calm impression.

[0965] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the emotion map 400 an emotion lies, the more visible it becomes (the more it is expressed in actions).
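The layout of the emotion map can be sketched with polar coordinates, as below. This is a rough illustration only: the coordinate convention (0 degrees at 3 o'clock, counterclockwise), the radius scale, the label strings, and the placeholder emotion names and positions are all assumptions, not values taken from Figure 9.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MappedEmotion:
    name: str
    radius: float     # 0.0 = center (primitive), 1.0 = outer edge (expressed in action)
    angle_deg: float  # 0 = 3 o'clock position, measured counterclockwise

    def _xy(self) -> Tuple[float, float]:
        theta = math.radians(self.angle_deg)
        return self.radius * math.cos(theta), self.radius * math.sin(theta)

    def origin(self) -> str:
        """Left half: brain reaction. Right half: situational judgment."""
        x, _ = self._xy()
        if abs(x) < 1e-9:  # on the vertical axis: both sources apply
            return "reaction and situation"
        return "situation" if x > 0 else "reaction"

    def valence(self) -> str:
        """Upper side: pleasure. Lower side: displeasure."""
        _, y = self._xy()
        if abs(y) < 1e-9:  # on the horizontal axis: neither side
            return "neutral"
        return "pleasure" if y > 0 else "displeasure"


# Placeholder entries; the names and positions are not taken from the figure.
right_edge = MappedEmotion("example_a", radius=0.8, angle_deg=0.0)
top = MappedEmotion("example_b", radius=0.5, angle_deg=90.0)
lower_left = MappedEmotion("example_c", radius=0.3, angle_deg=200.0)

for emotion in (right_edge, top, lower_left):
    print(emotion.name, emotion.origin(), emotion.valence())
```

The small tolerance on the axes avoids misclassifying points that sit exactly on an axis because of floating-point rounding in the trigonometric functions.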

[0966] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
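The balance idea in this paragraph can be illustrated with a minimal sketch. The function name, the use of a single scalar balance, and the battery figures are assumptions made for the example; the disclosure does not specify how a balance is measured or compared.

```python
def balance_valence(previous: float, current: float, ideal: float) -> str:
    """Classify a change in one balance (e.g. battery level) relative to its ideal."""
    before = abs(previous - ideal)
    after = abs(current - ideal)
    if after < before:
        return "pleasure"    # the balance moved toward the ideal
    if after > before:
        return "discomfort"  # the balance moved away from the ideal
    return "unchanged"


# Hypothetical battery levels in percent, with an assumed ideal of 80.
charging = balance_valence(previous=50, current=60, ideal=80)
draining = balance_valence(previous=60, current=40, ideal=80)
print(charging, draining)
```

A real system would presumably combine several balances (posture, battery level, and so on); how they would be weighted is not stated in the source.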

[0967] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0968] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
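One hypothetical way to express "emotions located close together have similar values" is a pairwise penalty weighted by proximity on the map, sketched below. The disclosure does not state how this property is enforced during training, so the penalty form, the `scale` parameter, and the coordinates given to "reassured," "calm," and "confident" are all assumptions for illustration.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def proximity_penalty(
    values: Dict[str, float], positions: Dict[str, Point], scale: float = 1.0
) -> float:
    """Sum squared value differences, weighted more heavily for nearby emotions."""
    names = sorted(values)
    total = 0.0
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            distance = math.dist(positions[a], positions[b])
            weight = math.exp(-((distance / scale) ** 2))  # close pairs weigh more
            total += weight * (values[a] - values[b]) ** 2
    return total


# Assumed map coordinates for three emotions that sit close together.
positions = {"reassured": (0.50, 0.00), "calm": (0.55, 0.05), "confident": (0.60, 0.00)}
similar = {"reassured": 0.80, "calm": 0.75, "confident": 0.70}
dissimilar = {"reassured": 0.80, "calm": 0.10, "confident": 0.70}

print(proximity_penalty(similar, positions) < proximity_penalty(dissimilar, positions))
dominant = max(similar, key=similar.get)  # emotion with the largest value
print(dominant)
```

Added to an ordinary training loss, a term of this kind would push neighbouring emotions toward similar outputs; the last two lines show the simplest way to pick the user's emotion from the resulting values.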

[0969] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0970] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0971] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0972] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0973] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0974] The following types of processors can be used as hardware resources to perform the specific processing. One example is a CPU, a general-purpose processor that functions as a hardware resource for performing the specific processing by executing software, i.e., a program. Other examples include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), and ASICs (Application Specific Integrated Circuits), which are processors with circuit configurations designed specifically to perform the specific processing. Every one of these processors has built-in or connected memory and performs the specific processing by using that memory.

[0975] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0976] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0977] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0978] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0979] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted as being incorporated by reference.

[0980] The following is further disclosed regarding the embodiments described above.

[0981] (Claim 1)

[0982] A control device that includes a communication means for periodically contacting a subject,

[0983] a data processing means that analyzes conversation data acquired by the communication means and evaluates the cognitive function and mental health status of the subject,

[0984] and an output means for providing cognitive function training to the subject based on the evaluation.

[0985] A system that includes this control device.

[0986] (Claim 2)

[0987] The system according to claim 1, wherein the control device further comprises a notification means for notifying a medical institution or family members based on the results of analyzing the conversation data.

[0988] (Claim 3)

[0989] The system according to claim 1, wherein the control device further comprises a means for mediating contact between the subject and other subjects or volunteers in order to prevent the subject from becoming socially isolated.

[0990] "Example 1"

[0991] (Claim 1)

[0992] A control device that includes a means for generating a schedule for periodically contacting a target person,

[0993] a means for establishing communication based on the schedule,

[0994] a conversion means for converting audio information acquired during the communication into text information,

[0995] a data processing means that analyzes the text information generated by the conversion means and evaluates the cognitive function and mental health status of the target person,

[0996] a selection means for selecting cognitive function training to provide to the target person based on the evaluation,

[0997] and an output means for carrying out the selected cognitive function training.

[0998] A system that includes this control device.

[0999] (Claim 2)

[1000] The system according to claim 1, further comprising a notification means for notifying a medical institution or family based on the analysis results of the above data processing means.

[1001] (Claim 3)

[1002] The system according to claim 1, further comprising means of communication for mediating contact between the target person and other target persons or supporters in order to prevent the target person from becoming socially isolated.
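The schedule generation and training selection means of Example 1 can be sketched as follows. This is only an illustration under stated assumptions: the fixed contact interval, the score dictionary, the domain names, and the training catalogue are hypothetical, and the evaluation scores are taken as given because the disclosure does not define how they are computed.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def generate_schedule(
    first_contact: datetime, interval_days: int, count: int
) -> List[datetime]:
    """Return `count` contact times spaced `interval_days` apart."""
    return [first_contact + timedelta(days=interval_days * i) for i in range(count)]


# Hypothetical catalogue mapping an evaluated domain to a training exercise.
TRAINING_BY_DOMAIN = {
    "memory": "word recall exercise",
    "attention": "number sequence exercise",
    "mood": "guided conversation about a favourite topic",
}


def select_training(scores: Dict[str, float]) -> str:
    """Pick the training for the lowest-scoring domain in the evaluation."""
    weakest = min(scores, key=scores.get)
    return TRAINING_BY_DOMAIN[weakest]


schedule = generate_schedule(datetime(2025, 1, 6, 10, 0), interval_days=7, count=3)
training = select_training({"memory": 0.4, "attention": 0.9, "mood": 0.7})
print([slot.date().isoformat() for slot in schedule])
print(training)
```

Choosing the weakest domain is just one possible selection rule; a system could equally weight the target person's interests, as the adaptive variants of the claims suggest.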

[1003] "Application Example 1"

[1004] (Claim 1)

[1005] A control device that includes a communication means for periodically contacting a user,

[1006] an information processing means for analyzing conversation information acquired by the communication means and evaluating the user's cognitive function and psychological state,

[1007] a presentation means for proposing individualized cognitive function training to the user based on the evaluation,

[1008] and an adaptive learning means for proposing cognitive function training content tailored to the user's interests and needs.

[1009] A system that includes this control device.

[1010] (Claim 2)

[1011] The system according to claim 1, wherein the control device further comprises a means for notifying a medical provider or relative based on the results of analyzing the conversation information.

[1012] (Claim 3)

[1013] The system according to claim 1, wherein the control device further comprises means for mediating communication between the user and other users or supporters in order to prevent the user's social isolation.

[1014] "Example 2 combined with an emotion engine"

[1015] (Claim 1)

[1016] A control device that includes a communication means for periodically contacting a subject,

[1017] a data processing means that analyzes conversation data acquired by the communication means using speech recognition and natural language processing technology, and recognizes the emotional state of the subject using an emotion engine,

[1018] and an output means for providing personalized cognitive function training to the subject based on the analysis and the recognized emotional state.

[1019] A system that includes this control device.

[1020] (Claim 2)

[1021] The system according to claim 1, wherein the control device further comprises a notification means for notifying a medical institution or family members based on the results of the conversation data analysis and the emotion recognition.

[1022] (Claim 3)

[1023] The system according to claim 1, wherein the control device further comprises a means for using emotion data to mediate contact between the subject and other subjects or volunteers in order to prevent the subject's social isolation.

[1024] "Application Example 2 combined with an emotion engine"

[1025] (Claim 1)

[1026] A control device that includes a communication means for periodically contacting a subject,

[1027] an information processing means that analyzes voice data acquired by the communication means and evaluates the emotional state of the subject,

[1028] an output means for providing adaptive cognitive training to the subject based on the evaluation,

[1029] and a notification means for notifying family members or medical institutions of the subject's health information based on emotion data.

[1030] A system that includes this control device.

[1031] (Claim 2)

[1032] The system according to claim 1, further comprising means for coordinating contact with other individuals or volunteers to prevent social isolation of the subject, based on the results of the analysis of conversation data.

[1033] (Claim 3)

[1034] The system according to claim 1, further comprising means for analyzing user patterns using a generative AI model in order to analyze the situation of the subject and provide appropriate training and feedback according to the subject's emotional state.

[Explanation of symbols]

[1035] 10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. A system including a control device, the control device comprising: a communication means for periodically contacting a subject; a data processing means that analyzes conversation data acquired by the communication means and evaluates the cognitive function and mental health status of the subject; and an output means for providing cognitive function training to the subject based on the evaluation.

2. The system according to claim 1, wherein the control device further comprises a notification means for notifying a medical institution or family members based on the results of analyzing the conversation data.

3. The system according to claim 1, wherein the control device further comprises a means for mediating contact between the subject and other subjects or volunteers in order to prevent the subject from becoming socially isolated.