System
A system using natural language processing and emotion detection supports elderly individuals with personalized interactions and safety monitoring, addressing cognitive decline and social isolation, enhancing their quality of life and reducing caregiver burden.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- SOFTBANK GROUP CORP
- Filing Date
- 2024-12-16
- Publication Date
- 2026-06-26
AI Technical Summary
Elderly individuals face challenges with declining cognitive function and social isolation due to lack of conversation, leading to a decline in quality of life and increased burden on caregivers, as existing systems fail to provide personalized and emotionally responsive support.
A system that utilizes natural language processing and emotion detection to analyze user input, generate personalized responses, and monitor behavior, integrating devices like smartphones and servers to provide continuous support and safety notifications.
Enhances independent living for the elderly by providing personalized and emotionally tailored interactions, ensuring safety and reducing caregiver burden through timely notifications.
Abstract
Description
Technical Field
[0001] The technology of the present disclosure relates to a system.
Background Art
[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] Problems that elderly people face in daily life include cognitive decline and social isolation caused by a lack of conversation, as well as neglect of medical care and health management. These problems lower the quality of life of the elderly and increase the burden on the families who watch over them. There is therefore a need for a technology that allows the elderly to live independently with peace of mind while enabling their families to keep track of their situation.
Means for Solving the Problems
[0005] This invention provides a means for receiving and analyzing natural language voice or text input from a user via a communication device. Using natural language processing technology, it identifies the user's intent and generates an appropriate response by referring to past conversation history and pre-configured information. Furthermore, it supports everyday conversation by presenting the generated response to the user via the communication device. The system also includes a function to monitor the user's behavior and notify relevant parties via a notification device if an anomaly is detected, thereby realizing a system that improves the quality of life for the elderly and provides peace of mind to families.
[0006] A "communication device" is an electronic device that has the function of receiving and transmitting voice and text data.
[0007] "Natural language speech or text input" refers to speech or text data expressed in the language forms that humans use on a daily basis.
[0008] "Natural language processing technology" is a technology that enables computers to understand and analyze human language.
[0009] "Identifying the user's intent" is the process of identifying the underlying purpose or request from the natural language input.
[0010] "Past conversation history" refers to a data set that includes all recorded conversations with the user up to that point.
[0011] "Generating a response" is the process of constructing appropriate answers or information based on analyzed data.
[0012] A "notification device" is an electronic device used to inform specific recipients of specific information.
[0013] "Relevant parties" (stakeholders) refers to anyone who has an interest in, or responsibility for, the user's situation.
Brief Description of the Drawings
[0014]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map to which a plurality of emotions are mapped.
[Figure 10] An emotion map to which a plurality of emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2 when an emotion engine is combined.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2 when an emotion engine is combined.
Embodiments for Carrying Out the Invention
[0015] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.
[0016] First, the terms used in the following description will be explained.
[0017] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as a "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).
[0018] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which the processor uses as a work memory.
[0019] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.
[0020] In the following embodiments, a communication I/F (Interface) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication I/F controls communication among multiple computers. Examples of communication standards applicable to the communication I/F include wireless standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).
[0021] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" may mean A alone, B alone, or a combination of A and B. In this specification, the same interpretation applies when three or more items are joined by "and/or."
[0022] [First Embodiment]
[0023] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.
[0024] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.
[0025] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0026] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.
[0027] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.
[0028] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.
[0029] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.
[0030] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.
[0031] As shown in Figure 2, in the data processing device 12, the processor 28 performs specific processing. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes it on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.
[0032] The storage 32 stores a data generation model 58 and an emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0033] In the smart device 14, the processor 46 performs reception output processing. The storage 50 stores a reception output program 60. The reception output program 60 is used by the data processing system 10 in conjunction with the specific processing program 56. The processor 46 reads the reception output program 60 from the storage 50 and executes it on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0034] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0035] This invention realizes an interactive system to support the lives of the elderly and provide peace of mind to their families. The components of the system and their interactions are described below.
[0036] Executed on the server
[0037] The server is responsible for the system's central processing. It receives voice or text input from the user and analyzes it. Using natural language processing technology, the server determines the user's intent from what they say. Based on that intent, it references relevant past conversation history and schedule data to generate the most appropriate response.
[0038] For example, if a user says, "I'd like some advice on preparing for the trip I have planned this weekend," the server might respond with something like, "I can give you a list of the items you took on past trips, along with the weather forecast for your destination."
[0039] Implementation on a device
[0040] The terminal is a device that sends user input to the server and presents the server's response to the user. It converts spoken input to text using speech recognition technology, or accepts text input directly, and sends the text to the server. After receiving the server's response, the terminal presents it to the user as text or audio. The terminal can also acquire the user's location information, so that responses can be tailored to the user's situation.
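The description does not define the format of the data the terminal sends to the server. As a minimal sketch, assuming a JSON message with hypothetical field names (`user_id`, `text`, `input_mode`, `latitude`, `longitude`), the terminal-side payload might look like this:

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class UserInputMessage:
    """Hypothetical message sent from the terminal to the server."""
    user_id: str
    text: str                          # recognized speech or typed text
    input_mode: str                    # "voice" or "text"
    latitude: Optional[float] = None   # present only if location is available
    longitude: Optional[float] = None

    def __post_init__(self) -> None:
        if self.input_mode not in ("voice", "text"):
            raise ValueError(f"unsupported input mode: {self.input_mode}")

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @staticmethod
    def from_json(raw: str) -> "UserInputMessage":
        return UserInputMessage(**json.loads(raw))


# Example: a spoken restaurant query with the terminal's current location.
msg = UserInputMessage(
    user_id="user-001",
    text="What are some good restaurants in my neighborhood?",
    input_mode="voice",
    latitude=35.6812,
    longitude=139.7671,
)
payload = msg.to_json()
```

Making the location fields optional lets the same message carry plain text queries when location information is unavailable.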
[0041] Specific example: If a user asks their device, "What are some good restaurants in my neighborhood?", the device will display a list of recommended restaurants from the server based on the user's current location.
[0042] User interaction
[0043] Users interact with the system daily through their devices. Through what they say and the information they enter, they can start new conversations or ask for advice about their schedule and health. The system also periodically monitors the user's behavior and, if it detects an abnormality, notifies designated family members, so it also serves a monitoring role.
[0044] Specific example: If a user says "I'm going for a walk" at the same time every day, the server will notify the family if this suddenly stops happening, as this is considered an anomaly.
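The walk example amounts to checking whether a daily routine has been skipped. The description does not say how the usual time is learned; the sketch below assumes it is the median time of the user's past announcements plus a grace period. Both choices, and the function names, are hypothetical.

```python
from datetime import datetime, time, timedelta
from statistics import median
from typing import Sequence


def usual_time_of_day(past_events: Sequence[datetime]) -> time:
    """Estimate when the routine usually happens (median minute of the day)."""
    minutes = [t.hour * 60 + t.minute for t in past_events]
    m = int(median(minutes))
    return time(m // 60, m % 60)


def routine_missed(
    past_events: Sequence[datetime],
    now: datetime,
    grace: timedelta = timedelta(hours=1),
) -> bool:
    """Return True if today's routine has not happened by its usual time plus grace.

    Assumes naive local timestamps and a routine that does not occur near midnight.
    """
    if not past_events:
        return False  # no routine has been established yet
    deadline = datetime.combine(now.date(), usual_time_of_day(past_events)) + grace
    if now < deadline:
        return False  # too early to judge
    return not any(t.date() == now.date() for t in past_events)
```

When `routine_missed` returns True, the server would send the notification to the family.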
[0045] By having these components work together to provide personalized support to the user, this invention aims to support independent living.
[0046] The following describes the processing flow.
[0047] Step 1:
[0048] The user enters inquiries or commands into the device via voice or text. For example, the user might say, "Tell me what's on my schedule this afternoon."
[0049] Step 2:
[0050] The terminal converts the input speech into text data. This conversion is performed using speech recognition technology. The converted text data is then sent to the server.
[0051] Step 3:
[0052] The server analyzes the received text data. It uses natural language processing technology to identify the user's intent and understand their purpose, such as "I want to know my schedule."
[0053] Step 4:
[0054] Based on the analysis, the server references the user's past conversation history and existing schedule databases. It extracts relevant information and generates an appropriate response.
[0055] Step 5:
[0056] The server sends the generated response to the terminal as text data. This includes specific information such as, "You have a health checkup today at 3 PM."
[0057] Step 6:
[0058] The terminal presents the user with response data received from the server. Information is provided by converting text to speech and reading it aloud, or by displaying it on the screen.
[0059] Step 7:
[0060] The device periodically monitors the user's behavior. If it detects an anomaly, such as a missed appointment, it notifies the relevant parties via the server.
[0061] Through this series of processes, the system provides continuous and effective support to the user.
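Steps 3 to 5 can be illustrated with a deliberately simple stand-in: keyword rules in place of the natural language processing model, and an in-memory list in place of the schedule database. The intent names, keywords, and the `Appointment` type are hypothetical and are not taken from the description.

```python
from dataclasses import dataclass
from datetime import date, time
from typing import List


@dataclass
class Appointment:
    day: date
    start: time
    title: str


# Hypothetical keyword rules standing in for the NLP intent-identification step.
INTENT_KEYWORDS = {
    "schedule_inquiry": ("schedule", "appointment", "plans"),
    "weather_inquiry": ("weather", "rain", "forecast"),
}


def identify_intent(text: str) -> str:
    lowered = text.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(k in lowered for k in keywords):
            return intent
    return "small_talk"


def format_time(t: time) -> str:
    """Format a time as, e.g., '3:00 PM' without depending on the locale."""
    hour = t.hour % 12 or 12
    suffix = "AM" if t.hour < 12 else "PM"
    return f"{hour}:{t.minute:02d} {suffix}"


def generate_response(text: str, today: date, schedule: List[Appointment]) -> str:
    intent = identify_intent(text)  # Step 3: identify the intent
    if intent != "schedule_inquiry":
        return "I'm listening. Please tell me more."
    # Step 4: look up today's entries, optionally restricted to the afternoon.
    items = [a for a in schedule if a.day == today]
    if "afternoon" in text.lower():
        items = [a for a in items if a.start >= time(12, 0)]
    if not items:
        return "You have nothing scheduled."
    items.sort(key=lambda a: a.start)
    # Step 5: build the text sent back to the terminal.
    parts = [f"{a.title} at {format_time(a.start)}" for a in items]
    return "You have " + ", ".join(parts) + " today."
```

A real system would replace `identify_intent` with a trained model; the surrounding lookup-and-format structure stays the same.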
[0062] (Example 1)
[0063] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0064] Conventional interactive systems have limitations in providing a sense of security to the elderly because they are insufficient in analyzing the user's intent and generating personalized responses. Furthermore, they cannot adequately ensure user safety because they cannot detect abnormal behavior and promptly notify third parties.
[0065] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0066] In this invention, the server includes means for receiving natural language voice or text input from a user via a communication device, means for analyzing the input and identifying the user's intent using natural language processing technology, means for generating a response by referring to past dialogue history and pre-configured data, means for optimizing the response using a generative model, and means for detecting abnormalities in the user's behavior and notifying a third party through a notification device. This enables personalized support and rapid safety checks for the user.
[0067] A "communication device" is a device that receives input from a user and exchanges data with a server.
[0068] "Natural language processing technology" is a technology that automatically analyzes the language spoken by a user and determines their intent.
[0069] "Means for generating responses" refers to a function that creates a response by combining appropriate information based on the analyzed intent of the user.
[0070] A "generative model" is a pre-trained artificial intelligence model used to generate optimized outputs based on input data.
[0071] A "notification device" is a device used to notify a third party of the user's status or any abnormalities.
[0072] "Means for detecting abnormal behavior" refers to a function that identifies discrepancies between a user's normal behavioral patterns and their actions, and determines them to be abnormal.
[0073] This invention is an interactive system designed to support the lives of the elderly. It receives input from users via communication devices and generates and presents optimized responses that support their daily lives. The server analyzes the user's intent using natural language processing technology. Specifically, it uses a "speech recognition API" as speech recognition software to convert speech to text. For the analysis, it uses a "natural language processing model," which is a generative AI model that employs "machine learning algorithms." Using this model, the system identifies the user's intent and generates responses by referring to past dialogue history and pre-configured data.
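The description states that responses are generated by a generative AI model that draws on past dialogue history and pre-configured data, but not how these are passed to the model. One common approach, shown here as a hypothetical sketch, is to assemble them into a single text prompt; the prompt wording, the `Turn` type, and the `max_turns` window are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Turn:
    speaker: str  # "user" or "assistant"
    text: str


def build_prompt(
    intent: str,
    user_text: str,
    history: List[Turn],
    profile: Dict[str, str],
    max_turns: int = 6,
) -> str:
    """Combine pre-configured data, recent history, and the new input into one prompt."""
    # Keep only the most recent turns so the prompt stays short.
    recent = history[-max_turns:] if max_turns > 0 else []
    lines = ["You are a friendly assistant supporting an elderly user.", "User profile:"]
    lines += [f"- {key}: {value}" for key, value in sorted(profile.items())]
    lines.append("Recent conversation:")
    lines += [f"{turn.speaker}: {turn.text}" for turn in recent]
    lines.append(f"Identified intent: {intent}")
    lines.append(f"user: {user_text}")
    lines.append("assistant:")
    return "\n".join(lines)
```

The resulting string would then be passed to whichever generative model the server uses; that model and its API are not specified here.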
[0074] The device can use a "speech synthesis API" to present the generated response in voice or text. Furthermore, by incorporating location services, it can optimize the response based on the user's location using a "GPS module," etc. In addition, the device monitors the user's behavior and, if an anomaly is detected, sends warning information to a third party via a notification device. For example, a cloud monitoring service such as Amazon Web Services could be used.
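Paragraph [0072] defines abnormal behavior as a discrepancy from the user's normal behavioral pattern without fixing a method. A simple stand-in, assumed here purely for illustration, is a z-score test on a daily activity count (for example, the number of conversations per day) against a recent baseline; the 3.0 threshold is a hypothetical default.

```python
from statistics import mean, stdev
from typing import Sequence


def is_abnormal(baseline: Sequence[float], today: float, threshold: float = 3.0) -> bool:
    """Flag today's value if it lies more than `threshold` standard deviations from the baseline mean."""
    if len(baseline) < 2:
        return False  # stdev needs at least two data points
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return today != mu  # perfectly regular history: any change is a deviation
    return abs(today - mu) / sigma > threshold
```

This test is two-sided; a one-sided variant that flags only unusually low activity may suit inactivity detection better, and that choice is left open here.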
[0075] Users can submit everyday inquiries to the system, for example, "Please tell me the opening hours of the local pharmacy" or "Please give me advice on managing my health." This lets users obtain the information and advice they need while continuing to live independently.
[0076] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0077] Step 1:
[0078] The user makes a query in natural language via a communication device. The input can be speech or text; where necessary, a "speech recognition API" is used to convert speech to text. The output of this step is the query, sent to the server as text data. For example, the user might say, "Tell me today's schedule."
[0079] Step 2:
[0080] The server analyzes the received text data. A generative AI model, a "natural language processing model," is used for the analysis, and "machine learning algorithms" are employed to identify the user's intent. The input is the user's text data, and the output is the identified intent. Specifically, the server recognizes the user's intent as a "schedule inquiry."
[0081] Step 3:
[0082] The server generates a response by referring to past conversation history and pre-configured data (e.g., schedule information). In this process, a generative AI model is used to optimize the content of the response. The input is the user's intent and the reference data, and the output is the response content. For example, the server prepares a response such as "You have a hospital appointment at 2 PM."
[0083] Step 4:
[0084] The device receives the generated response and presents it to the user as text or voice. For voice output, the "Speech Synthesis API" can be used. For example, the device tells the user by voice, "You have a hospital appointment at 2 PM."
[0085] Step 5:
[0086] The user decides on the next action based on the response received. If necessary, the user can give the system feedback or further instructions. Because this step involves new user input, the system's processing forms a loop. For example, the user might reply, "Thank you."
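Step 5 closes a loop: each new user utterance re-enters the flow at Step 1, and the conversation history grows. A minimal sketch of that loop follows, with a hypothetical `respond` callback standing in for Steps 2 to 4 and an assumed rule that ends the exchange when the user says thanks.

```python
from typing import Callable, Iterable, List

# respond(text, history) -> reply; stands in for intent analysis and response generation.
Responder = Callable[[str, List[str]], str]

CLOSING_PHRASES = {"thank you", "thanks"}


def dialogue_loop(utterances: Iterable[str], respond: Responder) -> List[str]:
    """Run the Step 1-5 loop over a sequence of user utterances and return the replies."""
    history: List[str] = []
    replies: List[str] = []
    for text in utterances:
        history.append(f"user: {text}")
        reply = respond(text, history)
        history.append(f"assistant: {reply}")
        replies.append(reply)
        # Treat a closing phrase as the end of this exchange.
        if text.strip().lower().rstrip(".!") in CLOSING_PHRASES:
            break
    return replies
```

Passing the growing `history` to `respond` on every turn is what lets later responses refer back to earlier ones.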
[0087] (Application Example 1)
[0088] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0089] For elderly people to live independently, monitoring their health, reminding them of appointments, and ensuring their safety are crucial issues. In particular, a system is needed that can promptly provide support in the event of unexpected situations during daily activities.
[0090] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0091] In this invention, the server includes means for receiving natural language voice or text input from a user via a communication device, means for identifying the user's intent using natural language processing technology, and means for checking health status and providing appointment reminders using a speech recognition device. This enhances the support elderly people receive in their daily lives and ensures their safety through prompt notification in case of abnormalities.
[0092] A "communication device" is a device that receives voice or text input from a user and transmits it to a server.
[0093] "Natural language processing technology" is a technology that analyzes natural language data contained in speech or text to identify the user's intent.
[0094] A "speech recognition device" is a device that converts a user's speech into digital data and then converts the speech into text for input into natural language processing technology.
[0095] A "location information device" is a device that detects the user's geographical location and provides it to the server, allowing the server to track the user's movements and monitor their safety.
[0096] A "generative AI model" is an artificial intelligence model trained using machine learning and used to improve the quality of responses to users.
[0097] A "notification device" is a device that reports to pre-configured relevant parties when an anomaly related to the user's safety is detected.
[0098] This invention aims to realize an interactive system that supports the daily lives of the elderly. Users input information to the server via voice or text using a communication device. This communication device is often implemented using a smartphone or tablet.
[0099] When the server receives input voice or text data, it uses natural language processing technology to analyze the data and identify the user's intent. The primary software used in this process is the "Google Cloud Natural Language API." This API includes machine learning algorithms that enhance the accuracy of the analysis. Furthermore, it uses generative AI models to improve the quality of the response.
[0100] After the analysis, the server refers to past conversation history and configured information to generate an appropriate response. This response is presented to the user through the speech recognition device, and "Microsoft® Azure® Speech Service" is used to generate natural-sounding speech for the response.
[0101] Furthermore, the user's location information is obtained using the GPS function built into the smartphone. Based on this information, the server monitors the user's safety and, in the event of an anomaly, quickly sends an alert to the relevant parties via a notification device. This enhances security while simultaneously enabling more personalized support.
[0102] As a concrete example, the server can provide registered users with a reminder function that tells them "This is your schedule for today" by voice at a specified time every morning. Furthermore, in the event of a critical situation, support staff will be immediately notified. An example of a prompt to input into the generative AI model would be a sentence like, "Please suggest the optimal interaction method to support the lives of the elderly through natural language processing."
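The morning reminder requires the server to know when the next trigger is due. A minimal sketch, assuming the reminder time is stored per user as a time of day; the function names are hypothetical, and daylight-saving transitions are not handled.

```python
from datetime import datetime, time, timedelta


def next_reminder(now: datetime, at: time) -> datetime:
    """Return the next datetime at which the daily reminder should fire."""
    # Keep now's tzinfo so aware and naive datetimes are not mixed.
    candidate = datetime.combine(now.date(), at, tzinfo=now.tzinfo)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return candidate


def seconds_until_reminder(now: datetime, at: time) -> float:
    """Delay a scheduler could sleep before sending the reminder."""
    return (next_reminder(now, at) - now).total_seconds()
```

A scheduler would compute this delay after each reminder is sent, so the reminder fires once per day at the configured time.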
[0103] The system of this invention aims to support the independent living of the elderly by having these devices and software work together to provide advanced support and monitoring.
[0104] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0105] Step 1:
[0106] The user inputs information into a communication device by voice or text. The input can be natural language such as "Tell me my schedule for tomorrow" or "Please record today's health data." The communication device then sends it to the server as voice or text data.
[0107] Step 2:
[0108] The server uses a speech recognition device to convert the received audio data into text. Specifically, it uses the "Google Cloud Speech-to-Text API" for this conversion. The output of this step is natural language text.
[0109] Step 3:
[0110] The server analyzes the converted text data and uses natural language processing techniques to understand the user's intent. It utilizes the Google Cloud Natural Language API for analysis, extracting user requests from the text data. The input is text data, and the output is a command or request based on the user's intent.
[0111] Step 4:
[0112] The server generates responses by referencing the user's past conversation history and registered information. A generative AI model is used to make the responses more personalized. The responses are phrased in natural language so that the user can easily understand them. The inputs are the user's intent and the reference data, and the output is the generated response.
[0113] Step 5:
[0114] The terminal receives the response data sent from the server and presents it to the user using a speech recognition device. Here, "Microsoft Azure Speech Service" is used to convert text data into speech. The input is text response data, and the output is a voice response.
[0115] Step 6:
[0116] To obtain the user's location information, the GPS function of the communication device is activated and the user's current location is sent to the server. The input is location information from the GPS sensor, and the output is sent to the server as location data.
[0117] Step 7:
[0118] The server monitors location data and, if it detects an anomaly, immediately sends an alert to the relevant parties via a notification device. This ensures user safety. Inputs are location information and behavioral patterns, and output is an alert notification in case of an anomaly.
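Step 7 leaves the anomaly criterion open. One simple possibility, assumed here for illustration, is a geofence: raise an alert when the reported position is farther than a configurable radius from the user's home, using the haversine great-circle distance. The 2 km default radius and the function names are hypothetical.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def location_alert(
    home: Tuple[float, float],
    current: Tuple[float, float],
    radius_m: float = 2000.0,
) -> bool:
    """Return True if the user is outside the allowed radius around home."""
    return haversine_m(*home, *current) > radius_m
```

In practice the radius, and whether an alert also depends on the time of day, would be configured per user.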
[0119] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using those emotions.
[0120] This invention aims to support the daily conversations of the elderly and provide personalized responses based on emotions, thereby enabling richer communication through a system incorporating an emotion engine. The system is configured as follows, with each element working in conjunction with the others.
[0121] Executed on the server
[0122] The server receives voice or text data sent by the user and performs multi-stage analysis. First, it analyzes the intent of the input using natural language processing techniques. Simultaneously, it uses an emotion engine to identify the emotional state contained in the input.
[0123] Specific example: If a user types "I'm happy because the weather is nice today," the server detects the emotion "happiness" and generates a positive response such as, "It's a wonderful day today. Do you have anything special planned?"
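The internals of the emotion identification model 59 are not described. The sketch below substitutes a tiny keyword lexicon for that model, purely to show how a detected emotion can select the style of reply; the lexicon and reply templates are hypothetical and reuse the examples in this description.

```python
import re
from typing import Dict, Tuple

# Hypothetical lexicon standing in for the emotion identification model.
EMOTION_LEXICON: Dict[str, Tuple[str, ...]] = {
    "happiness": ("happy", "glad", "wonderful", "nice"),
    "loneliness": ("lonely", "alone"),
    "sadness": ("sad", "down", "depressed"),
}

REPLIES: Dict[str, str] = {
    "happiness": "It's a wonderful day today. Do you have anything special planned?",
    "loneliness": "Shall we think of something fun to do together?",
    "sadness": "Is there anything I can do to help?",
    "neutral": "I see. Please tell me more.",
}


def detect_emotion(text: str) -> str:
    """Return the emotion whose keywords occur most often, or 'neutral'."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = {emotion: sum(word in keywords for word in words)
              for emotion, keywords in EMOTION_LEXICON.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "neutral"


def emotion_aware_reply(text: str) -> str:
    return REPLIES[detect_emotion(text)]
```

A keyword lexicon misreads phrases such as "slow down"; it only illustrates where a trained emotion model would plug in.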
[0124] Implementation on a device
[0125] The terminal is a device that sends user input to a server and displays the server's response to the user. Here, adjusting the response according to the user's emotional state improves convenience and enriches the user experience. By combining voice recognition and location information acquisition functions, information tailored to the user is provided.
[0126] Specific example: When the user's mood suggests they want to relax, the device asks, "Are you looking for a relaxing cafe in your neighborhood?" and provides local information accordingly.
[0127] User interaction
[0128] Users can operate the device intuitively and enjoy receiving helpful support. This is expected to improve the quality of life of the elderly and reduce the burden on their families. In addition, the conversation flow adapts flexibly to the recognized emotions, so the dialogue always remains suited to the user.
[0129] Specific example: When a user sadly says, "I'm feeling down today," providing a caring response such as, "Is there anything I can do to help?" can help improve the user's mood.
[0130] This invention embodies a technological means that makes the lives of the elderly more comfortable and fulfilling by detecting emotions and enabling detailed responses based on those emotions.
[0131] The following describes the processing flow.
[0132] Step 1:
[0133] The user inputs their feelings or questions into the device via voice or text. For example, the user might say, "I'm feeling a little lonely today."
[0134] Step 2:
[0135] The terminal converts the input speech into text format. This conversion uses speech recognition technology. The converted text is then sent to the server.
[0136] Step 3:
[0137] The server analyzes the received text data. First, it uses natural language processing techniques to identify the user's intent and recognizes that they are "feeling lonely."
[0138] Step 4:
[0139] The server activates an emotion engine to analyze emotions from the input text. In this case, it detects the emotion "loneliness" and records the emotional state.
[0140] Step 5:
[0141] The server cross-references the user's past conversation history with the changes in their emotions to optimize its response. In this case, it generates a response intended to ease the feeling of loneliness, such as the suggestion, "Shall we think of something fun to do together?"
[0142] Step 6:
[0143] The server sends the generated response to the terminal.
[0144] Step 7:
[0145] The terminal presents the user with the response received from the server. The response is either read aloud or displayed on the screen.
[0146] Step 8:
[0147] The device uses an emotion engine to continuously record emotional states for future user interactions, and prepares to notify pre-designated family members or relevant parties if abnormalities persist.
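Step 8 does not say how long an abnormal state must persist before family members are notified. The following is a minimal sketch under hypothetical assumptions. The emotion labels, the set treated as negative, and the rule "four negative readings within the last five interactions" are illustrative choices, not values from this disclosure.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical set of emotion labels treated as negative for monitoring.
NEGATIVE_EMOTIONS = {"loneliness", "sadness", "anxiety"}


@dataclass
class EmotionMonitor:
    """Keeps a rolling log of detected emotions and flags persistent negative states.

    window and threshold are illustrative parameters, not values from the source.
    """

    window: int = 5      # number of most recent interactions kept in the log
    threshold: int = 4   # negative readings within the window that trigger a notice
    history: deque = field(default_factory=deque)

    def record(self, emotion: str) -> bool:
        """Log one detected emotion; return True if family should be notified."""
        self.history.append(emotion)
        if len(self.history) > self.window:
            self.history.popleft()  # drop the oldest reading
        negatives = sum(e in NEGATIVE_EMOTIONS for e in self.history)
        return negatives >= self.threshold


monitor = EmotionMonitor()
for emotion in ["loneliness", "loneliness", "joy", "sadness", "loneliness"]:
    if monitor.record(emotion):
        print("Notify designated family members")  # placeholder for the notification path
```

A rolling window, rather than a single reading, keeps one low moment from alerting the family. The threshold trades missed alerts against false alarms.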
[0148] This allows the system to tailor its interactions to incorporate user emotions, providing more personalized support.
[0149] (Example 2)
[0150] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0151] In everyday conversations with the elderly, there is a challenge in improving the quality of communication by providing personalized responses based on emotions. Furthermore, conventional technologies have had difficulty accurately identifying the emotional state of the user and appropriately adapting the conversation.
[0152] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0153] In this invention, the server includes means for receiving natural language voice or text input from a user via a communication system, means for analyzing the input and identifying the user's intentions and emotions using natural language processing technology, and means for generating emotion-based personalized responses using a generative AI model. This enables natural communication with elderly people.
[0154] A "communication system" is a general term for devices and technologies that use networks to send and receive voice and text data from users.
[0155] "Natural language processing technology" refers to technologies that enable computers to understand and analyze human language, including methods for analyzing the intent and emotions behind language.
[0156] "Intention" refers to the purpose or will of what a user is trying to convey through their statements or input.
[0157] "Emotions" refer to the mental state that users express through their words and input, and include psychological reactions such as joy, sadness, and surprise.
[0158] A "generative AI model" is an artificial intelligence technology that uses machine learning algorithms to generate natural-sounding words and sentences from data.
[0159] "Personalized responses" refer to conversations and actions that are individually tailored based on the user's specific intentions and feelings.
[0160] "Location information" refers to data that indicates a user's physical geographical location and is used in maps, navigation, and other applications.
[0161] "Dynamic adaptation" refers to a system automatically adjusting its responses and actions in response to changes in the user's state or environment.
[0162] This invention relates to a communication system incorporating an emotion engine, which aims to support the daily conversations of elderly people and provide personalized responses based on their emotions. This system includes three key elements: a server, a terminal, and a user.
[0163] The server receives natural language-based voice or text data from the user, transmitted from the terminal via the communication system. The server processes this data and uses natural language processing techniques to identify the user's intent and emotions. The natural language processing techniques used here are based on machine learning algorithms; for example, pretrained language models such as BERT or the GPT series can be used. Once the emotional state is identified, the server uses the generative AI model to generate a personalized response based on that emotion.
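The identification of intent and emotion described in this paragraph can be illustrated with a deliberately simple stand-in. The sketch below uses hypothetical keyword lexicons in place of the machine learning models (such as BERT or the GPT series) mentioned above. It only shows the shape of the step: tokenize the text, score each candidate label, and return the best intent and emotion.

```python
import re

# Hypothetical keyword lexicons; a deployed system would use a trained model instead.
INTENT_LEXICON = {
    "schedule_inquiry": {"schedule", "appointment", "plans"},
    "small_talk": {"weather", "feeling", "feel"},
    "help_request": {"help", "advice"},
}
EMOTION_LEXICON = {
    "happiness": {"happy", "glad", "nice", "great"},
    "sadness": {"sad", "down", "unhappy"},
    "loneliness": {"lonely", "alone"},
}


def tokenize(text: str) -> list[str]:
    """Lower-case the input and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def best_label(tokens: list[str], lexicon: dict[str, set[str]], default: str) -> str:
    """Pick the label whose keyword set matches the most tokens, or default if none match."""
    counts = {label: sum(tok in words for tok in tokens) for label, words in lexicon.items()}
    label = max(counts, key=counts.get)
    return label if counts[label] > 0 else default


def analyze(text: str) -> dict[str, str]:
    """Identify the user's intent and emotion from one utterance."""
    tokens = tokenize(text)
    return {
        "intent": best_label(tokens, INTENT_LEXICON, "unknown"),
        "emotion": best_label(tokens, EMOTION_LEXICON, "neutral"),
    }
```

For example, `analyze("I'm happy the weather is nice today")` yields the intent `small_talk` and the emotion `happiness`. Replacing `best_label` with a call to a trained classifier would leave `analyze` unchanged.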
[0164] The terminal is a device that sends user input to a server, receives a response from the server, and presents it to the user. In this process, speech recognition and speech synthesis technologies are utilized to provide both data input and responses to the user in a natural manner. Specific hardware examples include smart devices equipped with speech recognition capabilities, and general-purpose voice playback applications can be used as software.
[0165] Users can interact with the system intuitively through their devices and receive support. This allows them to receive advice and information tailored to their emotions, enabling them to enjoy a richer experience even in everyday conversations.
[0166] For example, if a user says, "I'm happy the weather is nice today," the server analyzes this as text data and detects the emotion of "happiness." Based on this, the server generates a positive response such as, "It looks like it's going to be a great day. Is there anything you'd like to do?" and the terminal presents it to the user. An example prompt for this case is: "Generate an appropriate response when an elderly person expresses a positive mood."
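The prompt example above can be generalized. The emotion engine selects an instruction sentence for each detected emotion and combines it with the user's words before the text is passed to the generative AI model. The following is a minimal sketch; the instruction wording and emotion labels are hypothetical.

```python
# Hypothetical instruction sentences selected by the emotion engine.
INSTRUCTIONS = {
    "happiness": "The user is in a positive mood. Share their joy and suggest something enjoyable to do.",
    "sadness": "The user seems sad. Respond with empathy and gently ask how you can help.",
    "loneliness": "The user feels lonely. Respond warmly and propose a shared activity or a call to family or friends.",
}
DEFAULT_INSTRUCTION = "Respond politely and helpfully."


def build_prompt(user_text: str, emotion: str) -> str:
    """Combine a role description, an emotion-specific instruction, and the user's words."""
    instruction = INSTRUCTIONS.get(emotion, DEFAULT_INSTRUCTION)
    return (
        "You are a friendly conversation partner for an elderly person.\n"
        f"Instruction: {instruction}\n"
        f"User: {user_text}\n"
        "Assistant:"
    )
```

An emotion that has no dedicated entry falls back to a neutral instruction, so the pipeline never fails on an unexpected label.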
[0167] This system can improve the quality of life for the elderly and enable flexible and effective communication that responds to their emotions.
[0168] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0169] Step 1:
[0170] The user enters a message into the terminal via voice or text. The terminal uses speech recognition technology to convert the input voice into a digital format. If the input is text, it is processed as is and prepared to be sent to the server. At this stage, the input is the user's natural language message, which the terminal converts into digital data.
[0171] Step 2:
[0172] The terminal sends the digital audio or text data received from the user to the server. After receiving this data, the server begins analysis using natural language processing technology. The input data is a text string; the server tokenizes it and uses machine learning algorithms to analyze the meaning and structure of its sentences. As a result, the user's intent and emotions are identified.
[0173] Step 3:
[0174] Based on the analysis results, the server initiates a process to generate a response using a generative AI model. Here, the emotion engine contributes by inputting the most appropriate prompt sentence for the user's emotion into the generative AI model. For example, if the emotion "happy" is detected, a positive response will be generated. The output is a sentence that serves as the response to the user.
[0175] Step 4:
[0176] The response generated by the server is sent to the terminal and ready to be presented to the user. The terminal converts the received text data into speech output using speech synthesis technology, or displays it to the user as text. At this stage, the output is either speech or text that the user can recognize.
[0177] Step 5:
[0178] The user can continue the conversation by acknowledging the received response and making new inputs. Through this two-way interaction, the system continues the dialogue in accordance with the user's emotional state and intentions, achieving adaptive communication.
[0179] (Application Example 2)
[0180] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".
[0181] In the daily lives of the elderly, a lack of communication and feelings of loneliness are serious problems. Conventional technologies cannot provide flexible and personalized responses based on emotions, making it difficult to provide appropriate support according to individual emotional states. As a result, there is insufficient suggestion of dialogues and activities that address the latent emotions and intentions of the elderly, hindering the improvement of their quality of life.
[0182] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0183] In this invention, the server includes means for receiving natural language voice or text input from an individual via a communication device, means for analyzing the input and identifying the individual's intentions using natural language processing technology, and means for identifying the individual's emotional state using emotion detection functionality. This makes it possible to accurately analyze an individual's emotional state and intentions and provide personalized responses and activity suggestions based on them.
[0184] A "communication device" is a device that receives natural language voice or text input from an individual and exchanges information with a server.
[0185] "Natural language processing technology" refers to techniques for analyzing individual speech and text to identify their intent and meaning, and includes machine learning algorithms.
[0186] The "emotion detection function" is a function that identifies and analyzes the emotional state contained in an individual's statements and input content.
[0187] "Past conversation history" refers to a record of conversations and interactions an individual has had in the past, and is information that is referenced when generating responses.
[0188] A "notification device" is a device that transmits information to relevant parties when abnormal behavior of an individual is detected.
[0189] A "personalized response" is a response generated based on an individual's emotional state and intentions, and is a form of dialogue with content that is individually applied.
[0190] "Activity suggestions" are proposals aimed at improving the quality of life by suggesting optimal actions and activities according to an individual's emotional state.
[0191] The system of this invention consists of a communication device used by individuals, including the elderly, an emotion analysis system via a server, and a notification device that notifies relevant parties. The main role of the system is to analyze natural language speech and text input obtained from individuals to understand their intentions and emotional state.
[0192] The server receives input via communication devices and uses natural language processing techniques to identify the individual's intentions. It utilizes machine learning algorithms to precisely analyze the input. Furthermore, it uses emotion detection to identify the individual's emotional state and generates personalized responses by comparing them with past conversation history. This entire process can utilize a common cloud-based platform.
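One way to picture "comparing with past conversation history" is to retrieve the past utterances most related to the current input. These are then included, together with stored profile data, in the prompt for the generative AI model. This sketch is an assumption, not the disclosed method. Relevance is measured by simple word overlap after removing a small hypothetical stop-word list, and the profile fields are illustrative.

```python
import re

# Hypothetical stop-word list; common words are ignored when measuring relevance.
STOP_WORDS = {"i", "i'm", "a", "an", "the", "to", "is", "my", "me", "and", "on", "with"}


def content_words(text: str) -> set[str]:
    """Lower-cased word set of the text, minus stop words."""
    return set(re.findall(r"[a-z']+", text.lower())) - STOP_WORDS


def select_history(query: str, history: list[str], k: int = 2) -> list[str]:
    """Up to k past utterances sharing the most content words with the query.

    Ties are broken in favour of more recent utterances (later in the list).
    """
    q = content_words(query)
    scored = [(len(q & content_words(turn)), i, turn) for i, turn in enumerate(history)]
    relevant = [s for s in scored if s[0] > 0]
    relevant.sort(key=lambda s: (-s[0], -s[1]))
    return [turn for _, _, turn in relevant[:k]]


def build_personalized_prompt(query: str, emotion: str,
                              history: list[str], profile: dict[str, str]) -> str:
    """Assemble profile data, detected emotion, and relevant history into one prompt."""
    context = "\n".join(f"- {t}" for t in select_history(query, history)) or "- (none)"
    facts = "; ".join(f"{key}: {value}" for key, value in profile.items())
    return (
        f"User profile: {facts}\n"
        f"Detected emotion: {emotion}\n"
        f"Relevant past conversation:\n{context}\n"
        f"User: {query}\n"
        "Assistant:"
    )
```

A production system would more likely rank history by embedding similarity. Word overlap is used here only to keep the example self-contained.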
[0193] The generated response is presented to the individual via a communication device. For example, if an elderly person inputs "I'm lonely today," the system offers a considerate suggestion such as, "Why don't you talk to a friend on the phone? Or would you like to watch a movie together?"
[0194] Furthermore, the system can improve an individual's quality of life by suggesting activities based on their emotional state. An example of a prompt might be: "If the user wants to relax, generate a response suggesting relaxing facilities in the area."
[0195] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0196] Step 1:
[0197] The server receives natural language speech or text input from individuals via communication devices. This input is taken into the system as raw data.
[0198] Step 2:
[0199] The server analyzes the received input using natural language processing techniques. This includes grammatical analysis and keyword extraction of the input content, as well as data processing to identify the individual's intent. The output is text data with the identified intent.
[0200] Step 3:
[0201] The server uses emotion detection functionality based on identified intent to identify the emotional state contained in the input. This process involves data calculations using an emotion model, and the emotional state of the input is output in text format.
[0202] Step 4:
[0203] The server references past conversation history and pre-configured information, and uses a generative AI model to generate personalized responses. It leverages prompts to create optimized responses. The output at this stage is individually customized response text.
[0204] Step 5:
[0205] The server sends the generated response to the individual via communication equipment. At this time, speech synthesis or display corresponding to the response is performed. The user receives the response and completes the interaction with the system.
[0206] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0207] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. It receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions in the prompt and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.
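The input and output contract described for the data generation model 58 can be written down as a small interface. The model takes a prompt with instructions plus inference data in some modality, and returns a result in a data format such as text or audio. The class names, fields, and the toy first-sentence summarizer below are hypothetical. A real deployment would call a generative AI service behind the same interface.

```python
from dataclasses import dataclass
from typing import Literal, Protocol, Union

Modality = Literal["audio", "text", "image"]


@dataclass
class InferenceRequest:
    prompt: str                # instruction, e.g. "Summarize the user's request"
    data: Union[bytes, str]    # inference data: raw audio/image bytes or text
    modality: Modality         # kind of inference data supplied


@dataclass
class InferenceResult:
    modality: Modality         # output format, e.g. "text" or "audio"
    data: Union[bytes, str]


class DataGenerationModel(Protocol):
    def infer(self, request: InferenceRequest) -> InferenceResult: ...


class FirstSentenceSummarizer:
    """Toy stand-in for a generative model: 'summarizes' text by keeping its first sentence."""

    def infer(self, request: InferenceRequest) -> InferenceResult:
        if request.modality != "text" or not isinstance(request.data, str):
            raise ValueError("this stand-in only handles text input")
        first = request.data.split(".")[0].strip()
        return InferenceResult(modality="text", data=first + ".")


def run(model: DataGenerationModel, request: InferenceRequest) -> InferenceResult:
    """Any object with a matching infer() method can be used here."""
    return model.infer(request)
```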
[0208] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.
[0209] [Second Embodiment]
[0210] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.
[0211] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.
[0212] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0213] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.
[0214] The microphone 238 picks up the voice of the user 20 and accepts instructions from the user 20. The microphone 238 captures the voice of the user 20, converts it into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.
[0215] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0216] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0217] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0218] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0219] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0220] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0221] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0222] This invention realizes an interactive system to support the lives of the elderly and provide peace of mind to their families. The components of the system and their interactions are described below.
[0223] Executed on the server
[0224] The server is responsible for the central processing of the system. It receives voice or text input from the user and analyzes that data. The server uses natural language processing technology to understand the user's intent from their statements. Based on that intent, it has the function of referencing relevant past conversation history and schedule data to generate the optimal response.
[0225] For example, if a user says, "I'd like some advice on preparing for a trip I have planned for this weekend," the server might respond with something like, "We can provide you with a list of items you've brought on past trips and weather information for your destination."
[0226] Implementation on a device
[0227] The terminal is a device that sends user input to a server and presents the server's response to the user. It converts speech to text using speech recognition technology, or sends typed input to the server as text. After receiving the response from the server, the terminal presents it to the user in either text or audio format. Furthermore, it can acquire the user's location information, enabling responses tailored to the user's situation.
[0228] Specific example: If a user asks their device, "What are some good restaurants in my neighborhood?", the device will display a list of recommended restaurants from the server based on the user's current location.
[0229] User interaction
[0230] Users interact with the system daily through their devices. Based on their spoken words and input information, they can initiate new conversations or request advice on schedules and health. Furthermore, the system periodically monitors the user's behavior and, if it detects any abnormalities, sends notifications to designated family members, thus also playing a monitoring role.
[0231] Specific example: If a user says "I'm going for a walk" at the same time every day, the server will notify the family if this suddenly stops happening, as this is considered an anomaly.
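The walk example implies learning a routine and flagging its absence. The following minimal sketch uses two hypothetical rules. A routine counts as established when the event occurred on at least five of the previous seven days. An alert is raised once the usual time, taken as the median of recent times, plus a one-hour grace period has passed.

```python
from datetime import datetime, time, timedelta
from statistics import median


def walk_overdue(events: list[datetime], now: datetime,
                 lookback_days: int = 7, min_days: int = 5,
                 grace: timedelta = timedelta(hours=1)) -> bool:
    """Return True if a habitual daily event has not happened by its usual time today."""
    today = now.date()
    # Events from the previous lookback_days days (today excluded).
    recent = [e for e in events if 1 <= (today - e.date()).days <= lookback_days]
    if len({e.date() for e in recent}) < min_days:
        return False  # not an established daily routine yet
    if any(e.date() == today for e in events):
        return False  # today's walk has already been reported
    # Usual time of day, as the median minute-of-day of recent events.
    usual_minute = median(e.hour * 60 + e.minute for e in recent)
    usual_time = datetime.combine(today, time()) + timedelta(minutes=usual_minute)
    return now > usual_time + grace
```

Take a user who reported a walk at 9:00 on each of the previous seven days. A check at 11:00 with no walk reported today returns `True`, which would trigger the family notification. A check at 9:30 returns `False`.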
[0232] By having these components work together to provide personalized support to the user, this invention aims to support independent living.
[0233] The following describes the processing flow.
[0234] Step 1:
[0235] The user enters inquiries or commands into the device via voice or text. For example, the user might say, "Tell me what's on my schedule this afternoon."
[0236] Step 2:
[0237] The terminal converts the input speech into text data. This conversion is performed using speech recognition technology. The converted text data is then sent to the server.
[0238] Step 3:
[0239] The server analyzes the received text data. It uses natural language processing technology to identify the user's intent and understand their purpose, such as "I want to know my schedule."
[0240] Step 4:
[0241] Based on the analysis, the server references the user's past conversation history and existing schedule databases. It extracts relevant information and generates an appropriate response.
[0242] Step 5:
[0243] The server sends the generated response to the terminal as text data. This includes specific information such as, "You have a health checkup today at 3 PM."
[0244] Step 6:
[0245] The terminal presents the user with response data received from the server. Information is provided by converting text to speech and reading it aloud, or by displaying it on the screen.
[0246] Step 7:
[0247] The device periodically monitors user behavior. If an anomaly is detected, such as a missed appointment, it notifies the relevant parties via the server.
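Steps 4, 5, and 7 can be sketched as follows: the server looks up the schedule to answer a query, and a missed appointment is detected afterwards. Several details are hypothetical: the Appointment record, the 12:00 to 18:00 definition of "afternoon", the `checked_in` flag, and the 30-minute grace period. The source does not specify how attendance is actually confirmed.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta


@dataclass
class Appointment:
    title: str
    start: datetime
    checked_in: bool = False  # hypothetical flag set once attendance is confirmed


def clock(t: datetime) -> str:
    """Format a time as '3:00 PM' without relying on locale settings."""
    hour12 = t.hour % 12 or 12
    suffix = "PM" if t.hour >= 12 else "AM"
    return f"{hour12}:{t.minute:02d} {suffix}"


def afternoon_reply(appointments: list[Appointment], day: date) -> str:
    """Answer 'What's on my schedule this afternoon?' from the schedule database."""
    items = sorted(
        (a for a in appointments if a.start.date() == day and 12 <= a.start.hour < 18),
        key=lambda a: a.start,
    )
    if not items:
        return "You have nothing scheduled this afternoon."
    return "This afternoon you have: " + "; ".join(f"{a.title} at {clock(a.start)}" for a in items) + "."


def missed_appointments(appointments: list[Appointment], now: datetime,
                        grace: timedelta = timedelta(minutes=30)) -> list[Appointment]:
    """Appointments whose start time plus grace has passed without a check-in."""
    return [a for a in appointments if not a.checked_in and now > a.start + grace]
```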
[0248] Through this series of processes, the system provides continuous and effective support to the user.
[0249] (Example 1)
[0250] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0251] Conventional interactive systems have limitations in providing a sense of security to the elderly because they are insufficient in analyzing the user's intent and generating personalized responses. Furthermore, they cannot adequately ensure user safety because they cannot detect abnormal behavior and promptly notify third parties.
[0252] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0253] In this invention, the server includes means for receiving natural language voice or text input from a user via a communication device, means for analyzing the input and identifying the user's intent using natural language processing technology, means for generating a response by referring to past dialogue history and pre-configured data, means for optimizing the response using a generative model, and means for detecting abnormalities in the user's behavior and notifying a third party through a notification device. This enables personalized support and rapid safety checks for the user.
[0254] A "communication device" is a device that receives input from a user and exchanges data with a server.
[0255] "Natural language processing technology" is a technology that mechanically analyzes the language spoken by a user and understands their intent.
[0256] "Means for generating responses" refers to a function that creates a response by combining appropriate information based on the analyzed intent of the user.
[0257] A "generative model" is a pre-trained artificial intelligence model used to generate optimized outputs based on input data.
[0258] A "notification device" is a device used to notify a third party of the user's status or any abnormalities.
[0259] "Means for detecting abnormal behavior" refers to a function that identifies discrepancies between a user's normal behavioral patterns and their actions, and determines them to be abnormal.
[0260] This invention is an interactive system designed to support the lives of the elderly. It receives input from users via communication devices and generates and presents optimized responses to support their daily lives. The server analyzes the user's intent using natural language processing technology. Specifically, it uses a "speech recognition API" as speech recognition software to convert speech into text. For analysis, it uses a "natural language processing model," a generative AI model built with "machine learning algorithms." Using this model, the system identifies the user's intent and generates responses by referring to past dialogue history and pre-configured data.
[0261] The device can use a "speech synthesis API" to present the generated response in voice or text. Furthermore, by incorporating location services, it can optimize the response based on the user's location using a "GPS module," etc. In addition, the device monitors the user's behavior and, if an anomaly is detected, sends warning information to a third party via a notification device. For example, a cloud monitoring service such as Amazon Web Services could be used.
[0262] Users can submit everyday inquiries to the system, with specific prompts such as "Please tell me the opening hours of the local pharmacy" or "Please give me advice on managing my health." This allows users to receive necessary information and advice while maintaining their independent lives.
[0263] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0264] Step 1:
[0265] The user makes a query in natural language via a communication device. The input data can be speech or text, and in some cases, a "speech recognition API" is used to convert speech to text. The output here is the content sent to the server as text data. As a concrete example, the user might say, "Tell me today's schedule."
[0266] Step 2:
[0267] The server analyzes the received text data. A generative AI model, a "natural language processing model," is used for the analysis, and "machine learning algorithms" are employed to identify the user's intent. The input is the user's text data, and the output is the identified intent. Specifically, the server recognizes the user's intent as a "schedule inquiry."
[0268] Step 3:
[0269] The server generates a response by referring to past conversation history and pre-configured data (e.g., schedule information). In this process, a generative AI model is used to optimize the content of the response. The input is the user's intent and the reference data, and the output is the response content. As a concrete example, the server prepares a response such as "You have a hospital appointment at 2 PM."
[0270] Step 4:
[0271] The device receives the generated response and presents it to the user. The output can be in text or voice. In the case of voice output, the "Speech Synthesis API" can be used. Specifically, the device tells the user by voice, "You have a hospital appointment at 2 PM."
[0272] Step 5:
[0273] The user decides on the next action based on the response received. Feedback may be needed, and the user can provide further instructions to the system. This step involves user input again, creating a loop in the system's processing. A specific action might be the user providing feedback such as "thank you."
[0274] (Application Example 1)
[0275] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0276] For elderly people to live independently, monitoring their health, reminding them of appointments, and ensuring their safety are crucial issues. In particular, a system is needed that can promptly provide support in the event of unexpected situations during daily activities.
[0277] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0278] In this invention, the server includes means for receiving natural language voice or text input from a user via a communication device, means for identifying the user's intent using natural language processing technology, and means for checking health status and providing appointment reminders using a voice recognition device. This enhances the support that elderly people receive in their daily lives and ensures their safety through rapid notification in case of abnormalities.
[0279] A "communication device" is a device that receives voice or text input from a user and transmits it to a server.
[0280] "Natural language processing technology" is a technology for analyzing natural language data contained in voice or text and identifying the user's intention.
[0281] A "voice recognition device" is a device that converts the user's speech into digital data and transcribes it into text for input into the natural language processing technology.
[0282] A "location information device" is a device that detects the user's geographical location, grasps the user's movement route by providing it to the server, and monitors safety.
[0283] A "generative AI model" is an artificial intelligence model built using machine learning and used to improve the quality of responses to users.
[0284] A "notification device" is a device that reports to preset relevant persons when an abnormality related to the user's safety is detected.
[0285] This invention is for realizing an interactive system that supports the daily life of the elderly. The user inputs to the server by voice or text using a communication device. This communication device is often implemented on a smartphone, tablet, etc.
[0286] When the server receives the input voice or text data, it analyzes the data using natural language processing technology and identifies the user's intention. The main software used in this case is the "Google Cloud Natural Language API". This API incorporates machine learning algorithms that help improve analysis accuracy. The quality of responses is further improved by using a generative AI model.
[0287] After the analysis, the server refers to past conversation history and configured information to generate an appropriate response. This response is presented to the user by voice through the speech recognition device, using the "Microsoft Azure Speech Service" to synthesize natural-sounding speech.
[0288] Furthermore, the user's location information is obtained using the GPS function built into the smartphone. Based on this information, the server monitors the user's safety and, in the event of an anomaly, quickly sends an alert to the relevant parties via a notification device. This enhances security while simultaneously enabling more personalized support.
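A simple form of this location-based safety check is a geofence around the user's home. The sketch below assumes a hypothetical 1 km radius. It applies the haversine great-circle formula to latitude/longitude pairs from the smartphone's GPS function. The coordinates in the usage note are only an example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def outside_safe_zone(home: tuple[float, float], current: tuple[float, float],
                      radius_m: float = 1000.0) -> bool:
    """True if the current GPS fix lies farther than radius_m from home (hypothetical rule)."""
    return haversine_m(*home, *current) > radius_m
```

Take home at (35.6812, 139.7671). A fix at (35.6862, 139.7671) is about 556 m away and stays inside the zone. A fix at (35.6962, 139.7671) is about 1.67 km away and would trigger an alert through the notification device.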
[0289] As a concrete example, the server can provide registered users with a reminder function that tells them "This is your schedule for today" by voice at a specified time every morning. Furthermore, in the event of a critical situation, support staff will be immediately notified. An example of a prompt to input into the generative AI model would be a sentence like, "Please suggest the optimal interaction method to support the lives of the elderly through natural language processing."
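The morning reminder requires two pieces: computing the next time the voice reminder should fire, and composing the message. The following is a minimal sketch. The function names and the message wording other than "This is your schedule for today" are hypothetical. A real system would hand the text to its speech synthesis service.

```python
from datetime import datetime, time, timedelta


def next_reminder(now: datetime, at: time) -> datetime:
    """Next datetime at the configured reminder time (today if still ahead, else tomorrow)."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def reminder_text(items: list[str]) -> str:
    """Compose the spoken reminder from today's schedule entries."""
    if not items:
        return "You have nothing scheduled today."
    return "This is your schedule for today: " + "; ".join(items) + "."
```

The comparison uses `<=`, so a scheduler that recomputes the next time immediately after firing moves on to the following day. This prevents the same reminder from being issued twice.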
[0290] The system of this invention aims to support the independent living of the elderly by having these devices and software work together to provide advanced support and monitoring.
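The morning reminder described in [0289] requires the server to know when the next announcement is due. A minimal sketch, assuming a single fixed daily time and a hypothetical message format, is shown below; `next_reminder` and `reminder_message` are illustrative names, not part of the disclosure.

```python
from datetime import datetime, time, timedelta


def next_reminder(now: datetime, at: time) -> datetime:
    """Return the next datetime at which the daily reminder should fire."""
    candidate = datetime.combine(now.date(), at)
    # If today's slot has already passed, schedule the reminder for tomorrow.
    if candidate < now:
        candidate += timedelta(days=1)
    return candidate


def reminder_message(events: list[str]) -> str:
    """Build the spoken text 'This is your schedule for today' from a list of events."""
    if not events:
        return "This is your schedule for today: nothing is planned."
    return "This is your schedule for today: " + ", ".join(events) + "."
```

For example, `next_reminder(datetime(2025, 3, 1, 8, 30), time(7, 0))` yields 7:00 on 2 March, because the 1 March slot has already passed.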
[0291] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0292] Step 1:
[0293] The user inputs information via voice or text into a communication device. This input can include natural language, such as "Tell me my schedule for tomorrow" or "Please record today's health data." The input is then sent to the server by the communication device as voice or text data.
[0294] Step 2:
[0295] The server uses a speech recognition device to convert the received audio data into text. Specifically, it uses the "Google Cloud Speech-to-Text API" to convert the audio data into text. The output of the process is obtained as natural language text.
[0296] Step 3:
[0297] The server analyzes the converted text data and uses natural language processing techniques to understand the user's intent. It utilizes the Google Cloud Natural Language API for analysis, extracting user requests from the text data. The input is text data, and the output is a command or request based on the user's intent.
[0298] Step 4:
[0299] The server generates responses by referencing the user's past conversation history and registered information. A generative AI model is used to create more personalized responses. The generated responses are constructed in natural language, supporting a deeper understanding of the user. Inputs are the user's intent and reference data, while output is the generated response.
[0300] Step 5:
[0301] The terminal receives the response data sent from the server and presents it to the user using a speech recognition device. Here, "Microsoft Azure Speech Service" is used to convert text data into speech. The input is text response data, and the output is a voice response.
[0302] Step 6:
[0303] To obtain the user's location information, the GPS function of the communication device is activated and the user's current location is sent to the server. The input is location information from the GPS sensor, and the output is sent to the server as location data.
[0304] Step 7:
[0305] The server monitors the location data and, if an abnormality is detected, immediately sends an alert to the relevant personnel through the notification device. This ensures the safety of the user. The input is location information and behavior patterns, and the output is an alert notification in case of an abnormality.
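Step 7 leaves the abnormality criterion open. One simple assumption is a geofence: raise an alert when the reported GPS position lies farther than a set radius from the user's home. The sketch below computes great-circle distance with the haversine formula; `is_location_anomaly`, the home coordinates, and the 1 km default radius are hypothetical, and a practical system would likely also take time of day and behavior patterns into account.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_location_anomaly(home: tuple[float, float], current: tuple[float, float],
                        radius_m: float = 1000.0) -> bool:
    """True if the current position is outside the assumed safe radius around home."""
    return haversine_m(*home, *current) > radius_m
```

A shift of 0.01 degrees of latitude is roughly 1.1 km, so it would trigger an alert with the default radius but not with a 2 km radius.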
[0306] Furthermore, an emotion engine for estimating the user's emotion may be combined. That is, the specific processing unit 290 may estimate the user's emotion using the emotion identification model 59 and perform specific processing using the user's emotion.
[0307] The present invention realizes richer communication by supporting the daily conversations of the elderly and providing personalized responses based on emotions through a system incorporating an emotion engine. This system is configured as follows, and each element operates in cooperation.
[0308] Implementation on the server
[0309] The server receives voice or text data sent from the user and performs multi-stage analysis. First, it analyzes the intention of the input content using natural language processing technology. At the same time, it makes full use of the emotion engine to identify the emotional state contained in the input.
[0310] Specific example: When the user inputs "The weather is nice today and I'm happy", the server detects the emotion of "happy" and generates a positive response such as "It's a wonderful day today. Do you have any special plans?"
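The disclosure does not describe the internals of the emotion engine beyond the emotion identification model 59. Purely to illustrate the input and output of this step, the sketch below substitutes a keyword lexicon for a trained model and maps text such as the example above to an emotion label. The lexicon, the labels, and `detect_emotion` are hypothetical.

```python
import re

# Hypothetical keyword lexicon standing in for a trained emotion identification model.
EMOTION_LEXICON = {
    "happiness": {"happy", "glad", "wonderful", "nice"},
    "sadness": {"sad", "down", "unhappy"},
    "loneliness": {"lonely", "alone"},
}


def detect_emotion(text: str) -> str:
    """Return the emotion label with the most keyword hits, or 'neutral' if none match."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = {emotion: sum(t in words for t in tokens) for emotion, words in EMOTION_LEXICON.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "neutral"
```

A lexicon cannot handle negation or context ("not happy"), which is one reason the disclosure relies on a learned model instead.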
[0311] Implementation on the terminal
[0312] The terminal is a device that sends the user's input to the server and displays the response from the server to the user. Here, response adjustment according to the emotional state improves convenience and enriches the user experience. By combining functions such as voice recognition and acquisition of location information, information suitable for the user is provided.
[0313] Specific example: The device asks the user, "Are you looking for a relaxing cafe in your neighborhood?" and provides local information suited to the user's wish to relax.
[0314] User interaction
[0315] Users can intuitively operate the device and receive helpful support while having fun. This feature is expected to improve the quality of life for the elderly and reduce the burden on families. Furthermore, the conversation flow is flexibly changed based on emotional recognition, always providing the most appropriate dialogue for the user.
[0316] Specific example: When a user sadly says, "I'm feeling down today," providing a caring response such as, "Is there anything I can do to help?" can help improve the user's mood.
[0317] This invention embodies a technological means that makes the lives of the elderly more comfortable and fulfilling by detecting emotions and enabling detailed responses based on those emotions.
[0318] The following describes the processing flow.
[0319] Step 1:
[0320] The user inputs their feelings or questions into the device via voice or text. For example, the user might say, "I'm feeling a little lonely today."
[0321] Step 2:
[0322] The terminal converts the input speech into text format. This conversion uses speech recognition technology. The converted text is then sent to the server.
[0323] Step 3:
[0324] The server analyzes the received text data. First, it uses natural language processing techniques to identify the user's intent and recognizes that they are "feeling lonely."
[0325] Step 4:
[0326] The server activates an emotion engine to analyze emotions from the input text. In this case, it detects the emotion "loneliness" and records the emotional state.
[0327] Step 5:
[0328] The server matches the user's past conversation history with changes in their emotions to optimize its response. In this case, it generates an appropriate response to alleviate feelings of loneliness, such as creating a suggestion like, "Shall we think of something fun to do together?"
[0329] Step 6:
[0330] The server sends the generated response to the terminal.
[0331] Step 7:
[0332] The terminal presents the user with the response received from the server. The response is either read aloud or displayed on the screen.
[0333] Step 8:
[0334] The device uses an emotion engine to continuously record emotional states for future user interactions, and prepares to notify pre-designated family members or relevant parties if abnormalities persist.
[0335] This allows the system to tailor its interactions to incorporate user emotions, providing more personalized support.
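Step 8 records emotional states over time and prepares a notification if abnormalities persist. One way to make "persist" concrete, assuming it means a run of consecutive negative readings, is a fixed-size window. The class `EmotionLog`, the set of negative labels, and the window length are hypothetical choices.

```python
from collections import deque

# Labels treated as negative for the persistence check (an assumption).
NEGATIVE = {"sadness", "loneliness", "anxiety"}


class EmotionLog:
    """Keeps the most recent emotion labels and flags sustained negative states."""

    def __init__(self, window: int = 5):
        self.window = window
        self.recent: deque[str] = deque(maxlen=window)

    def record(self, emotion: str) -> bool:
        """Store the label; return True when the last `window` entries are all negative."""
        self.recent.append(emotion)
        return len(self.recent) == self.window and all(e in NEGATIVE for e in self.recent)
```

When `record` returns True, the server would notify the pre-designated family members; a single positive reading resets the condition because it stays in the window.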
[0336] (Example 2)
[0337] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0338] In everyday conversations with the elderly, there is a challenge in improving the quality of communication by providing personalized responses based on emotions. Furthermore, conventional technologies have had difficulty accurately identifying the emotional state of the user and appropriately adapting the conversation.
[0339] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0340] In this invention, the server includes means for receiving natural language voice or text input from a user via a communication system, means for analyzing the input and identifying the user's intentions and emotions using natural language processing technology, and means for generating emotion-based personalized responses using a generative AI model. This enables natural communication with elderly people.
[0341] A "communication system" is a general term for devices and technologies that use networks to send and receive voice and text data from users.
[0342] "Natural language processing technology" refers to technologies that enable computers to understand and analyze human language, including methods for analyzing the intent and emotions behind language.
[0343] "Intention" refers to the purpose or will of what a user is trying to convey through their statements or input.
[0344] "Emotions" refer to the mental state that users express through their words and input, and include psychological reactions such as joy, sadness, and surprise.
[0345] A "generative AI model" is an artificial intelligence technology that uses machine learning algorithms to generate natural-sounding words and sentences from data.
[0346] "Personalized responses" refer to conversations and actions that are individually tailored based on the user's specific intentions and feelings.
[0347] "Location information" refers to data that indicates a user's physical geographical location and is used in maps, navigation, and other applications.
[0348] "Dynamic adaptation" refers to a system automatically adjusting its responses and actions in response to changes in the user's state or environment.
[0349] This invention relates to a communication system incorporating an emotion engine, which aims to support the daily conversations of elderly people and provide personalized responses based on their emotions. This system includes three key elements: a server, a terminal, and a user.
[0350] The server receives natural language-based voice or text data from the user transmitted from the terminal via the communication system. The server processes this data and uses natural language processing techniques to identify the user's intent and emotions. The natural language processing techniques used here are based on machine learning algorithms, and language models such as BERT or the GPT series can be used. Once the emotional state is identified, the server uses the generative AI model to generate a personalized response based on that emotion.
[0351] The terminal is a device that sends user input to a server, receives a response from the server, and presents it to the user. In this process, speech recognition and speech synthesis technologies are utilized to provide both data input and responses to the user in a natural manner. Specific hardware examples include smart devices equipped with speech recognition capabilities, and general-purpose voice playback applications can be used as software.
[0352] Users can interact with the system intuitively through their devices and receive support. This allows them to receive advice and information tailored to their emotions, enabling them to enjoy a richer experience even in everyday conversations.
[0353] For example, if a user says, "I'm happy the weather is nice today," the server analyzes this as text data and detects the emotion of "happiness." Based on this, the server generates a positive response such as, "It looks like it's going to be a great day, is there anything you'd like to do?" and the terminal presents it to the user. An example prompt for this case is "Generate an appropriate response when an elderly person expresses a positive mood."
[0354] This system can improve the quality of life for the elderly and enable flexible and effective communication that responds to their emotions.
[0355] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0356] Step 1:
[0357] The user enters a message into the terminal via voice or text. The terminal uses speech recognition technology to convert the input voice into a digital format. If the input is text, it is processed as is and prepared to be sent to the server. At this stage, the input is the user's natural language message, which the terminal converts into digital data.
[0358] Step 2:
[0359] The terminal sends the digital audio or text data received from the user to the server. After receiving this data, the server begins data analysis using natural language processing technology. The input data is text in string format; the server tokenizes it and analyzes the meaning and structure of the sentences using machine learning algorithms. As a result, the user's intent and emotions are identified.
[0360] Step 3:
[0361] Based on the analysis results, the server initiates a process to generate a response using a generative AI model. Here, the emotion engine contributes by inputting the most appropriate prompt sentence for the user's emotion into the generative AI model. For example, if the emotion "happy" is detected, a positive response will be generated. The output is a sentence that serves as the response to the user.
[0362] Step 4:
[0363] The response generated by the server is sent to the terminal to be presented to the user. The terminal converts the received text data into speech using speech synthesis technology, or displays it to the user as text. At this stage, the output is either speech or text that the user can recognize.
[0364] Step 5:
[0365] The user can continue the conversation by acknowledging the received response and making new inputs. Through this two-way interaction, the system continues the dialogue in accordance with the user's emotional state and intentions, achieving adaptive communication.
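Step 3 describes the emotion engine supplying the prompt sentence best suited to the detected emotion. A minimal sketch of that selection, assuming a fixed table of hypothetical templates keyed by emotion label, is shown below; `select_prompt` and the template wording are illustrative, and the call to the generative AI model itself is not shown.

```python
# Hypothetical instruction templates keyed by detected emotion label.
PROMPT_TEMPLATES = {
    "happiness": "The elderly user is in a positive mood. Reply warmly and ask about their plans.",
    "sadness": "The elderly user feels down. Reply with empathy and offer help.",
    "loneliness": "The elderly user feels lonely. Suggest a gentle shared activity or a way to contact family or friends.",
}
DEFAULT_TEMPLATE = "Reply politely and naturally to the elderly user."


def select_prompt(emotion: str, utterance: str) -> str:
    """Pick the instruction for the emotion (falling back to a default) and append the utterance."""
    instruction = PROMPT_TEMPLATES.get(emotion, DEFAULT_TEMPLATE)
    return f'{instruction}\nThe user said: "{utterance}"'
```

Keeping the emotion-specific instruction separate from the user's words makes it easy to add new emotion labels without changing the rest of the pipeline.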
[0366] (Application Example 2)
[0367] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0368] In the daily lives of the elderly, a lack of communication and feelings of loneliness are serious problems. Conventional technologies cannot provide flexible and personalized responses based on emotions, making it difficult to provide appropriate support according to individual emotional states. As a result, there is insufficient suggestion of dialogues and activities that address the latent emotions and intentions of the elderly, hindering the improvement of their quality of life.
[0369] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0370] In this invention, the server includes means for receiving natural language voice or text input from an individual via a communication device, means for analyzing the input and identifying the individual's intentions using natural language processing technology, and means for identifying the individual's emotional state using emotion detection functionality. This makes it possible to accurately analyze an individual's emotional state and intentions and provide personalized responses and activity suggestions based on them.
[0371] A "communication device" is a device that receives natural language voice or text input from an individual and exchanges information with a server.
[0372] "Natural language processing technology" refers to techniques for analyzing individual speech and text to identify their intent and meaning, and includes machine learning algorithms.
[0373] The "emotion detection function" is a function that identifies and analyzes the emotional state contained in an individual's statements and input content.
[0374] "Past conversation history" refers to a record of conversations and interactions an individual has had in the past, and is information that is referenced when generating responses.
[0375] A "notification device" is a device that transmits information to relevant parties when abnormal behavior of an individual is detected.
[0376] A "personalized response" is a response generated based on an individual's emotional state and intentions, with dialogue content tailored to that individual.
[0377] "Activity suggestions" are proposals aimed at improving the quality of life by suggesting optimal actions and activities according to an individual's emotional state.
[0378] The system of this invention consists of a communication device used by individuals, including the elderly, an emotion analysis system via a server, and a notification device that notifies relevant parties. The main role of the system is to analyze natural language speech and text input obtained from individuals to understand their intentions and emotional state.
[0379] The server receives input via communication devices and uses natural language processing techniques to identify the individual's intentions. It utilizes machine learning algorithms to precisely analyze the input. Furthermore, it uses emotion detection to identify the individual's emotional state and generates personalized responses by comparing them with past conversation history. This entire process can utilize a common cloud-based platform.
[0380] The generated response is presented to the individual via a communication device. For example, if an elderly person inputs "I'm lonely today," the system will offer a considerate suggestion such as, "Why don't you talk to a friend on the phone? Or would you like to watch a movie together?"
[0381] Furthermore, the system can improve an individual's quality of life by suggesting activities based on their emotional state. An example of a prompt might be: "If the user wants to relax, generate a response suggesting relaxing facilities in the area."
[0382] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0383] Step 1:
[0384] The server receives natural language speech or text input from individuals via communication devices. This input is taken into the system as raw data.
[0385] Step 2:
[0386] The server analyzes the received input using natural language processing techniques. This includes grammatical analysis and keyword extraction of the input content, as well as data processing to identify the individual's intent. The output is text data with the identified intent.
[0387] Step 3:
[0388] Based on the identified intent, the server uses the emotion detection function to identify the emotional state contained in the input. This process involves calculations using an emotion model, and the emotional state of the input is output in text format.
[0389] Step 4:
[0390] The server references past conversation history and pre-configured information, and uses a generative AI model to generate personalized responses. It leverages prompts to create optimized responses. The output at this stage is individually customized response text.
[0391] Step 5:
[0392] The server sends the generated response to the individual via the communication device. At this time, speech synthesis or on-screen display of the response is performed. The user receives the response, completing the interaction with the system.
[0393] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0394] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0395] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.
[0396] [Third Embodiment]
[0397] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.
[0398] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.
[0399] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0400] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.
[0401] The microphone 238 receives the voice of the user 20, including instructions from the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.
[0402] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0403] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0404] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0405] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0406] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0407] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0408] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".
[0409] This invention realizes an interactive system to support the lives of the elderly and provide peace of mind to their families. The components of the system and their interactions are described below.
[0410] Executed on the server
[0411] The server is responsible for the central processing of the system. It receives voice or text input from the user and analyzes that data. The server uses natural language processing technology to understand the user's intent from their statements. Based on that intent, it has the function of referencing relevant past conversation history and schedule data to generate the optimal response.
[0412] For example, if a user says, "I'd like some advice on preparing for a trip I have planned for this weekend," the server might respond with something like, "We can provide you with a list of items you've brought on past trips and weather information for your destination."
[0413] Implementation on a device
[0414] The terminal is a device that sends user input to a server and presents the server's response to the user. It uses speech recognition technology to convert speech to text, or sends the data to the server as text input. After receiving the response from the server, the terminal presents it to the user in either text or audio format. Furthermore, it has the capability to acquire the user's location information, enabling the provision of responses tailored to specific situations.
[0415] Specific example: If a user asks their device, "What are some good restaurants in my neighborhood?", the device will display a list of recommended restaurants from the server based on the user's current location.
[0416] User interaction
[0417] Users interact with the system daily through their devices. Based on their spoken words and input information, they can initiate new conversations or request advice on schedules and health. Furthermore, the system periodically monitors the user's behavior and, if it detects any abnormalities, sends notifications to designated family members, thus also playing a monitoring role.
[0418] Specific example: If a user says "I'm going for a walk" at the same time every day, the server will notify the family if this suddenly stops happening, as this is considered an anomaly.
[0419] This invention aims to realize a form that supports independent living by having these components work together to provide personalized support to the user.
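The walk example in [0418] treats the sudden absence of a habitual event as an anomaly. A minimal sketch of that rule, assuming the routine counts as established after a run of consecutive days and that a grace period after the usual time is allowed, is shown below. `missed_routine`, the seven-day streak, and the one-hour grace period are hypothetical parameters.

```python
from datetime import date, datetime, time, timedelta


def missed_routine(event_dates: set[date], now: datetime, usual_time: time,
                   min_streak: int = 7, grace: timedelta = timedelta(hours=1)) -> bool:
    """True if the routine occurred on each of the previous `min_streak` days
    but has not been reported today by `usual_time + grace`."""
    today = now.date()
    # The routine is considered established only if it happened every recent day.
    established = all(today - timedelta(days=d) in event_dates for d in range(1, min_streak + 1))
    deadline = datetime.combine(today, usual_time) + grace
    return established and today not in event_dates and now > deadline
```

If this returns True, the server would send a notification to the designated family members; requiring an established streak avoids alerts for activities the user only does occasionally.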
[0420] The following describes the processing flow.
[0421] Step 1:
[0422] The user enters inquiries or commands into the device via voice or text. For example, the user might say, "Tell me what's on my schedule this afternoon."
[0423] Step 2:
[0424] The terminal converts the input speech into text data. This conversion is performed using speech recognition technology. The converted text data is then sent to the server.
[0425] Step 3:
[0426] The server analyzes the received text data. It uses natural language processing technology to identify the user's intent and understand their purpose, such as "I want to know my schedule."
[0427] Step 4:
[0428] Based on the analysis, the server references the user's past conversation history and existing schedule databases. It extracts relevant information and generates an appropriate response.
[0429] Step 5:
[0430] The server sends the generated response to the terminal as text data. This includes specific information such as, "You have a health checkup today at 3 PM."
[0431] Step 6:
[0432] The terminal presents the user with response data received from the server. Information is provided by converting text to speech and reading it aloud, or by displaying it on the screen.
[0433] Step 7:
[0434] The device periodically monitors the user's behavior. If an anomaly is detected, such as a missed appointment, it notifies the relevant parties via the server.
[0435] Through this series of processes, the system provides continuous and effective support to the user.
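For the missed-appointment case in Step 7, the check can be sketched as comparing scheduled start times against recorded check-ins. The `Appointment` record, the use of the title as the check-in key (assumed unique), and the 30-minute grace period are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Appointment:
    """Hypothetical schedule entry."""
    title: str
    start: datetime


def missed_appointments(appointments: list[Appointment], checked_in: set[str], now: datetime,
                        grace: timedelta = timedelta(minutes=30)) -> list[str]:
    """Titles of appointments with no check-in once the grace period after the start has passed."""
    return [a.title for a in appointments
            if a.title not in checked_in and now > a.start + grace]
```

A non-empty result would be forwarded by the server to the relevant parties.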
[0436] (Example 1)
[0437] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0438] Conventional interactive systems have limitations in providing a sense of security to the elderly because they are insufficient in analyzing the user's intent and generating personalized responses. Furthermore, they cannot adequately ensure user safety because they cannot detect abnormal behavior and promptly notify third parties.
[0439] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0440] In this invention, the server includes means for receiving natural language voice or text input from a user via a communication device, means for analyzing the input and identifying the user's intent using natural language processing technology, means for generating a response by referring to past dialogue history and pre-configured data, means for optimizing the response using a generative model, and means for detecting abnormalities in the user's behavior and notifying a third party through a notification device. This enables personalized support and rapid safety checks for the user.
[0441] A "communication device" is a device that receives input from a user and exchanges data with a server.
[0442] "Natural language processing technology" is a technology that mechanically analyzes the language spoken by a user and understands their intent.
[0443] "Means for generating responses" refers to a function that creates a response by combining appropriate information based on the analyzed intent of the user.
[0444] A "generative model" is a pre-trained artificial intelligence model used to generate optimized outputs based on input data.
[0445] A "notification device" is a device used to notify a third party of the user's status or any abnormalities.
[0446] "Means for detecting abnormal behavior" refers to a function that identifies discrepancies between a user's normal behavioral patterns and their actions, and determines them to be abnormal.
[0447] This invention is an interactive system designed to support the lives of the elderly. It receives input from users via communication devices and generates and presents optimized responses to support their daily lives. The server analyzes the user's intent using natural language processing technology. Specifically, it uses a "speech recognition API" as speech recognition software to convert speech into text. For the analysis, it uses a "natural language processing model," which is a generative AI model built on a "machine learning algorithm." Using this model, the system identifies the user's intent and generates responses by referring to past dialogue history and pre-configured data.
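The disclosure attributes intent identification to a machine-learning model. As a stand-in that makes the input and output of this step concrete, the sketch below maps an utterance to an intent label with keyword rules. The labels (such as `schedule_inquiry`) and the patterns are hypothetical and would be replaced by a trained classifier in practice.

```python
import re

# Hypothetical intent labels and keyword rules, checked in order.
INTENT_RULES = [
    ("schedule_inquiry", re.compile(r"\b(schedule|appointment|plans?)\b")),
    ("health_record", re.compile(r"\b(health data|blood pressure|record)\b")),
    ("pharmacy_hours", re.compile(r"\bpharmacy\b")),
]


def classify_intent(text: str) -> str:
    """Return the first matching intent label, or 'small_talk' if no rule matches."""
    lowered = text.lower()
    for label, pattern in INTENT_RULES:
        if pattern.search(lowered):
            return label
    return "small_talk"
```

The label returned here plays the role of the "identified intent" that the response-generation step consumes.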
[0448] The device can use a "speech synthesis API" to present the generated response in voice or text. Furthermore, by incorporating location services, it can optimize the response based on the user's location using a "GPS module," etc. In addition, the device monitors the user's behavior and, if an anomaly is detected, sends warning information to a third party via a notification device. For example, a cloud monitoring service such as Amazon Web Services could be used.
[0449] Users can submit everyday inquiries to the system, with specific prompts such as "Please tell me the opening hours of the local pharmacy" or "Please give me advice on managing my health." This allows users to receive necessary information and advice while maintaining their independent lives.
[0450] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0451] Step 1:
[0452] The user makes a query in natural language via a communication device. The input data can be speech or text, and in some cases, a "speech recognition API" is used to convert speech to text. The output here is the content sent to the server as text data. As a concrete example, the user might say, "Tell me today's schedule."
[0453] Step 2:
[0454] The server analyzes the received text data. A generative AI model, a "natural language processing model," is used for the analysis, and "machine learning algorithms" are employed to identify the user's intent. The input is the user's text data, and the output is the identified intent. Specifically, the server recognizes the user's intent as a "schedule inquiry."
[0455] Step 3:
[0456] The server generates a response by referring to past conversation history and pre-configured data (e.g., schedule information). In this process, a generative AI model is used to optimize the content of the response. The input is the user's intent and reference data, and the output is the response content. As a concrete example, the server prepares a response such as "I have a hospital appointment at 2 PM."
[0457] Step 4:
[0458] The device receives the generated response and presents it to the user. The output can be in text or voice. In the case of voice output, the "Speech Synthesis API" can be used. Specifically, the device will inform the user by voice, "I'm going to the hospital at 2 PM."
[0459] Step 5:
[0460] The user decides on the next action based on the response received. Feedback may be needed, and the user can provide further instructions to the system. This step involves user input again, creating a loop in the system's processing. A specific action might be the user providing feedback such as "thank you."
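The five steps of Example 1 can be illustrated with a minimal Python sketch. It assumes a simple keyword lookup in place of the "natural language processing model" and a plain dictionary in place of the pre-configured schedule data; the names `identify_intent`, `DialogueServer`, and the intent labels are hypothetical and not part of the disclosure. Speech recognition (Step 1) and speech synthesis (Step 4) are omitted, so the sketch works on text only.

```python
from dataclasses import dataclass, field

# Hypothetical keyword rules standing in for the "natural language processing model".
INTENT_KEYWORDS = {
    "schedule_inquiry": ("schedule", "appointment"),
    "health_advice": ("health", "medicine"),
    "store_hours": ("opening hours", "pharmacy"),
    "acknowledgement": ("thank",),
}


def identify_intent(text: str) -> str:
    """Step 2: map the user's text to an intent label (keyword stand-in)."""
    lowered = text.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(k in lowered for k in keywords):
            return intent
    return "unknown"


@dataclass
class DialogueServer:
    schedule: dict  # pre-configured data, e.g. {"today": "..."}
    history: list = field(default_factory=list)  # past (input, reply) pairs

    def respond(self, text: str) -> str:
        intent = identify_intent(text)  # Step 2
        if intent == "schedule_inquiry":  # Step 3: consult reference data
            reply = self.schedule.get("today", "You have nothing scheduled today.")
        elif intent == "acknowledgement":  # Step 5: user feedback closes the loop
            reply = "You're welcome."
        elif intent == "unknown":
            reply = "Could you tell me a little more?"
        else:
            reply = f"Let me look into that for you ({intent})."
        self.history.append((text, reply))  # kept for later turns
        return reply
```

A real deployment would replace the keyword table with a trained model and feed `history` to the response generator, as the text describes.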
[0461] (Application Example 1)
[0462] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0463] For elderly people to live independently, monitoring their health, reminding them of appointments, and ensuring their safety are crucial issues. In particular, a system is needed that can promptly provide support in the event of unexpected situations during daily activities.
[0464] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0465] In this invention, the server includes means for receiving natural language voice or text input from a user via a communication device, means for identifying the user's intent using natural language processing technology, and means for checking health status and providing appointment reminders using a voice recognition device. This enhances the support that elderly people receive in their daily lives and ensures their safety through rapid notification in case of abnormalities.
[0466] A "communication device" is a device that receives voice or text input from a user and transmits it to a server.
[0467] "Natural language processing technology" is a technology that analyzes natural language data contained in speech or text to identify the user's intent.
[0468] A "speech recognition device" is a device that converts a user's speech into digital data and then converts the speech into text for input into natural language processing technology.
[0469] A "location information device" is a device that detects the geographical location of a user and provides that information to a server, thereby enabling the server to track the user's movements and monitor their safety.
[0470] A "generative AI model" is an artificial intelligence model that is generated using machine learning to improve the quality of responses to users.
[0471] A "notification device" is a device that reports to pre-configured relevant parties when an anomaly related to the user's safety is detected.
[0472] This invention aims to realize an interactive system that supports the daily lives of the elderly. Users input information to the server via voice or text using a communication device. This communication device is often implemented using a smartphone or tablet.
[0473] When the server receives input voice or text data, it uses natural language processing technology to analyze the data and identify the user's intent. The primary software used in this process is the "Google Cloud Natural Language API." This API includes machine learning algorithms that enhance the accuracy of the analysis. Furthermore, it uses generative AI models to improve the quality of the response.
[0474] After the analysis, the server refers to past conversation history and configured information to generate an appropriate response. This response is presented to the user via a speech recognition device, and the "Microsoft Azure Speech Service" is used to convert the response into natural-sounding speech.
[0475] Furthermore, the user's location information is obtained using the GPS function built into the smartphone. Based on this information, the server monitors the user's safety and, in the event of an anomaly, quickly sends an alert to the relevant parties via a notification device. This enhances security while simultaneously enabling more personalized support.
[0476] As a concrete example, the server can provide registered users with a reminder function that tells them "This is your schedule for today" by voice at a specified time every morning. Furthermore, in the event of a critical situation, support staff will be immediately notified. An example of a prompt to input into the generative AI model would be a sentence like, "Please suggest the optimal interaction method to support the lives of the elderly through natural language processing."
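The morning reminder described above can be sketched as follows. This is an illustration under assumptions not stated in the text: the reminder fires on the first check at or after a configured time, at most once per day, and events are stored as `(time, label)` pairs. The function name `build_morning_reminder` is hypothetical.

```python
from datetime import datetime, time
from typing import List, Optional, Set, Tuple


def build_morning_reminder(
    now: datetime,
    reminder_at: time,
    events: List[Tuple[time, str]],
    sent_dates: Set,
) -> Optional[str]:
    """Return today's reminder text once per day at or after reminder_at, else None."""
    if now.time() < reminder_at or now.date() in sent_dates:
        return None
    sent_dates.add(now.date())  # remember that today's reminder has been sent
    if not events:
        return "This is your schedule for today: nothing is planned."
    # Sort by time so the spoken reminder follows the order of the day.
    items = "; ".join(f"{t:%H:%M} {label}" for t, label in sorted(events))
    return f"This is your schedule for today: {items}."
```

The returned text would then be handed to speech synthesis on the terminal.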
[0477] The system of this invention aims to support the independent living of the elderly by having these devices and software work together to provide advanced support and monitoring.
[0478] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0479] Step 1:
[0480] The user inputs information via voice or text into a communication device. This input can include natural language, such as "Tell me my schedule for tomorrow" or "Please record today's health data." The output is then sent to the server by the communication device as voice or text data.
[0481] Step 2:
[0482] The server uses a speech recognition device to convert the received audio data into text. Specifically, it uses the "Google Cloud Speech-to-Text API" to convert the audio data into text. The output of the process is obtained as natural language text.
[0483] Step 3:
[0484] The server analyzes the converted text data and uses natural language processing techniques to understand the user's intent. It utilizes the Google Cloud Natural Language API for analysis, extracting user requests from the text data. The input is text data, and the output is a command or request based on the user's intent.
[0485] Step 4:
[0486] The server generates responses by referencing the user's past conversation history and registered information. A generative AI model is used to create more personalized responses. The generated responses are constructed in natural language, supporting a deeper understanding of the user. Inputs are the user's intent and reference data, while output is the generated response.
[0487] Step 5:
[0488] The terminal receives the response data sent from the server and presents it to the user using a speech recognition device. Here, "Microsoft Azure Speech Service" is used to convert text data into speech. The input is text response data, and the output is a voice response.
[0489] Step 6:
[0490] To obtain the user's location information, the GPS function of the communication device is activated and the user's current location is sent to the server. The input is location information from the GPS sensor, and the output is sent to the server as location data.
[0491] Step 7:
[0492] The server monitors location data and, if it detects an anomaly, immediately sends an alert to the relevant parties via a notification device. This ensures user safety. Inputs are location information and behavioral patterns, and output is an alert notification in case of an anomaly.
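Step 7 leaves the anomaly criterion open. One simple possibility, assumed here purely for illustration, is a circular "safe zone" around the user's home: if a GPS fix lies farther from home than a set radius, an alert is sent. The sketch uses the haversine great-circle distance; `SafeZone`, `check_location`, and the `notify` callback are hypothetical stand-ins for the notification device, and behavioral patterns are not modeled.

```python
import math
from dataclasses import dataclass
from typing import Callable

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


@dataclass
class SafeZone:
    lat: float       # home latitude in degrees
    lon: float       # home longitude in degrees
    radius_m: float  # allowed distance from home in metres


def check_location(zone: SafeZone, lat: float, lon: float,
                   notify: Callable[[str], None]) -> bool:
    """Step 7: send an alert and return True if the fix lies outside the safe zone."""
    distance = haversine_m(zone.lat, zone.lon, lat, lon)
    if distance > zone.radius_m:
        notify(f"User is {distance:.0f} m from home, outside the "
               f"{zone.radius_m:.0f} m safe zone.")
        return True
    return False
```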
[0493] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0494] This invention aims to support the daily conversations of the elderly and provide personalized responses based on emotions, thereby enabling richer communication through a system incorporating an emotion engine. The system is configured as follows, with each element working in conjunction with the others.
[0495] Executed on the server
[0496] The server receives voice or text data sent by the user and performs multi-stage analysis. First, it analyzes the intent of the input using natural language processing techniques. Simultaneously, it uses an emotion engine to identify the emotional state contained in the input.
[0497] Specific example: If a user types "I'm happy because the weather is nice today," the server detects the emotion of "happiness" and generates a positive response such as, "It's a wonderful day today, do you have anything special planned?"
[0498] Implementation on a device
[0499] The terminal is a device that sends user input to a server and displays the server's response to the user. Here, adjusting the response according to the user's emotional state improves convenience and enriches the user experience. By combining voice recognition and location information acquisition functions, information tailored to the user is provided.
[0500] Specific example: The device asks the user, "Are you looking for a relaxing cafe in your neighborhood?" and provides local information suited to the user's wish to relax.
[0501] User interaction
[0502] Users can intuitively operate the device and receive helpful support while having fun. This feature is expected to improve the quality of life for the elderly and reduce the burden on families. Furthermore, the conversation flow is flexibly changed based on emotional recognition, always providing the most appropriate dialogue for the user.
[0503] Specific example: When a user sadly says, "I'm feeling down today," providing a caring response such as, "Is there anything I can do to help?" can help improve the user's mood.
[0504] This invention embodies a technological means that makes the lives of the elderly more comfortable and fulfilling by detecting emotions and enabling detailed responses based on those emotions.
[0505] The following describes the processing flow.
[0506] Step 1:
[0507] The user inputs their feelings or questions into the device via voice or text. For example, the user might say, "I'm feeling a little lonely today."
[0508] Step 2:
[0509] The terminal converts the input speech into text format. This conversion uses speech recognition technology. The converted text is then sent to the server.
[0510] Step 3:
[0511] The server analyzes the received text data. First, it uses natural language processing techniques to identify the user's intent and recognizes that they are "feeling lonely."
[0512] Step 4:
[0513] The server activates an emotion engine to analyze emotions from the input text. In this case, it detects the emotion "loneliness" and records the emotional state.
[0514] Step 5:
[0515] The server cross-references the user's past conversation history with changes in their emotions to optimize its response. In this case, it generates a response intended to alleviate feelings of loneliness, such as the suggestion, "Shall we think of something fun to do together?"
[0516] Step 6:
[0517] The server sends the generated response to the terminal.
[0518] Step 7:
[0519] The terminal presents the user with the response received from the server. The response is either read aloud or displayed on the screen.
[0520] Step 8:
[0521] The device uses an emotion engine to continuously record emotional states for future user interactions, and prepares to notify pre-designated family members or relevant parties if abnormalities persist.
[0522] This allows the system to tailor its interactions to incorporate user emotions, providing more personalized support.
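Steps 4 through 8 can be approximated by the following Python sketch. A small keyword lexicon stands in for the emotion engine (emotion identification model 59), the replies are fixed strings rather than generated text, and "abnormalities persist" is assumed to mean a run of consecutive negative emotions; all names and thresholds are hypothetical.

```python
from collections import deque
from typing import Callable

# Hypothetical keyword lexicon standing in for the emotion engine.
EMOTION_LEXICON = {
    "happiness": ("happy", "glad", "nice"),
    "loneliness": ("lonely", "alone"),
    "sadness": ("down", "sad"),
}
NEGATIVE = {"loneliness", "sadness"}
REPLIES = {
    "happiness": "It's a wonderful day today. Do you have anything special planned?",
    "loneliness": "Shall we think of something fun to do together?",
    "sadness": "Is there anything I can do to help?",
    "neutral": "I see. Please tell me more.",
}


def detect_emotion(text: str) -> str:
    """Step 4: return the first emotion whose cue word appears in the text."""
    lowered = text.lower()
    for emotion, cues in EMOTION_LEXICON.items():
        if any(cue in lowered for cue in cues):
            return emotion
    return "neutral"


class EmotionTracker:
    """Records recent emotions and notifies when a negative run persists (Step 8)."""

    def __init__(self, window: int, notify: Callable[[str], None]):
        self.recent = deque(maxlen=window)  # only the last `window` emotions are kept
        self.notify = notify

    def handle(self, text: str) -> str:
        emotion = detect_emotion(text)
        self.recent.append(emotion)
        if len(self.recent) == self.recent.maxlen and all(e in NEGATIVE for e in self.recent):
            self.notify(f"Persistent negative mood: {list(self.recent)}")
            self.recent.clear()  # start a fresh window after alerting
        return REPLIES[emotion]  # Steps 5-7: reply sent back to the terminal
```

Clearing the window after an alert is a design choice that avoids repeated notifications for the same episode.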
[0523] (Example 2)
[0524] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0525] In everyday conversations with the elderly, there is a challenge in improving the quality of communication by providing personalized responses based on emotions. Furthermore, conventional technologies have had difficulty accurately identifying the emotional state of the user and appropriately adapting the conversation.
[0526] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0527] In this invention, the server includes means for receiving natural language voice or text input from a user via a communication system, means for analyzing the input and identifying the user's intentions and emotions using natural language processing technology, and means for generating emotion-based personalized responses using a generative AI model. This enables natural communication with elderly people.
[0528] A "communication system" is a general term for devices and technologies that use networks to send and receive voice and text data from users.
[0529] "Natural language processing technology" refers to technologies that enable computers to understand and analyze human language, including methods for analyzing the intent and emotions behind language.
[0530] "Intention" refers to the purpose or will of what a user is trying to convey through their statements or input.
[0531] "Emotions" refer to the mental state that users express through their words and input, and include psychological reactions such as joy, sadness, and surprise.
[0532] A "generative AI model" is an artificial intelligence technology that uses machine learning algorithms to generate natural-sounding words and sentences from data.
[0533] "Personalized responses" refer to conversations and actions that are individually tailored based on the user's specific intentions and feelings.
[0534] "Location information" refers to data that indicates a user's physical geographical location and is used in maps, navigation, and other applications.
[0535] "Dynamic adaptation" refers to a system automatically adjusting its responses and actions in response to changes in the user's state or environment.
[0536] This invention relates to a communication system incorporating an emotion engine, which aims to support the daily conversations of elderly people and provide personalized responses based on their emotions. This system includes three key elements: a server, a terminal, and a user.
[0537] The server receives natural language-based voice or text data from the user transmitted from the terminal via the communication system. The server processes this data and uses natural language processing techniques to identify the user's intent and emotions. The natural language processing techniques used here are based on machine learning algorithms; for example, language models such as BERT or generative AI models such as the GPT series can be used. Once the emotional state is identified, the server uses the generative AI model to generate a personalized response based on that emotion.
[0538] The terminal is a device that sends user input to a server, receives a response from the server, and presents it to the user. In this process, speech recognition and speech synthesis technologies are utilized to provide both data input and responses to the user in a natural manner. Specific hardware examples include smart devices equipped with speech recognition capabilities, and general-purpose voice playback applications can be used as software.
[0539] Users can interact with the system intuitively through their devices and receive support. This allows them to receive advice and information tailored to their emotions, enabling them to enjoy a richer experience even in everyday conversations.
[0540] For example, if a user says, "I'm happy the weather is nice today," the server analyzes this as text data and detects the emotion of "happiness." Based on this, the server generates a positive response such as, "It looks like it's going to be a great day, is there anything you'd like to do?" and the terminal presents it to the user. An example prompt for this case is "Generate an appropriate response when an elderly person expresses a positive mood."
[0541] This system can improve the quality of life for the elderly and enable flexible and effective communication that responds to their emotions.
[0542] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0543] Step 1:
[0544] The user enters a message into the terminal via voice or text. The terminal uses speech recognition technology to convert the input voice into a digital format. If the input is text, it is processed as is and prepared to be sent to the server. At this stage, the input is the user's natural language message, which the terminal converts into digital data.
[0545] Step 2:
[0546] The terminal sends the digital audio or text data received from the user to the server. After receiving this data, the server begins data analysis using natural language processing technology. The input data is text in string format; the server tokenizes it and analyzes the meaning and structure of the sentences using machine learning algorithms. As a result, the user's intent and emotions are identified.
[0547] Step 3:
[0548] Based on the analysis results, the server initiates a process to generate a response using a generative AI model. Here, the emotion engine contributes by inputting the most appropriate prompt sentence for the user's emotion into the generative AI model. For example, if the emotion "happy" is detected, a positive response will be generated. The output is a sentence that serves as the response to the user.
[0549] Step 4:
[0550] The response generated by the server is sent to the terminal to be presented to the user. The terminal converts the received text data into speech using speech synthesis technology, or displays it to the user as text. At this stage, the output is either speech or text that the user can recognize.
[0551] Step 5:
[0552] The user can continue the conversation by acknowledging the received response and making new inputs. Through this two-way interaction, the system continues the dialogue in accordance with the user's emotional state and intentions, achieving adaptive communication.
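Steps 2 and 3 of Example 2 can be sketched as tokenization followed by emotion-dependent prompt selection. The regex tokenizer, cue sets, and prompt templates below are simplifying assumptions; a real system would use a trained model for emotion detection and send the resulting prompt to a generative AI model, which is not shown here.

```python
import re
from typing import Tuple

# Hypothetical instruction sentences the emotion engine passes to the generative model.
PROMPTS = {
    "happy": "Generate an appropriate response when an elderly person expresses a positive mood.",
    "sad": "Generate a caring, supportive response for an elderly person who feels down.",
    "neutral": "Generate a friendly, helpful response for an elderly person.",
}
CUES = {"happy": {"happy", "glad", "great"}, "sad": {"sad", "down", "lonely"}}


def tokenize(text: str) -> list:
    """Step 2: lowercase word tokens (a regex stand-in for a real tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def classify_emotion(tokens: list) -> str:
    """Return the first emotion whose cue set shares a token with the input."""
    for emotion, cues in CUES.items():
        if cues.intersection(tokens):
            return emotion
    return "neutral"


def build_prompt(user_text: str) -> Tuple[str, str]:
    """Step 3: prefix the user's message with the emotion-specific instruction."""
    emotion = classify_emotion(tokenize(user_text))
    return emotion, f"{PROMPTS[emotion]}\nUser: {user_text}\nAssistant:"
```

Matching on whole tokens rather than substrings keeps a cue such as "down" from firing on words like "downtown".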
[0553] (Application Example 2)
[0554] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0555] In the daily lives of the elderly, a lack of communication and feelings of loneliness are serious problems. Conventional technologies cannot provide flexible, personalized responses based on emotions, making it difficult to provide appropriate support according to individual emotional states. As a result, dialogues and activities that address the latent emotions and intentions of the elderly are rarely suggested, which hinders improvement in their quality of life.
[0556] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0557] In this invention, the server includes means for receiving natural language voice or text input from an individual via a communication device, means for analyzing the input and identifying the individual's intentions using natural language processing technology, and means for identifying the individual's emotional state using emotion detection functionality. This makes it possible to accurately analyze an individual's emotional state and intentions and provide personalized responses and activity suggestions based on them.
[0558] A "communication device" is a device that receives natural language voice or text input from an individual and exchanges information with a server.
[0559] "Natural language processing technology" refers to techniques for analyzing individual speech and text to identify their intent and meaning, and includes machine learning algorithms.
[0560] The "emotion detection function" is a function that identifies and analyzes the emotional state contained in an individual's statements and input content.
[0561] "Past conversation history" refers to a record of conversations and interactions an individual has had in the past, and is information that is referenced when generating responses.
[0562] A "notification device" is a device that transmits information to relevant parties when abnormal behavior of an individual is detected.
[0563] A "personalized response" is a response generated based on an individual's emotional state and intentions; it is a form of dialogue whose content is tailored to that individual.
[0564] "Activity suggestions" are proposals aimed at improving the quality of life by suggesting optimal actions and activities according to an individual's emotional state.
[0565] The system of this invention consists of a communication device used by individuals, including the elderly, an emotion analysis system via a server, and a notification device that notifies relevant parties. The main role of the system is to analyze natural language speech and text input obtained from individuals to understand their intentions and emotional state.
[0566] The server receives input via communication devices and uses natural language processing techniques to identify the individual's intentions. It utilizes machine learning algorithms to precisely analyze the input. Furthermore, it uses emotion detection to identify the individual's emotional state and generates personalized responses by comparing them with past conversation history. This entire process can utilize a common cloud-based platform.
[0567] The generated response is presented to the individual via a communication device. For example, if an elderly person inputs "I'm lonely today," the system will offer a considerate suggestion such as, "Why don't you talk to a friend on the phone? Or would you like to watch a movie together?"
[0568] Furthermore, the system can improve an individual's quality of life by suggesting activities based on their emotional state. An example of a prompt might be: "If the user wants to relax, generate a response suggesting relaxing facilities in the area."
[0569] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0570] Step 1:
[0571] The server receives natural language speech or text input from individuals via communication devices. This input is taken into the system as raw data.
[0572] Step 2:
[0573] The server analyzes the received input using natural language processing techniques. This includes grammatical analysis and keyword extraction of the input content, as well as data processing to identify the individual's intent. The output is text data with the identified intent.
[0574] Step 3:
[0575] The server uses emotion detection functionality based on identified intent to identify the emotional state contained in the input. This process involves data calculations using an emotion model, and the emotional state of the input is output in text format.
[0576] Step 4:
[0577] The server references past conversation history and pre-configured information, and uses a generative AI model to generate personalized responses. It leverages prompts to create optimized responses. The output at this stage is individually customized response text.
[0578] Step 5:
[0579] The server sends the generated response to the individual via the communication device, where the response is either synthesized as speech or displayed. The user receives the response, completing the interaction with the system.
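Step 4 personalizes responses using past conversation history. As one hypothetical way to do this for activity suggestions, the sketch below picks the first activity for the detected emotional state that has not already been suggested, and records each new suggestion in the history. The activity catalogue and emotion labels are illustrative assumptions.

```python
from typing import List, Optional

# Hypothetical activity catalogue keyed by detected emotional state.
ACTIVITIES = {
    "lonely": ["calling a friend on the phone", "watching a movie together",
               "joining a community tea gathering"],
    "wants_to_relax": ["visiting a relaxing facility in the area",
                       "taking a quiet walk in the park"],
}


def suggest_activity(emotion: str, past_suggestions: List[str]) -> Optional[str]:
    """Suggest the first catalogue activity not already in the history, else None."""
    for activity in ACTIVITIES.get(emotion, []):
        if activity not in past_suggestions:
            past_suggestions.append(activity)  # remember it for future turns
            return f"How about {activity}?"
    return None
```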
[0580] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0581] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0582] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.
[0583] [Fourth Embodiment]
[0584] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.
[0585] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.
[0586] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0587] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.
[0588] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0589] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0590] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0591] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
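How the controlled object 443 might map an emotion to eye LED states and motor commands can be illustrated with a lookup table. The colours, blink rates, joint names, angles, and command strings below are invented for illustration; the disclosure does not specify them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Expression:
    eye_rgb: Tuple[int, int, int]         # colour of the eye LEDs
    blink_hz: float                       # LED blink rate; 0.0 means steady
    gesture: Tuple[Tuple[str, int], ...]  # (joint, target angle in degrees) motor targets


# Hypothetical emotion-to-expression table for the robot 414.
EXPRESSIONS = {
    "joy": Expression((255, 200, 0), 2.0, (("right_arm", 90), ("left_arm", 90))),
    "sadness": Expression((0, 80, 255), 0.0, (("head", -20),)),
    "neutral": Expression((255, 255, 255), 0.0, ()),
}


def expression_commands(emotion: str) -> List[str]:
    """Translate an emotion into LED and motor commands, defaulting to neutral."""
    expr = EXPRESSIONS.get(emotion, EXPRESSIONS["neutral"])
    commands = [f"LED eyes rgb={expr.eye_rgb} blink={expr.blink_hz}Hz"]
    commands += [f"MOTOR {joint} -> {angle} deg" for joint, angle in expr.gesture]
    return commands
```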
[0592] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0593] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0594] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0595] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0596] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0597] This invention realizes an interactive system to support the lives of the elderly and provide peace of mind to their families. The components of the system and their interactions are described below.
[0598] Executed on the server
[0599] The server is responsible for the central processing of the system. It receives voice or text input from the user and analyzes that data. The server uses natural language processing technology to understand the user's intent from their statements. Based on that intent, it has the function of referencing relevant past conversation history and schedule data to generate the optimal response.
[0600] For example, if a user says, "I'd like some advice on preparing for a trip I have planned for this weekend," the server might respond with something like, "We can provide you with a list of items you've brought on past trips and weather information for your destination."
[0601] Implementation on a device
[0602] The terminal is a device that sends user input to a server and presents the server's response to the user. It uses speech recognition technology to convert speech to text, or sends the data to the server as text input. After receiving the response from the server, the terminal presents it to the user in either text or audio format. Furthermore, it has the capability to acquire the user's location information, enabling the provision of responses tailored to specific situations.
[0603] Specific example: If a user asks their device, "What are some good restaurants in my neighborhood?", the device will display a list of recommended restaurants from the server based on the user's current location.
[0604] User interaction
[0605] Users interact with the system daily through their devices. Based on their spoken words and input information, they can initiate new conversations or request advice on schedules and health. Furthermore, the system periodically monitors the user's behavior and, if it detects any abnormalities, sends notifications to designated family members, thus also playing a monitoring role.
[0606] Specific example: If a user says "I'm going for a walk" at the same time every day, the server will notify the family if this suddenly stops happening, as this is considered an anomaly.
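A minimal sketch of the walk example, assuming the server logs the dates on which the user reported the routine and compares them with the current time. The function name, the two-hour grace period, and the rule that a habit counts as established after at least five of the last seven days are all illustrative assumptions; sending the actual notification to the family is omitted.

```python
from datetime import datetime, time, timedelta


def is_routine_missed(event_dates, now, usual_time,
                      grace=timedelta(hours=2),
                      lookback_days=7, min_occurrences=5):
    """Return True if an established daily routine has not happened today.

    event_dates: set of dates on which the user reported the routine
                 (e.g. said "I'm going for a walk").
    usual_time:  time of day at which the routine normally occurs.
    All thresholds are hypothetical defaults.
    """
    today = now.date()
    # Count the previous `lookback_days` days on which the routine occurred.
    past_days = {today - timedelta(days=i) for i in range(1, lookback_days + 1)}
    if len(past_days & event_dates) < min_occurrences:
        return False  # not yet an established habit, so absence is not an anomaly
    if today in event_dates:
        return False  # the routine already happened today
    # Only flag the absence once the usual time plus a grace period has passed.
    return now > datetime.combine(today, usual_time) + grace
```

With walks logged every morning at about 9:00 for a week, a call at noon on the eighth day with no walk reported returns True, while a call at 10:00 the same day still returns False because the grace period has not elapsed.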
[0607] By having these components work together to provide personalized support to the user, this invention aims to support independent living.
[0608] The following describes the processing flow.
[0609] Step 1:
[0610] The user enters inquiries or commands into the device via voice or text. For example, the user might say, "Tell me what's on my schedule this afternoon."
[0611] Step 2:
[0612] The terminal converts the input speech into text data. This conversion is performed using speech recognition technology. The converted text data is then sent to the server.
[0613] Step 3:
[0614] The server analyzes the received text data. It uses natural language processing technology to identify the user's intent and understand their purpose, such as "I want to know my schedule."
[0615] Step 4:
[0616] Based on the analysis, the server references the user's past conversation history and existing schedule databases. It extracts relevant information and generates an appropriate response.
[0617] Step 5:
[0618] The server sends the generated response to the terminal as text data. This includes specific information such as, "You have a health checkup today at 3 PM."
[0619] Step 6:
[0620] The terminal presents the user with response data received from the server. Information is provided by converting text to speech and reading it aloud, or by displaying it on the screen.
[0621] Step 7:
[0622] The device periodically monitors user behavior. If an anomaly is detected, such as missing a scheduled appointment without notice, it notifies the relevant parties via the server.
[0623] Through this series of processes, the system provides continuous and effective support to the user.
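The exchange in Steps 1 to 7 can be pictured as two messages: a request from the terminal carrying the recognized text (and, optionally, location and any anomaly it has detected) and a response from the server carrying the text to present. The following sketch assumes a JSON encoding; the class and field names (`TerminalRequest`, `ServerResponse`, `anomaly`, and so on) are hypothetical and do not come from the disclosure.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class TerminalRequest:
    """Hypothetical message sent from the terminal to the server (Steps 2 and 7)."""
    user_id: str
    text: str                          # user input after speech recognition
    latitude: Optional[float] = None   # optional location from the terminal
    longitude: Optional[float] = None
    anomaly: Optional[str] = None      # e.g. a missed appointment detected by the terminal


@dataclass
class ServerResponse:
    """Hypothetical message returned by the server (Step 5)."""
    text: str  # response to be read aloud or displayed (Step 6)


def encode(message) -> str:
    """Serialize either message type to a JSON string."""
    return json.dumps(asdict(message), ensure_ascii=False)


def decode_request(raw: str) -> TerminalRequest:
    return TerminalRequest(**json.loads(raw))


def decode_response(raw: str) -> ServerResponse:
    return ServerResponse(**json.loads(raw))
```

Keeping both directions as plain text fields matches the description, in which speech is converted to text on the terminal side and text is converted back to speech for presentation.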
[0624] (Example 1)
[0625] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0626] Conventional interactive systems are limited in their ability to give the elderly a sense of security because they do not adequately analyze the user's intent or generate personalized responses. Furthermore, they cannot adequately ensure user safety because they cannot detect abnormal behavior and promptly notify third parties.
[0627] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0628] In this invention, the server includes means for receiving natural language voice or text input from a user via a communication device, means for analyzing the input and identifying the user's intent using natural language processing technology, means for generating a response by referring to past dialogue history and pre-configured data, means for optimizing the response using a generative model, and means for detecting abnormalities in the user's behavior and notifying a third party through a notification device. This enables personalized support and rapid safety checks for the user.
[0629] A "communication device" is a device that receives input from a user and exchanges data with a server.
[0630] "Natural language processing technology" is a technology that mechanically analyzes the language spoken by a user and understands their intent.
[0631] "Means for generating responses" refers to a function that creates a response by combining appropriate information based on the analyzed intent of the user.
[0632] A "generative model" is a pre-trained artificial intelligence model used to generate optimized outputs based on input data.
[0633] A "notification device" is a device used to notify a third party of the user's status or any abnormalities.
[0634] "Means for detecting abnormal behavior" refers to a function that identifies discrepancies between a user's normal behavioral patterns and their actual actions, and determines such discrepancies to be abnormal.
[0635] This invention is an interactive system designed to support the lives of the elderly. It receives input from users via communication devices, and generates and presents optimized responses to support their daily lives. The server analyzes the user's intent using natural language processing technology. Specifically, it uses a "speech recognition API" as speech recognition software to convert speech into text. For the analysis, it uses a "natural language processing model," which is a generative AI model built using "machine learning algorithms." Using this model, the system identifies the user's intent and generates responses by referring to past dialogue history and pre-configured data.
[0636] The device presents the generated response as text or, using a "speech synthesis API," as voice. Furthermore, by incorporating location services such as a "GPS module," it can optimize the response based on the user's location. In addition, the device monitors the user's behavior and, if an anomaly is detected, sends warning information to a third party via a notification device. For example, a cloud monitoring service such as one provided by Amazon Web Services could be used for this purpose.
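As a stand-in for the machine-learning model described above, the following sketch maps an utterance to an intent label with simple keyword rules. The intent labels and cue phrases are hypothetical; the disclosure describes a trained natural language processing model, not substring matching.

```python
# Hypothetical intent labels and cue phrases; a real system would use a
# trained natural language processing model instead of substring matching.
INTENT_KEYWORDS = {
    "schedule_inquiry": ("schedule", "appointment"),
    "store_hours": ("opening hours", "open until", "closing time"),
    "health_advice": ("health", "medicine", "blood pressure"),
}


def identify_intent(text: str) -> str:
    """Return the first intent whose cue phrases appear in the text."""
    lowered = text.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return intent
    return "unknown"
```

For instance, "Please tell me the opening hours of the local pharmacy" maps to `store_hours`, and "Please give me advice on managing my health" maps to `health_advice`.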
[0637] Users can submit everyday inquiries to the system, with specific prompts such as "Please tell me the opening hours of the local pharmacy" or "Please give me advice on managing my health." This allows users to receive necessary information and advice while maintaining their independent lives.
[0638] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0639] Step 1:
[0640] The user makes a query in natural language via a communication device. The input can be speech or text; when it is speech, a "speech recognition API" is used to convert it to text. The output of this step is the text data sent to the server. As a concrete example, the user might say, "Tell me today's schedule."
[0641] Step 2:
[0642] The server analyzes the received text data. A generative AI model, a "natural language processing model," is used for the analysis, and "machine learning algorithms" are employed to identify the user's intent. The input is the user's text data, and the output is the identified intent. Specifically, the server recognizes the user's intent as a "schedule inquiry."
[0643] Step 3:
[0644] The server generates a response by referring to past conversation history and pre-configured data (e.g., schedule information). In this process, a generative AI model is used to optimize the content of the response. The input is the user's intent and reference data, and the output is the response content. As a concrete example, the server prepares a response such as "You have a hospital appointment at 2 PM."
[0645] Step 4:
[0646] The device receives the generated response and presents it to the user. The output can be text or voice. For voice output, a "speech synthesis API" can be used. Specifically, the device tells the user by voice, "You are going to the hospital at 2 PM."
[0647] Step 5:
[0648] The user decides on the next action based on the response received. Feedback may be needed, and the user can provide further instructions to the system. This step involves user input again, creating a loop in the system's processing. A specific action might be the user providing feedback such as "thank you."
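Step 3 of this flow can be sketched as a lookup in pre-configured schedule data followed by sentence construction. In this sketch a fixed template stands in for the generative AI model, and the schedule entries, dates, and function names are hypothetical.

```python
from datetime import date, time

# Hypothetical pre-configured schedule data: (date, start time, description).
SCHEDULE = [
    (date(2025, 3, 8), time(14, 0), "a hospital appointment"),
    (date(2025, 3, 9), time(15, 30), "a call with your daughter"),
    (date(2025, 3, 9), time(10, 0), "a health checkup"),
]


def format_time(t: time) -> str:
    """Format a time as a spoken-style 12-hour clock string, e.g. '2 PM'."""
    hour12 = t.hour % 12 or 12
    suffix = "AM" if t.hour < 12 else "PM"
    if t.minute:
        return f"{hour12}:{t.minute:02d} {suffix}"
    return f"{hour12} {suffix}"


def build_schedule_response(day: date, schedule=SCHEDULE) -> str:
    """Template-based stand-in for the generative model's response to a schedule inquiry."""
    entries = sorted((t, desc) for d, t, desc in schedule if d == day)
    if not entries:
        return "You have nothing scheduled for that day."
    parts = [f"{desc} at {format_time(t)}" for t, desc in entries]
    return "You have " + " and ".join(parts) + "."
```

For 8 March 2025 this returns "You have a hospital appointment at 2 PM.", mirroring the example in Step 3; a generative model would rephrase or enrich such a draft rather than emit it verbatim.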
[0649] (Application Example 1)
[0650] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0651] For elderly people to live independently, monitoring their health, reminding them of appointments, and ensuring their safety are crucial issues. In particular, a system is needed that can promptly provide support in the event of unexpected situations during daily activities.
[0652] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0653] In this invention, the server includes means for receiving natural language voice or text input from a user via a communication device, means for identifying the user's intent using natural language processing technology, and means for checking health status and providing appointment reminders using a voice recognition device. This enhances the support that elderly people receive in their daily lives and ensures their safety through rapid notification in case of abnormalities.
[0654] A "communication device" is a device that receives voice or text input from a user and transmits it to a server.
[0655] "Natural language processing technology" is a technology that analyzes natural language data contained in speech or text to identify the user's intent.
[0656] A "speech recognition device" is a device that converts a user's speech into digital data and then converts the speech into text for input into natural language processing technology.
[0657] A "location information device" is a device that detects the user's geographical location and provides that information to the server, allowing the server to track the user's movements and monitor their safety.
[0658] A "generative AI model" is an artificial intelligence model trained using machine learning and used to improve the quality of responses to users.
[0659] A "notification device" is a device that reports to pre-configured relevant parties when an anomaly related to the user's safety is detected.
[0660] This invention aims to realize an interactive system that supports the daily lives of the elderly. Users input information to the server via voice or text using a communication device. This communication device is often implemented using a smartphone or tablet.
[0661] When the server receives input voice or text data, it uses natural language processing technology to analyze the data and identify the user's intent. The primary software used in this process is the "Google Cloud Natural Language API." This API includes machine learning algorithms that enhance the accuracy of the analysis. Furthermore, it uses generative AI models to improve the quality of the response.
[0662] After the analysis, the server refers to past conversation history and configured information to generate an appropriate response. The response is presented to the user as speech, and the "Microsoft Azure Speech Service" is used to convert the response text into natural-sounding speech.
[0663] Furthermore, the user's location information is obtained using the GPS function built into the smartphone. Based on this information, the server monitors the user's safety and, in the event of an anomaly, quickly sends an alert to the relevant parties via a notification device. This enhances security while simultaneously enabling more personalized support.
[0664] As a concrete example, the server can provide registered users with a reminder function that tells them "This is your schedule for today" by voice at a specified time every morning. Furthermore, in the event of a critical situation, support staff will be immediately notified. An example of a prompt to input into the generative AI model would be a sentence like, "Please suggest the optimal interaction method to support the lives of the elderly through natural language processing."
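The morning reminder amounts to computing the next moment at which a registered daily reminder should fire. A minimal sketch, assuming a single reminder time per user in local time (time zones and missed reminders are ignored); the function name is hypothetical.

```python
from datetime import datetime, time, timedelta


def next_reminder(now: datetime, reminder_time: time) -> datetime:
    """Return the next moment at which the daily reminder should fire."""
    candidate = datetime.combine(now.date(), reminder_time)
    if candidate <= now:
        # Today's reminder time has already passed; schedule it for tomorrow.
        candidate += timedelta(days=1)
    return candidate
```

A scheduler on the server could sleep until the returned time, speak "This is your schedule for today" followed by the day's entries, and then compute the next occurrence again.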
[0665] The system of this invention aims to support the independent living of the elderly by having these devices and software work together to provide advanced support and monitoring.
[0666] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0667] Step 1:
[0668] The user inputs information via voice or text into a communication device. This input can include natural language, such as "Tell me my schedule for tomorrow" or "Please record today's health data." The output is then sent to the server by the communication device as voice or text data.
[0669] Step 2:
[0670] The server uses a speech recognition device to convert the received audio data into text. Specifically, it uses the "Google Cloud Speech-to-Text API" to convert the audio data into text. The output of the process is obtained as natural language text.
[0671] Step 3:
[0672] The server analyzes the converted text data and uses natural language processing techniques to understand the user's intent. It utilizes the Google Cloud Natural Language API for analysis, extracting user requests from the text data. The input is text data, and the output is a command or request based on the user's intent.
[0673] Step 4:
[0674] The server generates responses by referencing the user's past conversation history and registered information. A generative AI model is used to create more personalized responses. The generated responses are constructed in natural language, supporting a deeper understanding of the user. Inputs are the user's intent and reference data, while output is the generated response.
[0675] Step 5:
[0676] The terminal receives the response data sent from the server and presents it to the user by voice. Here, the "Microsoft Azure Speech Service" is used to convert the text data into speech. The input is text response data, and the output is a voice response.
[0677] Step 6:
[0678] To obtain the user's location information, the GPS function of the communication device is activated and the user's current location is sent to the server. The input is location information from the GPS sensor, and the output is sent to the server as location data.
[0679] Step 7:
[0680] The server monitors location data and, if it detects an anomaly, immediately sends an alert to the relevant parties via a notification device. This ensures user safety. Inputs are location information and behavioral patterns, and output is an alert notification in case of an anomaly.
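One simple way to realize the check in Step 7 is a geofence: an alert is raised when the reported position lies farther than a set radius from the user's home. The sketch below uses the haversine great-circle distance; the 1 km radius, the function names, and the example coordinates are assumptions, since the disclosure does not specify how anomalies are detected from location data.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_outside_safe_zone(lat, lon, home_lat, home_lon, radius_m=1_000):
    """Flag a position as anomalous if it lies more than radius_m from home."""
    return haversine_m(lat, lon, home_lat, home_lon) > radius_m
```

In practice the distance check would likely be combined with the behavioral patterns mentioned in Step 7 (for example, time of day), so that a routine trip is not reported as an anomaly.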
[0681] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0682] This invention aims to support the daily conversations of the elderly and to enable richer communication through a system incorporating an emotion engine that provides personalized responses based on the user's emotions. The system is configured as follows, with each element working in conjunction with the others.
[0683] Executed on the server
[0684] The server receives voice or text data sent by the user and performs multi-stage analysis. First, it analyzes the intent of the input using natural language processing techniques. Simultaneously, it uses an emotion engine to identify the emotional state contained in the input.
[0685] Specific example: If a user types "I'm happy because the weather is nice today," the server detects the emotion of "happiness" and generates a positive response such as, "It's a wonderful day today. Do you have anything special planned?"
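The role of the emotion engine in this example can be illustrated with a deliberately simple lexicon-based classifier. This is a hypothetical stand-in, not the emotion identification model 59: the emotion labels and cue phrases are assumptions chosen to match the examples in this section.

```python
# Hypothetical emotion labels and cue phrases, chosen to match the examples
# in this section; not the emotion identification model 59 itself.
EMOTION_LEXICON = {
    "happiness": ("happy", "glad", "wonderful"),
    "loneliness": ("lonely", "alone"),
    "sadness": ("sad", "feeling down", "depressed"),
}


def detect_emotion(text: str) -> str:
    """Return the emotion with the most cue-phrase hits, or 'neutral' if none match."""
    lowered = text.lower()
    scores = {
        emotion: sum(lowered.count(cue) for cue in cues)
        for emotion, cues in EMOTION_LEXICON.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "neutral"
```

A lexicon cannot handle negation or context ("I'm not happy"), which is one reason the description relies on a trained model for this step.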
[0686] Implementation on a device
[0687] The terminal is a device that sends user input to a server and displays the server's response to the user. Here, adjusting the response according to the user's emotional state improves convenience and enriches the user experience. By combining voice recognition and location information acquisition functions, information tailored to the user is provided.
[0688] Specific example: When the user appears to want to relax, the device asks, "Are you looking for a relaxing cafe in your neighborhood?" and provides local information suited to that mood.
[0689] User interaction
[0690] Users can operate the device intuitively and receive helpful support while having fun. This feature is expected to improve the quality of life for the elderly and reduce the burden on their families. Furthermore, the conversation flow is changed flexibly based on the recognized emotions, so that the dialogue remains suited to the user.
[0691] Specific example: When a user sadly says, "I'm feeling down today," providing a caring response such as, "Is there anything I can do to help?" can help improve the user's mood.
[0692] This invention embodies a technological means that makes the lives of the elderly more comfortable and fulfilling by detecting emotions and enabling detailed responses based on those emotions.
[0693] The following describes the processing flow.
[0694] Step 1:
[0695] The user inputs their feelings or questions into the device via voice or text. For example, the user might say, "I'm feeling a little lonely today."
[0696] Step 2:
[0697] The terminal converts the input speech into text format. This conversion uses speech recognition technology. The converted text is then sent to the server.
[0698] Step 3:
[0699] The server analyzes the received text data. First, it uses natural language processing techniques to identify the user's intent and recognizes that they are "feeling lonely."
[0700] Step 4:
[0701] The server activates an emotion engine to analyze emotions from the input text. In this case, it detects the emotion "loneliness" and records the emotional state.
[0702] Step 5:
[0703] The server refers to the user's past conversation history together with changes in their emotions to optimize its response. In this case, it generates a response intended to alleviate feelings of loneliness, such as the suggestion, "Shall we think of something fun to do together?"
[0704] Step 6:
[0705] The server sends the generated response to the terminal.
[0706] Step 7:
[0707] The terminal presents the user with the response received from the server. The response is either read aloud or displayed on the screen.
[0708] Step 8:
[0709] The device uses an emotion engine to continuously record emotional states for future user interactions, and prepares to notify pre-designated family members or relevant parties if abnormalities persist.
[0710] This allows the system to tailor its interactions to incorporate user emotions, providing more personalized support.
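Step 8 records emotional states and prepares to notify designated parties when abnormalities persist. A minimal sketch, assuming "persist" means that at least four of the last five recorded emotions are negative; the window size, threshold, class name, and set of negative labels are hypothetical.

```python
from collections import deque

# Hypothetical set of emotion labels treated as negative.
NEGATIVE_EMOTIONS = {"sadness", "loneliness", "anxiety"}


class EmotionMonitor:
    """Keeps a sliding window of recent emotions and flags persistent negativity."""

    def __init__(self, window: int = 5, threshold: int = 4):
        self.history = deque(maxlen=window)  # oldest entries drop out automatically
        self.threshold = threshold

    def record(self, emotion: str) -> bool:
        """Record one detected emotion; return True if a notification is due."""
        self.history.append(emotion)
        negatives = sum(1 for e in self.history if e in NEGATIVE_EMOTIONS)
        window_full = len(self.history) == self.history.maxlen
        return window_full and negatives >= self.threshold
```

Requiring a full window avoids alerting family members on the first one or two low moods, at the cost of reacting more slowly.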
[0711] (Example 2)
[0712] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0713] A challenge in everyday conversations with the elderly is improving the quality of communication by providing personalized responses based on emotions. Furthermore, conventional technologies have had difficulty accurately identifying the user's emotional state and adapting the conversation appropriately.
[0714] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0715] In this invention, the server includes means for receiving natural language voice or text input from a user via a communication system, means for analyzing the input and identifying the user's intentions and emotions using natural language processing technology, and means for generating emotion-based personalized responses using a generative AI model. This enables natural communication with elderly people.
[0716] A "communication system" is a general term for devices and technologies that use networks to send and receive voice and text data from users.
[0717] "Natural language processing technology" refers to technologies that enable computers to understand and analyze human language, including methods for analyzing the intent and emotions behind language.
[0718] "Intention" refers to the purpose or will of what a user is trying to convey through their statements or input.
[0719] "Emotions" refer to the mental state that users express through their words and input, and include psychological reactions such as joy, sadness, and surprise.
[0720] A "generative AI model" is an artificial intelligence technology that uses machine learning algorithms to generate natural-sounding words and sentences from data.
[0721] "Personalized responses" refer to conversations and actions that are individually tailored based on the user's specific intentions and feelings.
[0722] "Location information" refers to data that indicates a user's physical geographical location and is used in maps, navigation, and other applications.
[0723] "Dynamic adaptation" refers to a system automatically adjusting its responses and actions in response to changes in the user's state or environment.
[0724] This invention relates to a communication system incorporating an emotion engine, which aims to support the daily conversations of elderly people and provide personalized responses based on their emotions. This system includes three key elements: a server, a terminal, and a user.
[0725] The server receives natural language voice or text data that the user sends from the terminal via the communication system. The server processes this data and uses natural language processing techniques to identify the user's intent and emotions. The natural language processing techniques used here are based on machine learning algorithms; for example, language models such as BERT or generative AI models such as the GPT series can be used. Once the emotional state is identified, the server uses the generative AI model to generate a personalized response based on that emotion.
[0726] The terminal is a device that sends user input to a server, receives a response from the server, and presents it to the user. In this process, speech recognition and speech synthesis technologies are utilized to provide both data input and responses to the user in a natural manner. Specific hardware examples include smart devices equipped with speech recognition capabilities, and general-purpose voice playback applications can be used as software.
[0727] Users can interact with the system intuitively through their devices and receive support. This allows them to receive advice and information tailored to their emotions, enabling them to enjoy a richer experience even in everyday conversations.
[0728] For example, if a user says, "I'm happy the weather is nice today," the server analyzes this as text data and detects the emotion of "happiness." Based on this, the server generates a positive response such as, "It looks like it's going to be a great day. Is there anything you'd like to do?" and the terminal presents it to the user. An example prompt for this case is "Generate an appropriate response when an elderly person expresses a positive mood."
[0729] This system can improve the quality of life for the elderly and enable flexible and effective communication that responds to their emotions.
[0730] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0731] Step 1:
[0732] The user enters a message into the terminal via voice or text. The terminal uses speech recognition technology to convert the input voice into a digital format. If the input is text, it is processed as is and prepared to be sent to the server. At this stage, the input is the user's natural language message, which the terminal converts into digital data.
[0733] Step 2:
[0734] The device sends the digital audio or text data received from the user to the server. After receiving this data, the server begins analyzing it using natural language processing technology. The input data is text in string format, which the server tokenizes before analyzing the meaning and structure of the sentences using machine learning algorithms. As a result, the user's intent and emotions are identified.
[0735] Step 3:
[0736] Based on the analysis results, the server generates a response using a generative AI model. Here, the emotion engine contributes by selecting the prompt best suited to the user's emotion and inputting it into the generative AI model. For example, if the emotion "happy" is detected, a positive response is generated. The output is a sentence that serves as the response to the user.
[0737] Step 4:
[0738] The response generated by the server is sent to the terminal to be presented to the user. The terminal converts the received text data into speech using speech synthesis technology, or displays it to the user as text. The output of this step is speech or text that the user can recognize.
[0739] Step 5:
[0740] The user can continue the conversation by acknowledging the received response and making new inputs. Through this two-way interaction, the system continues the dialogue in accordance with the user's emotional state and intentions, achieving adaptive communication.
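Step 3 of Example 2 has the emotion engine choose a prompt for the generative AI model according to the detected emotion. The sketch below shows one way such a prompt could be assembled; the template wording, the three-turn history limit, and the function name are assumptions, and the call to the model itself is omitted.

```python
# Hypothetical instruction templates keyed by detected emotion.
PROMPT_TEMPLATES = {
    "happiness": "The elderly user is in a positive mood. Reply warmly and ask about their plans.",
    "loneliness": "The elderly user feels lonely. Reply with empathy and suggest a gentle activity.",
    "sadness": "The elderly user feels down. Reply with care and offer to help.",
}
DEFAULT_TEMPLATE = "Reply politely and helpfully to the elderly user."


def build_prompt(emotion: str, user_text: str, history=()) -> str:
    """Assemble a prompt: emotion-specific instruction, recent turns, then the new input."""
    lines = [PROMPT_TEMPLATES.get(emotion, DEFAULT_TEMPLATE), "Recent conversation:"]
    lines += [f"- {turn}" for turn in list(history)[-3:]]  # keep only the last three turns
    lines.append(f"User: {user_text}")
    return "\n".join(lines)
```

Falling back to a neutral template for unrecognized emotions keeps the dialogue going even when the emotion engine returns a label the templates do not cover.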
[0741] (Application Example 2)
[0742] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0743] In the daily lives of the elderly, a lack of communication and feelings of loneliness are serious problems. Conventional technologies cannot provide flexible and personalized responses based on emotions, making it difficult to provide appropriate support according to individual emotional states. As a result, there is insufficient suggestion of dialogues and activities that address the latent emotions and intentions of the elderly, hindering the improvement of their quality of life.
[0744] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0745] In this invention, the server includes means for receiving natural language voice or text input from an individual via a communication device, means for analyzing the input and identifying the individual's intentions using natural language processing technology, and means for identifying the individual's emotional state using emotion detection functionality. This makes it possible to accurately analyze an individual's emotional state and intentions and provide personalized responses and activity suggestions based on them.
[0746] A "communication device" is a device that receives natural language voice or text input from an individual and exchanges information with a server.
[0747] "Natural language processing technology" refers to techniques for analyzing an individual's speech and text to identify their intent and meaning, and includes machine learning algorithms.
[0748] The "emotion detection function" is a function that identifies and analyzes the emotional state contained in an individual's statements and input content.
[0749] "Past conversation history" refers to a record of conversations and interactions an individual has had in the past, and is information that is referenced when generating responses.
[0750] A "notification device" is a device that transmits information to relevant parties when abnormal behavior of an individual is detected.
[0751] A "personalized response" is a response generated based on an individual's emotional state and intentions, with content tailored to that individual.
[0752] "Activity suggestions" are proposals aimed at improving the quality of life by suggesting optimal actions and activities according to an individual's emotional state.
[0753] The system of this invention consists of a communication device used by individuals, including the elderly, an emotion analysis system via a server, and a notification device that notifies relevant parties. The main role of the system is to analyze natural language speech and text input obtained from individuals to understand their intentions and emotional state.
[0754] The server receives input via communication devices and uses natural language processing techniques to identify the individual's intentions. It utilizes machine learning algorithms to precisely analyze the input. Furthermore, it uses emotion detection to identify the individual's emotional state and generates personalized responses by comparing them with past conversation history. This entire process can utilize a common cloud-based platform.
[0755] The generated response is presented to the individual via a communication device. For example, if an elderly person inputs "I'm lonely today," the system offers a considerate suggestion such as, "Why don't you talk to a friend on the phone? Or would you like to watch a movie together?"
[0756] Furthermore, the system can improve an individual's quality of life by suggesting activities based on their emotional state. An example of a prompt might be: "If the user wants to relax, generate a response suggesting relaxing facilities in the area."
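The relaxation prompt above implies a filter over nearby facilities. A hypothetical sketch, assuming the server already knows each facility's category and distance from the user (for example, derived from the terminal's GPS position); the state labels, category mapping, and function name are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Facility:
    name: str
    category: str
    distance_m: float  # distance from the user's current position


# Hypothetical mapping from an inferred state to suitable facility categories.
STATE_TO_CATEGORIES = {
    "wants_relaxation": {"cafe", "park", "bathhouse"},
    "loneliness": {"community_center", "cafe"},
}


def suggest_facilities(state, facilities, max_distance_m=1_500, limit=3):
    """Return names of nearby facilities that suit the state, closest first."""
    categories = STATE_TO_CATEGORIES.get(state, set())
    nearby = [f for f in facilities
              if f.category in categories and f.distance_m <= max_distance_m]
    nearby.sort(key=lambda f: f.distance_m)
    return [f.name for f in nearby[:limit]]
```

The resulting names could be inserted into the prompt given to the generative AI model so that the suggested activities refer to real places near the user.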
[0757] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0758] Step 1:
[0759] The server receives natural language speech or text input from individuals via communication devices. This input is taken into the system as raw data.
[0760] Step 2:
[0761] The server analyzes the received input using natural language processing techniques. This includes grammatical analysis and keyword extraction of the input content, as well as data processing to identify the individual's intent. The output is text data with the identified intent.
[0762] Step 3:
[0763] Based on the identified intent, the server uses the emotion detection function to identify the emotional state contained in the input. This process involves calculations using an emotion model, and the emotional state of the input is output in text format.
[0764] Step 4:
[0765] The server references past conversation history and pre-configured information, and uses a generative AI model to generate personalized responses. It leverages prompts to create optimized responses. The output at this stage is individually customized response text.
[0766] Step 5:
[0767] The server sends the generated response to the individual via communication equipment. At this time, speech synthesis or display corresponding to the response is performed. The user receives the response and completes the interaction with the system.
[0768] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing unit 12. In the data processing unit 12, the specific processing unit 290 acquires the audio data.
[0769] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). One example of data generation model 58 is ChatGPT (Internet search<URL: https: / / openai.com / blog / chatgpt> ), Gemini (Internet search) <url: https: gemini.google.com ?hl="ja">Examples of generative AI include the following. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.
[0770] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.
[0771] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.
[0772] Figure 9 shows an emotion map 400 on which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer an emotion is to the center, the more primitive it is; farther out are emotions representing states and actions that arise from mental states. Here, "emotion" is a concept that includes both feelings and mental states. The left side of the concentric circles holds emotions that are generally generated by reactions occurring in the brain, and the right side holds emotions that are generally induced by situational judgment. The upper and lower parts hold emotions that are generally both generated by reactions in the brain and induced by situational judgment. In addition, "pleasure" is located on the upper side and "displeasure" on the lower side. In this way, the emotion map 400 arranges emotions according to the structure by which they arise, and emotions that are likely to occur simultaneously are mapped close together.
[0773] Emotions located at the 3 o'clock position on the emotion map 400 usually fluctuate between a sense of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, which gives a calm impression.
[0774] The inner part of the emotion map 400 represents internal mental states, while the outer part represents behavior. Therefore, the farther out an emotion lies on the emotion map 400, the more visible it becomes (that is, the more it manifests in actions).
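The geometry described for the emotion map 400 can be sketched as polar coordinates: a ring index for distance from the center and a clock position for direction. The `MapPosition` type, the ring numbering, and the reading of each region below are assumptions for illustration; the actual placement of named emotions comes from Figure 9, which is not reproduced here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MapPosition:
    """Hypothetical placement of one emotion on the emotion map 400."""
    ring: int     # 0 = innermost (most primitive); larger = farther out (more visible in behavior)
    clock: float  # clock position: 12 = top (pleasure side), 6 = bottom (displeasure side)

    def xy(self) -> tuple[float, float]:
        # Clock positions run clockwise from 12 o'clock in 30-degree steps.
        theta = math.radians((self.clock % 12) * 30)
        radius = self.ring + 1
        return radius * math.sin(theta), radius * math.cos(theta)


def describe(pos: MapPosition, eps: float = 1e-9) -> dict[str, object]:
    """Read off the properties the text attributes to each region of the map."""
    x, y = pos.xy()
    if x < -eps:
        origin = "reaction"    # left half: reactions occurring in the brain
    elif x > eps:
        origin = "situation"   # right half: induced by situational judgment
    else:
        origin = "both"        # top and bottom of the map
    if y > eps:
        valence = "pleasure"
    elif y < -eps:
        valence = "displeasure"
    else:
        valence = "neutral"
    return {"origin": origin, "valence": valence, "visibility": pos.ring}


def distance(a: MapPosition, b: MapPosition) -> float:
    """Emotions that tend to co-occur should be close, i.e. have a small distance."""
    return math.dist(a.xy(), b.xy())
```

Under this encoding, an emotion at 3 o'clock reads as situation-driven and neutral in valence, which is consistent with the description of emotions there fluctuating between security and anxiety.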
[0775] Human emotions are based on various balances, such as posture and blood sugar level: when these balances deviate from the ideal, discomfort results, and when they approach the ideal, pleasure results. Similarly, emotions can be created for robots, cars, motorcycles, and the like based on balances such as posture and battery level, with deviation from the ideal producing discomfort and approach to the ideal producing pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "reaction," where sensation is dominant. The right half contains emotions belonging to a region called "situation," where situational awareness is dominant.
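The balance principle for machines can be illustrated with a small scoring function. This is a sketch under stated assumptions: the balance names (`battery_pct`, `tilt_deg`), ideal values, tolerances, and the linear averaging are all hypothetical choices, not values from the source. It only shows that moving toward the ideal raises the score and moving away lowers it.

```python
def hedonic_value(readings: dict[str, float],
                  ideals: dict[str, float],
                  tolerances: dict[str, float]) -> float:
    """Return a value in [-1, 1]: positive means pleasure, negative means discomfort.

    Each balance contributes its deviation from the ideal, divided by a
    tolerance and capped at 1. All balances at their ideal gives +1; all at
    or beyond their tolerance gives -1.
    """
    deviations = [
        min(abs(readings[key] - ideal) / tolerances[key], 1.0)
        for key, ideal in ideals.items()
    ]
    mean_deviation = sum(deviations) / len(deviations)
    return 1.0 - 2.0 * mean_deviation


# Hypothetical balances for a robot: battery level and body tilt (posture).
IDEALS = {"battery_pct": 100.0, "tilt_deg": 0.0}
TOLERANCES = {"battery_pct": 50.0, "tilt_deg": 30.0}
```

For example, a fully charged, upright robot scores 1.0, while one at 25% battery but upright scores 0.0.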
[0776] The emotion map defines two emotions that promote learning. One lies around the midpoint between the negative emotions "repentance" and "reflection" on the situation side, that is, when the robot experiences negative feelings such as "I never want to feel this way again" or "I don't want to be scolded again." The other lies around the positive emotion "desire" on the reaction side, that is, when the robot has positive feelings such as "I want more" or "I want to know more."
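A minimal sketch of how the two learning-promoting emotions might be used: an experience is weighted more heavily when either one is strong. The threshold-and-boost rule and the use of the labels "repentance"/"reflection" as a proxy for the region between them are assumptions; the source does not say how the learning is strengthened.

```python
# Hypothetical labels standing in for the two learning-promoting regions of the map.
NEGATIVE_LEARNING = {"repentance", "reflection"}  # situation side
POSITIVE_LEARNING = {"desire"}                     # reaction side


def learning_weight(emotion_values: dict[str, float],
                    base: float = 1.0,
                    boost: float = 2.0,
                    threshold: float = 0.5) -> float:
    """Return how strongly an experience should be learned (hypothetical rule).

    The experience is weighted by `boost` when any learning-promoting emotion
    reaches `threshold`; otherwise it keeps the base weight.
    """
    learning_emotions = NEGATIVE_LEARNING | POSITIVE_LEARNING
    peak = max((emotion_values.get(e, 0.0) for e in learning_emotions), default=0.0)
    return base * boost if peak >= threshold else base
```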
[0777] The emotion identification model 59 feeds the user input into a pre-trained neural network, obtains emotion values for each emotion shown in the emotion map 400, and determines the user's emotion from them. The neural network is pre-trained on multiple training data sets, each pairing a user input with emotion values for the emotions shown in the emotion map 400. Furthermore, the network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10, which gives an example in which emotions such as "reassured," "calm," and "confident" have similar emotion values.
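The decision step and the "neighbors have similar values" property can be sketched without the network itself. Here the raw `scores` stand in for the pre-trained network's outputs, the softmax normalization is an assumed choice, and `neighbor_penalty` is one hypothetical way to encourage similar values for adjacent emotions during training; the source does not state which loss is used.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn raw network outputs into emotion values that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def identify_emotion(labels: list[str], scores: list[float]) -> tuple[str, list[float]]:
    """Pick the emotion with the highest value; also return all values."""
    values = softmax(scores)
    best = max(range(len(labels)), key=values.__getitem__)
    return labels[best], values


def neighbor_penalty(values: list[float], neighbor_pairs: list[tuple[int, int]]) -> float:
    """Hypothetical training-time regularizer: squared differences between
    emotions that sit next to each other on the map. Adding it to the loss
    pushes neighboring emotions toward similar values."""
    return sum((values[i] - values[j]) ** 2 for i, j in neighbor_pairs)


labels = ["reassured", "calm", "confident", "anxious"]
emotion, values = identify_emotion(labels, [2.0, 1.8, 1.7, -1.0])
```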
[0778] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.
[0779] In the above embodiment, the specific processing is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and the specific processing may be distributed across multiple computers, including the computer 22. For example, the data generation model 58 may be provided in a device external to the data processing device 12, and that external device may generate data according to the input data.
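Hosting the data generation model 58 on an external device can be sketched as a pluggable generator interface. The class names, the `send` transport callback, and the fallback to a local generator when the external device is unreachable are assumptions added for illustration; the source only states that generation may be delegated to an external device.

```python
from typing import Callable, Protocol


class Generator(Protocol):
    def generate(self, prompt: str) -> str: ...


class LocalGenerator:
    """Stand-in for a model running on the data processing device itself."""
    def generate(self, prompt: str) -> str:
        return f"[local] {prompt}"


class RemoteGenerator:
    """Stand-in for a data generation model hosted on an external device.

    `send` is a hypothetical transport (e.g. an HTTP call) supplied by the caller.
    """
    def __init__(self, send: Callable[[str], str]) -> None:
        self._send = send

    def generate(self, prompt: str) -> str:
        return self._send(prompt)


class SpecificProcessing:
    """Routes generation to the primary generator, falling back if it is unreachable."""
    def __init__(self, primary: Generator, fallback: Generator) -> None:
        self.primary = primary
        self.fallback = fallback

    def run(self, prompt: str) -> str:
        try:
            return self.primary.generate(prompt)
        except ConnectionError:
            return self.fallback.generate(prompt)
```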
[0780] In the above embodiment, the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12, and the processor 28 executes the specific processing according to the specific processing program 56.
[0781] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.
[0782] Furthermore, the entire specific processing program 56 need not be stored in a storage device such as a server connected to the data processing device 12 via the network 54, or in the storage 32; only a portion of the specific processing program 56 may be stored there.
[0783] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.
[0784] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.
[0785] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.
[0786] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.
[0787] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.
[0788] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.
[0789] The following is further disclosed regarding the embodiments described above.
[0790] (Claim 1)
[0791] A means for receiving natural language speech or text input from a user via a communication device,
[0792] A means for analyzing the input and identifying the user's intent using natural language processing technology,
[0793] A means for generating a response by referring to past conversation history and pre-configured information,
[0794] A means for presenting the generated response to the user via the communication device,
[0795] A means for detecting abnormal user behavior and notifying relevant parties through a notification device,
[0796] A system comprising these means.
[0797] (Claim 2)
[0798] The system according to claim 1, wherein the natural language processing technology analyzes the user's intent using a machine learning algorithm.
[0799] (Claim 3)
[0800] The system according to claim 1, wherein the communication device acquires the user's location information and generates an appropriate response based on said information.
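The means recited in claim 1 (receive input, identify intent, respond using history and pre-configured information, present the response, detect an abnormality and notify) can be wired together as in the sketch below. Keyword matching is a deliberately simple stand-in for the natural language processing step, and the `notify` callback, keyword lists, and reply wording are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# Keyword lists are a hypothetical stand-in for the natural language processing step.
INTENT_KEYWORDS = {
    "medication": ("medicine", "pill", "medication"),
    "help": ("help", "fell", "fallen", "dizzy"),
}


def identify_intent(text: str) -> str:
    lowered = text.lower()
    for intent, words in INTENT_KEYWORDS.items():
        if any(word in lowered for word in words):
            return intent
    return "chat"


@dataclass
class DialogueSystem:
    profile: dict[str, str]              # pre-configured information
    notify: Callable[[str], None]        # notification device (hypothetical callback)
    history: list[tuple[str, str]] = field(default_factory=list)

    def handle(self, text: str) -> str:
        intent = identify_intent(text)
        if intent == "help":
            # Abnormal situation: alert the relevant parties as well as replying.
            self.notify(f"{self.profile.get('name', 'User')} may need assistance: {text!r}")
            reply = "I have contacted your family. Please stay where you are."
        elif intent == "medication":
            reply = f"It is time for your {self.profile.get('medication', 'medicine')}."
        elif self.history:
            # Refer back to the previous turn from the conversation history.
            reply = f"Tell me more. Earlier you said: {self.history[-1][0]!r}"
        else:
            reply = "Tell me more."
        self.history.append((text, reply))
        return reply  # to be presented via the communication device
```

In a fuller system, the keyword classifier would be replaced by a machine learning model (as in claim 2) and the fixed replies by a generative model.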
[0801] "Example 1"
[0802] (Claim 1)
[0803] A means for receiving natural language speech or text input from a user via a communication device,
[0804] A means for analyzing the input and identifying the user's intent using natural language processing technology,
[0805] A means for generating a response by referring to past dialogue history and pre-configured data,
[0806] A means for presenting the generated response to the user via the communication device,
[0807] A means for detecting abnormal user behavior and notifying a third party through a notification device,
[0808] A means for optimizing the response using a generative model,
[0809] A means for transmitting warning information to a third party based on the aforementioned abnormal behavior,
[0810] A system comprising these means.
[0811] (Claim 2)
[0812] The system according to claim 1, wherein the natural language processing technology analyzes the user's intent using a learning algorithm.
[0813] (Claim 3)
[0814] The system according to claim 1, wherein the communication device acquires the user's location data and generates an appropriate response based on the data.
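Example 1 adds transmission of warning information to a third party based on abnormal behavior. One simple form of abnormal behavior is prolonged silence; the sketch below assumes that rule and a 12-hour threshold, both hypothetical, and only builds the warning text that a notification device would send.

```python
from datetime import datetime, timedelta
from typing import Optional


def inactivity_warning(last_interaction: datetime,
                       now: datetime,
                       limit: timedelta = timedelta(hours=12)) -> Optional[str]:
    """Return warning text for a third party if the user has been silent too long.

    The 12-hour default is a hypothetical threshold, not a value from the source.
    """
    silent = now - last_interaction
    if silent <= limit:
        return None
    hours = silent.total_seconds() / 3600
    return (f"WARNING: no interaction for {hours:.1f} h "
            f"(last at {last_interaction:%Y-%m-%d %H:%M})")
```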
[0815] "Application Example 1"
[0816] (Claim 1)
[0817] A means for receiving natural language speech or text input from a user via a communication device,
[0818] A means for analyzing the input and identifying the user's intent using natural language processing technology,
[0819] A means for generating a response by referring to past conversation history and pre-configured information,
[0820] A means for presenting the generated response to the user via the communication device,
[0821] A means for detecting abnormal user behavior and notifying relevant parties through a notification device,
[0822] A means for checking health status and setting schedule reminders using a voice recognition device,
[0823] A means for monitoring user safety using a location information device and providing notification of abnormalities,
[0824] A system comprising these means.
[0825] (Claim 2)
[0826] The system according to claim 1, wherein the natural language processing technology analyzes the user's intent using a machine learning algorithm and improves the quality of the response using a generative AI model.
[0827] (Claim 3)
[0828] The system according to claim 1, wherein the communication device acquires the user's location information, generates an appropriate response based on the information, detects an anomaly in location and notifies the relevant parties.
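The location-based safety monitoring in Application Example 1 could be as simple as a geofence around the user's home. The sketch uses the standard haversine great-circle formula; the 500 m radius, the coordinate format, and the `notify` callback are assumptions for illustration.

```python
import math
from typing import Callable

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def check_location(home: tuple[float, float],
                   current: tuple[float, float],
                   notify: Callable[[str], None],
                   radius_m: float = 500.0) -> bool:
    """Notify relevant parties if the user is outside a safe radius around home.

    The 500 m radius is a hypothetical setting.
    """
    d = haversine_m(*home, *current)
    if d > radius_m:
        notify(f"Location anomaly: {d:.0f} m from home")
        return True
    return False
```

A production system would likely add time-of-day rules and debounce noisy GPS fixes before notifying anyone.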
[0829] "Example 2 (combined with an emotion engine)"
[0830] (Claim 1)
[0831] A means for receiving natural language speech or text input from a user via a communication system,
[0832] A means for analyzing the input and identifying the user's intent and emotions using natural language processing technology,
[0833] A means for generating emotion-based personalized responses using a generative AI model,
[0834] A means for presenting the generated response to the user via the communication system,
[0835] A means for dynamically adapting the flow of conversation based on the user's emotional state,
[0836] A system comprising these means.
[0837] (Claim 2)
[0838] The system according to claim 1, wherein the natural language processing technology uses a machine learning algorithm to analyze the user's intentions and emotions.
[0839] (Claim 3)
[0840] The system according to claim 1, wherein the communication system acquires the user's location information and provides a response appropriate to the user's emotions based on that information.
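Dynamically adapting the conversation flow to the user's emotional state can be sketched as a small rule table that selects the next move and a matching instruction for the generative model's prompt. The emotion labels, the 0.6 intensity threshold, and the instruction texts are all hypothetical.

```python
from enum import Enum


class Flow(Enum):
    CONTINUE_TOPIC = "continue"
    COMFORT = "comfort"
    LIGHTEN = "lighten"
    SLOW_DOWN = "slow_down"


# Hypothetical instructions appended to the generative model's prompt for each flow.
FLOW_INSTRUCTIONS = {
    Flow.CONTINUE_TOPIC: "Continue the current topic naturally.",
    Flow.COMFORT: "Reassure the user gently and ask what is worrying them.",
    Flow.LIGHTEN: "Shift to a light, pleasant topic the user enjoys.",
    Flow.SLOW_DOWN: "Use short, calm sentences and avoid new topics.",
}


def next_flow(emotion: str, intensity: float) -> Flow:
    """Choose the next conversational move from the detected emotion (hypothetical rules)."""
    if emotion in {"anxiety", "fear"} and intensity >= 0.6:
        return Flow.COMFORT
    if emotion in {"sadness", "loneliness"}:
        return Flow.COMFORT if intensity >= 0.6 else Flow.LIGHTEN
    if emotion in {"anger", "irritation"}:
        return Flow.SLOW_DOWN
    return Flow.CONTINUE_TOPIC
```

Keeping the rules separate from the prompt text makes it easy to tune either one without touching the other.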
[0841] "Application Example 2 (combined with an emotion engine)"
[0842] (Claim 1)
[0843] A means for receiving natural language speech or text input from an individual via a communication device,
[0844] A means for analyzing the input and identifying the individual's intentions using natural language processing technology,
[0845] A means for identifying the individual's emotional state using an emotion detection function,
[0846] A means for generating a response by referring to past conversation history and pre-configured information,
[0847] A means for presenting the generated response to the individual via the communication device,
[0848] A means for detecting abnormal behavior of the individual and notifying relevant parties through a notification device,
[0849] A means for proposing dialogues and activities tailored to the individual's emotional state,
[0850] A system comprising these means.
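Proposing activities tailored to the individual's emotional state can be sketched as filtering a small catalog by emotion tags and by what the person is able to do. The catalog entries, tags, and the mobility constraint are hypothetical examples, not content from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    name: str
    suits: frozenset[str]          # emotional states this activity targets (hypothetical tags)
    needs_mobility: bool = False   # whether the activity requires walking


CATALOG = (
    Activity("Call a family member", frozenset({"loneliness", "sadness"})),
    Activity("Short walk in the neighborhood", frozenset({"boredom", "sadness"}), needs_mobility=True),
    Activity("Breathing exercise", frozenset({"anxiety", "anger"})),
    Activity("Listen to favorite songs", frozenset({"sadness", "boredom", "joy"})),
)


def propose(emotion: str, can_walk: bool, limit: int = 2) -> list[str]:
    """Return up to `limit` activities matching the emotion and the user's mobility."""
    picks = [a.name for a in CATALOG
             if emotion in a.suits and (can_walk or not a.needs_mobility)]
    return picks[:limit]
```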
[0851] (Claim 2)
[0852] The system according to claim 1, wherein the natural language processing technology uses a machine learning algorithm to analyze an individual's intentions.
[0853] (Claim 3)
[0854] The system according to claim 1, wherein the communication device acquires an individual's location information and generates an appropriate response based on the information and emotional state.
[Explanation of symbols]
[0855]
10, 210, 310, 410 Data Processing Systems
12 Data Processing Devices
14 Smart Devices
214 Smart Glasses
314 Headset-type terminal
414 Robots
Claims
1. A system comprising: a means for receiving natural language speech or text input from a user via a communication device; a means for analyzing the input and identifying the user's intent using natural language processing technology; a means for generating a response by referring to past conversation history and pre-configured information; a means for presenting the generated response to the user via the communication device; a means for detecting abnormal user behavior and notifying relevant parties through a notification device; a means for checking health status and setting schedule reminders using a voice recognition device; and a means for monitoring user safety using a location information device and providing notification of abnormalities.
2. The system according to claim 1, wherein the natural language processing technology analyzes the user's intent using a machine learning algorithm and improves the quality of the response using a generative AI model.
3. The system according to claim 1, wherein the communication device acquires the user's location information, generates an appropriate response based on the information, detects an anomaly in location and notifies the relevant parties.