system
A system using an information processing device and display device to extract age-specific information and adjust conversation flow addresses the challenge of communicating with dementia patients, reducing caregiver burden and stabilizing patient emotions.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- SOFTBANK GROUP CORP
- Filing Date
- 2024-12-16
- Publication Date
- 2026-06-26
AI Technical Summary
There is a challenge in effectively communicating with dementia patients, particularly in reducing the burden on caregivers and providing a method for dementia patients to enjoyably and effectively evoke memories, while addressing the shortage of personnel in care.
A system combining an information processing device and a display device to support communication, which extracts age-specific information based on user history, presents it to promote natural conversation, and adjusts the conversation flow based on user interests and understanding, using response processing to generate the next question.
Enables meaningful communication for dementia patients, reducing caregiver burden and stabilizing patient emotions by promoting reflection on past experiences.
Smart Images

Figure 2026105394000001_ABST
Abstract
Description
Technical Field
[0001] The technology of the present disclosure relates to a system.
Background Art
[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, the method including: receiving a user utterance; adding the user utterance to a prompt including an instruction sentence related to an explanation of a character of the chatbot; encoding the prompt; and inputting the encoded prompt into a language model to generate a chatbot utterance as a response to the user utterance.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] In modern society, while appropriate care for dementia patients is required, there are problems such as an increasing burden on caregivers and a serious shortage of personnel. In particular, it is difficult to smoothly communicate with dementia patients, and it is necessary to reduce the mental burden on caregivers. Furthermore, there is a need to provide a method that allows dementia patients themselves to enjoyably and effectively evoke memories.
Means for Solving the Problems
[0005] This invention provides a system combining an information processing device and a display device to support communication, which is considered crucial in the care of dementia patients, and to reduce the burden on caregivers. Specifically, it extracts age-specific information based on the user's history and presents this information to the user to promote natural conversation. Furthermore, a response processing device analyzes the user's response and generates the next question, thereby adjusting the conversation to match the user's interests and understanding. In this way, the present invention aims to create a communication environment that dementia patients can use with peace of mind and to reduce the burden on caregivers.
[0006] An "information processing device" is a device that extracts age-based information based on a user's history and exchanges information with other devices.
[0007] "History information" refers to a collection of data related to a user's past experiences and activities.
[0008] "Information by time period" refers to information related to a specific year, month, or historical context.
[0009] A "display device" is a device used to visually present information to users.
[0010] A "response processing device" is a device that analyzes responses from users and generates new information or questions based on those analyses.
[0011] "Question generation based on response content" is the process of analyzing user reactions and creating related subsequent questions.
[0012] "Conversation adjustment" refers to the process of optimizing the flow and content of a conversation according to the user's interests and level of understanding.
[0013] "Storage" refers to saving data and information so that it can be referenced or used later.
[0014] "Learning trends" refers to the process of extracting certain patterns and features from recorded data and using them to make future predictions and responses.
[0015] "Voice input and text input" refers to the method by which users provide information either through voice or in text.
Brief Description of Drawings
[0016] [Figure 1] It is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment. [Figure 2] It is a conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment. [Figure 3] It is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment. [Figure 4] It is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment. [[ID=2)]] [Figure 5] It is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment. [[ID=2))]] [Figure 6] It is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment. [Figure 7] It is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment. [Figure 8] It is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment. [Figure 9] It shows an emotion map to which multiple emotions are mapped. [Figure 10] It shows an emotion map to which multiple emotions are mapped. [Figure 11] It is a sequence diagram showing the processing flow of the data processing system in Example 1. [Figure 12] It is a sequence diagram showing the processing flow of the data processing system in Application Example 1. [Figure 13]It is a sequence diagram showing the processing flow of the data processing system in Embodiment 2 when the emotion engine is combined. [Figure 14] It is a sequence diagram showing the processing flow of the data processing system in Application Example 2 when the emotion engine is combined.
Mode for Carrying Out the Invention
[0017] Hereinafter, an example of an embodiment of the system according to the technology of the present disclosure will be described with reference to the accompanying drawings.
[0018] First, the terms used in the following description will be explained.
[0019] In the following embodiments, the numbered processor (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.
[0020] In the following embodiments, the numbered RAM (Random Access Memory) is a memory in which information is temporarily stored and is used as a work memory by the processor.
[0021] In the following embodiments, the numbered storage is one or more non-volatile storage devices that store various programs and various parameters, etc. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), or magnetic tapes, etc.
[0022] In the following embodiments, the signed communication interface (I / F) is an interface that includes a communication processor and an antenna, etc. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), or Bluetooth (registered trademark).
[0023] In the following embodiments, "A and / or B" is synonymous with "at least one of A and B." That is, "A and / or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and / or B" applies when expressing three or more things linked by "and / or."
[0024] [First Embodiment]
[0025] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.
[0026] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.
[0027] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0028] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.
[0029] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.
[0030] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.
[0031] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.
[0032] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.
[0033] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.
[0034] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the identification processing unit 290.
[0035] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0036] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0037] This invention is an information processing system that enables effective communication with dementia patients. The specific operation of each component is described below.
[0038] First, the server is responsible for managing user history information and extracting age-specific information based on this data. The server refers to databases containing user registration information, past activity history, and preferences to identify information, images, and videos related to the user's time period.
[0039] The terminal functions as an interface for providing users with information received from the server. The terminal visually displays chronological information provided by the server and can provide related questions and explanations verbally using speech synthesis technology. This helps users recall past events by viewing and listening to the information.
[0040] The user responds to the presented information and questions using voice or text. The terminal processes these responses using speech recognition and text analysis technologies, and analyzes the content of the responses. Based on the analyzed responses, the server generates the next conversation sequence. For example, if the user responds, "This photo brings back memories. It reminds me of a trip I took when I was younger," the server creates an additional question such as, "Could you tell me more about that trip?" and presents it to the user through the terminal.
[0041] As a concrete example, consider a scenario where a user begins to talk about childhood memories. The server provides the terminal with information related to schools and games in the 1950s, and the terminal presents this information to the user and asks, "What was school life like back then?" Upon receiving the user's response, the terminal sends that information back to the server and generates further information and questions based on the user's response. In this way, the present invention realizes two-way communication in which users can actively participate.
[0042] This embodiment allows dementia patients to safely and comfortably reflect on their past, reducing the burden on caregivers while stabilizing the patient's emotions.
[0043] The following describes the processing flow.
[0044] Step 1:
[0045] The server searches the database based on the user's profile information and extracts relevant age-specific information. This includes the user's date of birth, work history, and favorite music.
[0046] Step 2:
[0047] The server packages the extracted age-based information and sends it to the terminal. The transmitted information includes images, videos, audio files, and related text data.
[0048] Step 3:
[0049] The terminal displays information received from the server on the user's screen. The display method is adjusted so that visual content is presented in an appropriate format.
[0050] Step 4:
[0051] The device uses speech synthesis technology based on received data to present the user with relevant questions in voice. For example, a question like, "Do you remember anything about this era?"
[0052] Step 5:
[0053] The user responds to the device via voice or text. The user can also speak about their memories and impressions of the information presented.
[0054] Step 6:
[0055] The device converts the user's voice responses into text using recognition technology and analyzes the content. Important keywords and themes are then extracted from the analysis results.
[0056] Step 7:
[0057] The server generates the next information and questions based on the analysis results sent from the terminal. The server then appropriately adjusts the received data based on the user's interests and responses.
[0058] Step 8:
[0059] The terminal presents new information and questions from the server, continuing the conversation. The terminal selects topics likely to interest the user, deepening the conversation and further retrieving the user's memory.
[0060] In this way, the system enables meaningful communication for dementia patients through interaction with the user.
[0061] (Example 1)
[0062] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0063] Communicating with dementia patients in modern society presents numerous challenges. In particular, there is a need for effective methods to elicit past memories and stabilize patients' emotions. Furthermore, it is necessary to provide patients with appropriate information while reducing the burden on caregivers.
[0064] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0065] In this invention, the server includes means for extracting age-specific information based on the user's history information, means for transmitting age-specific information to a communication device, and means for analyzing the user's response and generating the next conversation sequence using a generative AI model. This enables dementia patients to reflect on their past while engaging in interactive communication tailored to their needs.
[0066] An "information processing device" is a computer that extracts age-related information relevant to a user based on historical data and performs necessary processing.
[0067] "Information by era" refers to information such as events, photographs, and videos related to a specific period, and is data used to evoke memories in the user.
[0068] A "communication device" is a device that plays the role of sending and receiving information between an information processing device and a display device.
[0069] "Display means" refers to a device that visually presents received information to the user, and includes displays and the like.
[0070] A "sound output means" is a device that provides information in sound using speech synthesis technology.
[0071] A "response processing device" is a device that analyzes the user's voice or text responses and generates the next question based on that content.
[0072] "Speech recognition means" refers to a technology or device for converting speech input into text data.
[0073] "Text analysis technology" refers to the technology used to analyze text data and understand its meaning.
[0074] A "generative AI model" is an algorithm or technology used to generate the next conversation sequence or question based on the response content.
[0075] A "conversation sequence" is a series of questions and answers designed to facilitate smooth interaction with the user.
[0076] This invention is an information processing system that enables effective communication with dementia patients, and is realized through the server, terminal, and user each fulfilling their respective roles.
[0077] First, the server manages the user's history information. Specifically, the server uses a database management system to access a database containing the user's registration information, past activity history, and preferences. This allows the server to extract age-specific information related to the user, for example, identifying photos and videos from a specific period. This information is then transmitted to the terminal via a communication device for further processing.
[0078] The terminal presents the user with age-based information received from the server. The terminal is equipped with visual and audio output devices, which are used to provide information visually and audibly. The visual device uses a display to show images and videos in high resolution. The audio output device utilizes speech synthesis technology to provide relevant questions and explanations in natural-sounding voice. This allows the user to receive information visually and audibly, helping them recall past events.
[0079] The user can respond to information presented by the device using voice or text. The device analyzes this response using speech recognition and text analysis technology and sends the results to the server. The server uses a generative AI model to generate the next conversation sequence based on the user's response. For example, if the user says, "This photo brings back memories," the server generates a question such as, "Could you tell me more about your memories of that time?" and presents it through the device.
[0080] A concrete example would be a scenario where a user seeks information related to their childhood memories. The server provides information about school life in the 1950s, and the terminal can then use that information to ask, "What was school life like back then?" An example of a prompt to input into the generating AI model would be, "Generate questions related to school life in the 1950s. Please consider content that will elicit the user's memories and specific experiences."
[0081] In this way, servers, terminals, and users work together to enable two-way communication with dementia patients, thereby reducing the burden on caregivers.
[0082] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0083] Step 1:
[0084] The server retrieves user history information from the database. User IDs and registration information are used as input, and the output is historical data related to the user. Specifically, the server executes SQL queries to extract the necessary data and organizes it by year.
[0085] Step 2:
[0086] The server identifies chronological information based on the acquired historical data. The historical data obtained in step 1 is used as input, and chronological information such as photos and videos is generated as output. The server applies an information extraction algorithm to select the most relevant information.
[0087] Step 3:
[0088] The server sends the identified age-specific information to the terminal. The input is the age-specific information generated in step 2, and the output is the information converted into a format that can be received by the terminal. The server transmits the information using a communication protocol.
[0089] Step 4:
[0090] The terminal presents age-based information received from the server to the user via a display device and an audio output device. The input is the information transmitted in step 3, and the output is the visual and auditory information presented to the user. The terminal displays the information on a high-resolution display and provides questions and explanations in voice using a speech synthesis engine.
[0091] Step 5:
[0092] The user responds to the presented information using voice or text. The user's voice or text data is sent to the device as input. Specifically, the user provides feedback through voice commands or messages.
[0093] Step 6:
[0094] The device analyzes the user's response using speech recognition and text analysis technologies. The input is the user's response in step 5, and the output is the analyzed response. The device converts the speech to text using speech recognition software and analyzes its meaning using a natural language processing algorithm.
[0095] Step 7:
[0096] The server uses a generative AI model based on the analyzed response to generate the next conversation sequence. The input is the analyzed data obtained in step 6, and the output is the conversation or questions presented next. The server passes a prompt to the generative AI model, which generates a new question.
[0097] Step 8:
[0098] The server sends the generated question to the terminal, which then presents it to the user. The input is the question generated in step 7, and the output is the new information shown to the user. The terminal then presents the information to the user again through the display device and sound output device.
[0099] (Application Example 1)
[0100] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0101] In communicating with dementia patients, it is necessary to support them in reflecting on the past, stabilize their emotions through conversation, and provide effective means to promote memory recall. Furthermore, a challenge is to adjust the dialogue based on the patient's interests and understanding to achieve communication that is appropriate for them.
[0102] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0103] In this invention, the server includes an information processing device that extracts age-specific information based on the user's history information, a means for transmitting the age-specific information to a display means, a display means for presenting the received age-specific information to the user, a response processing device that analyzes the user's response and generates the next question based on the response content, a means for conducting a dialogue based on the presented information using speech synthesis technology, a means for analyzing the user's voice input using speech recognition technology, and a means for creating prompt sentences using a generation AI model. This makes it possible to realize a system that allows dementia patients to reflect on their past, achieve emotional stability, and enable communication tailored to each individual patient.
[0104] An "information processing device" is a device that extracts age-based information from a user's history and transmits that information to other devices.
[0105] "Information by era" refers to a collection of information related to a specific period, extracted based on the user's history.
[0106] A "display means" is a device for presenting age-based information received from an information processing device to the user.
[0107] A "response processing device" is a device that analyzes responses from users and generates the next question based on the content of those responses.
[0108] "Speech synthesis technology" is a technology that enables conversations based on presented information to be conducted using natural-sounding speech.
[0109] "Speech recognition technology" is a technology that converts voice input from users into text data and analyzes its content.
[0110] A "generative AI model" is a form of artificial intelligence used to create prompt statements in response processing.
[0111] A "prompt sentence" is a sentence created using a generative AI model to guide the user into the next conversation.
[0112] At the heart of the system implementing this invention is an information processing device. This device is responsible for extracting age-specific information based on the user's history and transmitting the relevant information to a terminal. The terminal is equipped with a display device and a speech synthesis device, which presents the age-specific information to the user visually and audibly.
[0113] The server is a computer that manages user responses and analyzes their content. This computer uses speech recognition technology to convert the user's voice input into text and then analyzes that text. It then uses a generative AI model to create prompt sentences and determine the next information or question to present.
[0114] For example, if a user says, "I miss school life in the 1950s," the server sends school-related photos from that era to the user's device and uses speech synthesis technology to generate and present a voice message saying, "Could you tell me a little more about school life back then?"
[0115] The hardware used includes server computers and smartphones or tablets capable of display and speech synthesis. The software utilizes Google® Speech Recognition for speech recognition, Google Text-to-Speech for speech synthesis, and SQLite for data management. Software for generating AI models also runs on the server.
[0116] As a concrete example, here is an example of a prompt input to a generative AI model: "The user feels nostalgic for a park from the 1950s. Please generate the following related question." This prompt allows the generative AI model to generate subsequent questions and information for the user, adjusting the conversation to flow smoothly.
[0117] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0118] Step 1:
[0119] The server retrieves user history information from the database. The user's ID or profile information is used as input. Relevant age-specific information is extracted through database queries, and a list of the extracted information is generated as output.
[0120] Step 2:
[0121] The server sends the extracted age-based information to the terminal. The input here is the age-based information obtained in step 1, and the output is the transfer of the data to the terminal. Once the terminal receives the information, it prepares it for visual display.
[0122] Step 3:
[0123] The device presents age-specific information to the user visually and audibly. The input is age-specific information from a server. The device displays the information on the screen and uses Google Text-to-Speech to provide related questions audibly. The output is the user visually confirming the information and listening to the questions.
[0124] Step 4:
[0125] The user sees or hears the presented information and responds in voice or text. The input is a response based on the user's own memory and experience. The output is a response provided to the device in the form of voice or text.
[0126] Step 5:
[0127] The device converts user responses into text using speech recognition technology. The input is the user's voice, and the data is converted to text using Google Speech Recognition. The output is parseable text.
[0128] Step 6:
[0129] The server analyzes the transcribed response and constructs a prompt using a generative AI model. The input here is the text obtained from step 5. Data analysis determines the user's interests and topics, and then creates a prompt. The output is a prompt containing the next question or information to be presented.
[0130] Step 7:
[0131] The server sends the next question to the terminal based on the generated prompt. The input is the prompt constructed in step 6. The terminal presents this information to the user again and continues the conversation. The output is the provision of new dialogue content.
[0132] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the identification processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform identification processing using the user's emotions.
[0133] This invention is an information processing system that facilitates communication with dementia patients and enables interaction that takes into account the user's emotions. This system consists of an information processing device, a display device, a response processing device, and an emotion engine. The specific functions and operations of each component are described below.
[0134] First, the server uses the user's history information to extract age-based data. This data includes images and videos based on the user's past experiences and interests, as well as related historical data. The server then sends the extracted information to the terminal.
[0135] The device presents the user with age-specific information received from the server. Specifically, it displays images and videos while simultaneously reading the relevant information aloud using speech synthesis technology. At this stage, the device works in conjunction with an emotion engine to collect the user's facial expressions and voice tone through the camera and microphone.
[0136] The user responds to the information presented by the device. This response is sent to the device as text or voice input. The device receives this and performs analysis in the response processing unit. In the analysis process, emphasis is placed not only on the user's response but also on the collected sentiment data.
[0137] The emotion engine determines the user's emotional state based on user input and collected emotional data. For example, if the user makes a nostalgic expression, the emotion engine will determine this to be a positive emotion. The server receives feedback from the emotion engine and adjusts the next questions and information presentation accordingly.
[0138] As a concrete example, consider a scenario where the user feels nostalgic about school life in the 1950s, as presented. If the user shows a satisfied expression, the emotion engine recognizes this as a positive response. Based on this, the server assumes the user wants to continue the conversation and generates a question such as, "Please tell me about your memories with friends from that time." The terminal then presents this question to the user, allowing for a smooth continuation of the dialogue.
[0139] In this way, the present invention realizes interaction that reflects the user's emotions in real time, providing more personalized care. Through this system, caregivers can maintain the mental health of dementia patients and reduce the burden of daily care.
[0140] The following describes the processing flow.
[0141] Step 1:
[0142] The server extracts relevant, age-specific information from the database based on the user's history. This information includes images, videos, and audio files that are likely to be of interest to the user.
[0143] Step 2:
[0144] The server packages the extracted age-based information and related supplementary data and sends it to the terminal. The terminal then prepares to effectively present the received information to the user.
[0145] Step 3:
[0146] The terminal displays information received from the server to the user and simultaneously provides information via audio. It is optimized so that the user can receive this information both visually and aurally.
[0147] Step 4:
[0148] The device activates an emotion engine and acquires data from the camera and microphone to monitor the user's facial expressions and tone of voice. This data is used to recognize the user's emotional state in real time.
[0149] Step 5:
[0150] The user reacts to the presented content and provides verbal responses. The user's responses are sent to the device via voice input.
[0151] Step 6:
[0152] The terminal converts the user's voice into text, which is then analyzed by a response processing unit. Simultaneously, the emotion engine evaluates the user's emotions and determines whether they are positive, negative, or neutral.
[0153] Step 7:
[0154] The server receives the analysis results and sentiment evaluation, and generates information and questions for the next step. The generated content is optimized for the user's interests and emotional state.
[0155] Step 8:
[0156] The terminal presents the user with new information and questions obtained from the server, continuing the conversation. The terminal strives to help the user relax and continue talking by providing information at the appropriate time and in the right tone.
[0157] Through the above process, the system makes interactions with users more natural and emotionally sensitive, providing an effective and comfortable communication environment.
[0158] (Example 2)
[0159] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0160] A challenge in interacting with dementia patients is the lack of interactive information provision that takes into account the user's emotions. Conventional systems have difficulty adjusting conversations to reflect the user's emotional state in real time, resulting in limited provision of individualized care.
[0161] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0162] In this invention, the server includes means for an information processing device to extract age-specific data based on the user's history data, means for a terminal to collect the user's facial expressions and voice tone through a camera and microphone, and means for an emotion engine to determine the user's emotional state and adjust the conversation accordingly. This enables smooth dialogue that takes the user's emotional state into consideration in real time.
[0163] An "information processing device" is a device that analyzes data and extracts and generates necessary information based on a specific purpose.
[0164] "Age-based data" refers to data that includes images, videos, and historical information related to a specific period in the past, based on the user's history and interests.
[0165] A "display device" is a device that presents received information to a user visually or audibly.
[0166] "Speech synthesis technology" is a technology that converts text data into speech and reads it aloud.
[0167] A "terminal" is a device that collects user input and emotional states and presents that information.
[0168] A "camera" is an optical device that captures images or videos and transmits them to a display or other device.
[0169] A "microphone" is a device that receives sound and converts it into an electrical signal.
[0170] An "emotion engine" refers to technology that analyzes a user's facial expressions and voice tone to determine their emotional state.
[0171] A "response processing device" is a device used to analyze user responses and generate the content of the next dialogue.
[0172] A "generative AI model" is an artificial intelligence technology that generates new text or questions based on input prompts.
[0173] This invention is an information processing system that facilitates communication with dementia patients and enables interaction that takes into account the user's emotions. The system mainly consists of a server, a terminal, a display device, a response processing device, and an emotion engine.
[0174] The server uses a database to analyze user history data and extract data by age group. This information includes images, videos, and historical data related to the user's past experiences and interests. The server uses a database management system.
[0175] When the terminal receives age-specific data from the server, it visually presents images and videos using a display device and reads out related information using speech synthesis technology. A general-purpose speech synthesis engine is used for this process. The terminal also records the user's facial expressions and voice tone using a camera and microphone, and an emotion engine analyzes this data.
[0176] The user can respond to the presented information, and this response is sent to the terminal as text or voice input. The terminal immediately sends this input data to the response processing unit.
[0177] The emotion engine analyzes the user's facial expressions and tone of voice to evaluate their emotional state. If the user smiles or displays a satisfied expression, it is recognized as a positive emotion. Based on this evaluation, the server uses a generative AI model to generate the next dialogue, taking the user's emotional state into consideration.
[0178] As a concrete example, consider a case where a user makes a nostalgic expression when presented with data about school life in the 1950s. The emotion engine analyzes this as a positive reaction and provides the result to the server. The server uses a generative AI model and prompt text to generate the next question, such as "Please tell us about your memories with friends from that time," and presents it to the user through the terminal.
[0179] An example of a prompt message is: "Generate a new question based on the user's interests."
[0180] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0181] Step 1:
[0182] The server retrieves user history data from the database. The input is the user's ID, and the output is extracted historical data related to past experiences and interests. Database management software is used to process the historical data and sort it into age-specific data. Specifically, SQL queries are used to extract data categorized by age.
[0183] Step 2:
[0184] The server sends the extracted age-based data to the terminal. The input is the extracted age-based data, and the output is the transmission of data to the terminal. Network communication protocols are used to accurately transmit the data to the terminal.
[0185] Step 3:
[0186] The terminal analyzes the received age-specific data, displays images and videos on a display device, and reads the information aloud using speech synthesis technology. The input is age-specific data received from the server, and the output is a visual and auditory presentation to the user. A speech synthesis engine is used to convert text into speech, which is then presented using the display and speaker.
[0187] Step 4:
[0188] The user reacts to the information presented. Input is the information presented by the device, and output is the information sent to the device as the user's facial expressions, voice responses, or text input.
[0189] Step 5:
[0190] The device collects user responses through its camera and microphone. Input is the user's facial expressions and voice, and output is the transmission of this data to a response processing unit. Image analysis and speech recognition technologies are used to structure the collected data and send it to the emotion engine.
[0191] Step 6:
[0192] The emotion engine analyzes the user's facial expressions and voice tone to determine their emotional state. The input is the user's emotional data, and the output is an evaluation of that emotional state. An analysis algorithm is used to classify emotions as positive, negative, or neutral.
[0193] Step 7:
[0194] The server uses a generative AI model to generate the next dialogue based on feedback from the emotion engine. The input consists of the emotion engine's evaluation results, the current conversation context, and prompts for the generative AI model; the output is the next question or information to be presented. The generative AI model uses natural language generation techniques to create the new dialogue.
[0195] Step 8:
[0196] The terminal presents the user with new dialogue content from the server. The input is the next question or information generated by the server, and the output is the visual or auditory presentation of this information to the user. This iterative process ensures a smooth and continuous dialogue with the user.
[0197] (Application Example 2)
[0198] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".
[0199] In interacting with dementia patients, the challenge lies in achieving smooth interactions that reflect the user's emotions in real time, and providing individualized care based on the user's past experiences and interests. Furthermore, there is a need to build a system that reduces the burden on caregivers and makes daily care more effective.
[0200] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0201] In this invention, the server includes an information processing device that extracts age-specific information based on the user's history information, a means for transmitting the age-specific information to a display device, and a means for the display device to present the received age-specific information to the user. This makes it possible to provide personalized care through interaction that takes into account the user's emotions, thereby reducing the burden on caregivers.
[0202] An "information processing device" is an electronic device that extracts age-based information based on the user's history and transmits it to a display device.
[0203] "Information by age group" refers to data organized by age group based on users' past experiences and interests.
[0204] A "display device" is a device that visually presents age-based information received from an information processing device to the user.
[0205] A "response processing device" is a device that analyzes responses from users, generates the next question, and adjusts the conversation accordingly.
[0206] An "emotion engine" is software or hardware that collects a user's facial expressions or voice tone and determines their emotional state.
[0207] "Emotional feedback" refers to information used to reflect the results of the emotional engine's assessment of the emotional state in the interaction.
[0208] "Means of adjusting interaction" refer to a system that appropriately modifies conversations and information presentations based on emotional feedback from an emotion engine.
[0209] A "smart device" is a portable electronic device used to improve visual or auditory interaction, and it has the ability to run applications.
[0210] This invention is a system for facilitating smooth communication with dementia patients. The system primarily consists of a server, a terminal, and user interaction. The server is responsible for extracting age-specific information based on the user's history and transmitting it to the terminal. Age-specific information is managed in the form of images, videos, and related historical data.
[0211] The terminal presents age-specific information received from the server to the user via a display device. Specifically, it displays images and videos on the screen while simultaneously using speech synthesis technology to read the information aloud and provide it to the user as audio information. The terminal also works in conjunction with an emotion engine to collect the user's facial expressions and voice tone through the camera and microphone. This information is used to understand the user's emotional state. Based on the collected data, the emotion engine analyzes emotions in real time and determines positive and negative responses.
[0212] The user responds to the presented information via voice, and this response is analyzed on the device. Based on the analysis results and emotional feedback from the emotion engine, the server adjusts the next questions and information presentation. For example, if the user smiles, the server can generate questions that evoke positive memories, such as, "Do you remember the first place you traveled to?"
[0213] In implementing the system of this invention, hardware such as smart glasses (e.g., a general portable visual device) or a tablet (e.g., a general personal information terminal) can be used. Software such as a face capture library (e.g., OpenCV), a speech recognition API (e.g., Google Cloud Speech-to-Text), and an emotion analysis engine (e.g., Affectiva) can be applied. With such a system configuration, it becomes possible to maintain the mental health of dementia patients through individualized care while reducing the burden on caregivers.
[0214] As an example of a prompt message, it can be used in server design in the form of, "When the user smiles, ask them verbally, 'Do you remember the first place you traveled to?'"
[0215] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0216] Step 1:
[0217] The server extracts age-specific information based on the user's history. The input is the user's history data, which is used to search for age-specific information based on past experiences and interests, and select relevant information from the database. The output is age-specific information data.
[0218] Step 2:
[0219] The server sends the extracted age-based information to the terminal. The input is the age-based information data generated in step 1, which is sent to the terminal via the network protocol. The output is the age-based information data received by the terminal.
[0220] Step 3:
[0221] The terminal presents age-based information received to the user via a display device. The input is age-based information data transmitted from the server, which is converted into image or video formats for display on the screen. Furthermore, a speech synthesis API is used to convert the visual information into speech. The output is audiovisual information for the user to view.
[0222] Step 4:
[0223] The device uses a camera and microphone to collect the user's facial expressions and voice tone in real time. The input is the user's audiovisual reactions, which are acquired as camera video and microphone audio data. The output is emotion data that is processed in real time.
[0224] Step 5:
[0225] The emotion engine determines the user's emotional state based on collected facial expressions and voice tone. The input is emotional data supplied from the device, which is analyzed using a machine learning algorithm. The output is the user's emotional state as determined by the emotion engine.
[0226] Step 6:
[0227] The response processing unit analyzes the user's response and generates the next question, taking into account the emotional state from the emotion engine. The input consists of response data and emotional state information provided by the user, either as speech or text, which are analyzed using natural language processing and an AI model. The output is the data for the next question presented.
[0228] Step 7:
[0229] The server sends the generated question data to the terminal and adjusts the interaction. The input is the question data generated in step 6, which is sent via network communication to the terminal. The output is a new question presented to the user.
[0230] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0231] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). An example of data generation model 58 is ChatGPT (registered trademark) (Internet search).<URL: https: / / openai.com / blog / chatgpt> ), Gemini (registered trademark) (Internet search) <url: https: gemini.google.com ?hl="ja">Examples of generative AI include the following. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.
[0232] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.
[0233] [Second Embodiment]
[0234] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.
[0235] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.
[0236] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0237] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.
[0238] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0239] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0240] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0241] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0242] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0243] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the identification processing unit 290.
[0244] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0245] Next, the identification processing performed by the identification processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0246] This invention is an information processing system that enables effective communication with dementia patients. The specific operation of each component is described below.
[0247] First, the server is responsible for managing user history information and extracting age-specific information based on this data. The server refers to databases containing user registration information, past activity history, and preferences to identify information, images, and videos related to the user's time period.
[0248] The terminal functions as an interface for providing users with information received from the server. The terminal visually displays chronological information provided by the server and can provide related questions and explanations verbally using speech synthesis technology. This helps users recall past events by viewing and listening to the information.
[0249] The user responds to the presented information and questions using voice or text. The terminal processes these responses using speech recognition and text analysis technologies, and analyzes the content of the responses. Based on the analyzed responses, the server generates the next conversation sequence. For example, if the user responds, "This photo brings back memories. It reminds me of a trip I took when I was younger," the server creates an additional question such as, "Could you tell me more about that trip?" and presents it to the user through the terminal.
[0250] As a concrete example, consider a scenario where a user begins to talk about childhood memories. The server provides the terminal with information related to schools and games in the 1950s, and the terminal presents this information to the user and asks, "What was school life like back then?" Upon receiving the user's response, the terminal sends that information back to the server and generates further information and questions based on the user's response. In this way, the present invention realizes two-way communication in which users can actively participate.
[0251] This embodiment allows dementia patients to safely and comfortably reflect on their past, reducing the burden on caregivers while stabilizing the patient's emotions.
[0252] The following describes the processing flow.
[0253] Step 1:
[0254] The server searches the database based on the user's profile information and extracts relevant age-specific information. This includes the user's date of birth, work history, and favorite music.
[0255] Step 2:
[0256] The server packages the extracted age-based information and sends it to the terminal. The transmitted information includes images, videos, audio files, and related text data.
[0257] Step 3:
[0258] The terminal displays information received from the server on the user's screen. The display method is adjusted so that visual content is presented in an appropriate format.
[0259] Step 4:
[0260] The device uses speech synthesis technology based on received data to present the user with relevant questions in voice. For example, a question like, "Do you remember anything about this era?"
[0261] Step 5:
[0262] The user responds to the device via voice or text. The user can also speak about their memories and impressions of the information presented.
[0263] Step 6:
[0264] The device converts the user's voice responses into text using recognition technology and analyzes the content. Important keywords and themes are then extracted from the analysis results.
[0265] Step 7:
[0266] The server generates the next information and questions based on the analysis results sent from the terminal. The server then appropriately adjusts the received data based on the user's interests and responses.
[0267] Step 8:
[0268] The terminal presents new information and questions from the server, continuing the conversation. The terminal selects topics likely to interest the user, deepening the conversation and further retrieving the user's memory.
[0269] In this way, the system enables meaningful communication for dementia patients through interaction with the user.
[0270] (Example 1)
[0271] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0272] Communicating with dementia patients in modern society presents numerous challenges. In particular, there is a need for effective methods to elicit past memories and stabilize patients' emotions. Furthermore, it is necessary to provide patients with appropriate information while reducing the burden on caregivers.
[0273] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0274] In this invention, the server includes means for extracting age-specific information based on the user's history information, means for transmitting age-specific information to a communication device, and means for analyzing the user's response and generating the next conversation sequence using a generative AI model. This enables dementia patients to reflect on their past while engaging in interactive communication tailored to their needs.
[0275] An "information processing device" is a computer that extracts age-related information relevant to a user based on historical data and performs necessary processing.
[0276] "Information by era" refers to information such as events, photographs, and videos related to a specific period, and is data used to evoke memories in the user.
[0277] A "communication device" is a device that plays a role in transmitting and receiving information between an information processing device and a display means.
[0278] A "display means" is a device that visually presents the received information to the user and includes a display and the like. [[ID=##]]
[0279] A "sound output means" is a device for providing information in voice using voice synthesis technology. [[ID=##]]
[0280] A "response processing device" is a device that analyzes responses in the form of voice or text from the user and generates the next question based on the content.
[0281] A "voice recognition means" is a technology or device for converting voice input into text data.
[0282] A "text analysis technology" is a technology for analyzing text data and understanding its meaning.
[0283] A "generative AI model" is an algorithm or technology used to generate the next conversation sequence or question based on the response content.
[0284] A "conversation sequence" is a series of questions and responses configured to smoothly advance the interaction with the user.
[0285] This invention is an information processing system that enables effective communication with dementia patients and is realized by the server, terminal, and user each playing their respective roles.
[0286] First, the server manages the user's history information. Specifically, the server refers to a database including the user's registration information, past activity history, and preferences using a database management system. Thereby, age-related information related to the user is extracted, for example, photos and videos from a specific era are identified. This information is transmitted to the terminal through the communication device for further processing. <##000906>
[0287] The terminal presents the user with age-based information received from the server. The terminal is equipped with visual and audio output devices, which are used to provide information visually and audibly. The visual device uses a display to show images and videos in high resolution. The audio output device utilizes speech synthesis technology to provide relevant questions and explanations in natural-sounding voice. This allows the user to receive information visually and audibly, helping them recall past events.
[0288] The user can respond to information presented by the device using voice or text. The device analyzes this response using speech recognition and text analysis technology and sends the results to the server. The server uses a generative AI model to generate the next conversation sequence based on the user's response. For example, if the user says, "This photo brings back memories," the server generates a question such as, "Could you tell me more about your memories of that time?" and presents it through the device.
[0289] A concrete example would be a scenario where a user seeks information related to their childhood memories. The server provides information about school life in the 1950s, and the terminal can then use that information to ask, "What was school life like back then?" An example of a prompt to input into the generating AI model would be, "Generate questions related to school life in the 1950s. Please consider content that will elicit the user's memories and specific experiences."
[0290] In this way, servers, terminals, and users work together to enable two-way communication with dementia patients, thereby reducing the burden on caregivers.
[0291] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0292] Step 1:
[0293] The server retrieves user history information from the database. User IDs and registration information are used as input, and the output is historical data related to the user. Specifically, the server executes SQL queries to extract the necessary data and organizes it by year.
[0294] Step 2:
[0295] The server identifies chronological information based on the acquired historical data. The historical data obtained in step 1 is used as input, and chronological information such as photos and videos is generated as output. The server applies an information extraction algorithm to select the most relevant information.
[0296] Step 3:
[0297] The server sends the identified age-specific information to the terminal. The input is the age-specific information generated in step 2, and the output is the information converted into a format that can be received by the terminal. The server transmits the information using a communication protocol.
[0298] Step 4:
[0299] The terminal presents age-based information received from the server to the user via a display device and an audio output device. The input is the information transmitted in step 3, and the output is the visual and auditory information presented to the user. The terminal displays the information on a high-resolution display and provides questions and explanations in voice using a speech synthesis engine.
[0300] Step 5:
[0301] The user responds to the presented information using voice or text. The user's voice or text data is sent to the device as input. Specifically, the user provides feedback through voice commands or messages.
[0302] Step 6:
[0303] The terminal analyzes the response from the user using speech recognition and text analysis technologies. The input is the user's response in step 5, and the output is the analyzed response content. The terminal uses speech recognition software to convert speech into text and analyzes the meaning with natural language processing algorithms.
[0304] Step 7:
[0305] The server generates the sequence of the next conversation by utilizing the generated AI model based on the analyzed response content. The input is the analysis data obtained in step 6, and the output is the conversation or question to be presented next. The server passes the prompt sentence to the generated AI model to generate a new question.
[0306] Step 8:
[0307] The server sends the generated question to the terminal, and the terminal presents it to the user. The input is the question generated in step 7, and the output is the new information shown to the user. The terminal presents the information to the user again through the display device and the audio output device.
[0308] (Application Example 1)
[0309] Next, Application Example 1 will be described. In the following description, the data processing device 12 is referred to as the "server", and the smart glasses 214 are referred to as the "terminal".
[0310] In communication with dementia patients, it is necessary to provide effective means to assist patients in looking back on the past, stabilizing emotions through conversation, and promoting the recall of patients' memories. Also, it is an issue to adjust the dialogue based on the patients' interests and understanding and realize communication suitable for the patients.
[0311] The specific processing by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0312] In this invention, the server includes an information processing device that extracts age-specific information based on the user's history information, a means for transmitting the age-specific information to a display means, a display means for presenting the received age-specific information to the user, a response processing device that analyzes the user's response and generates the next question based on the response content, a means for conducting a dialogue based on the presented information using speech synthesis technology, a means for analyzing the user's voice input using speech recognition technology, and a means for creating prompt sentences using a generation AI model. This makes it possible to realize a system that allows dementia patients to reflect on their past, achieve emotional stability, and enable communication tailored to each individual patient.
[0313] An "information processing device" is a device that extracts age-based information from a user's history and transmits that information to other devices.
[0314] "Information by era" refers to a collection of information related to a specific period, extracted based on the user's history.
[0315] A "display means" is a device for presenting age-based information received from an information processing device to the user.
[0316] A "response processing device" is a device that analyzes responses from users and generates the next question based on the content of those responses.
[0317] "Speech synthesis technology" is a technology that enables conversations based on presented information to be conducted using natural-sounding speech.
[0318] "Speech recognition technology" is a technology that converts voice input from users into text data and analyzes its content.
[0319] A "generative AI model" is a form of artificial intelligence used to create prompt statements in response processing.
[0320] A "prompt sentence" is a sentence created using a generative AI model to guide the user into the next conversation.
[0321] At the heart of the system implementing this invention is an information processing device. This device is responsible for extracting age-specific information based on the user's history and transmitting the relevant information to a terminal. The terminal is equipped with a display device and a speech synthesis device, which presents the age-specific information to the user visually and audibly.
[0322] The server is a computer that manages user responses and analyzes their content. This computer uses speech recognition technology to convert the user's voice input into text and then analyzes that text. It then uses a generative AI model to create prompt sentences and determine the next information or question to present.
[0323] For example, if a user says, "I miss school life in the 1950s," the server sends school-related photos from that era to the user's device and uses speech synthesis technology to generate and present a voice message saying, "Could you tell me a little more about school life back then?"
[0324] The hardware used includes server computers and smartphones or tablets capable of display and speech synthesis. The software utilizes Google Speech Recognition for speech recognition, Google Text-to-Speech for speech synthesis, and SQLite for data management. Software for the generative AI model also runs on the server.
[0325] As a concrete example, here is an example of a prompt input to a generative AI model: "The user feels nostalgic for a park from the 1950s. Please generate the following related question." This prompt allows the generative AI model to generate subsequent questions and information for the user, adjusting the conversation to flow smoothly.
[0326] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0327] Step 1:
[0328] The server retrieves user history information from the database. The user's ID or profile information is used as input. Relevant age-specific information is extracted through database queries, and a list of the extracted information is generated as output.
[0329] Step 2:
[0330] The server sends the extracted age-based information to the terminal. The input here is the age-based information obtained in step 1, and the output is the transfer of the data to the terminal. Once the terminal receives the information, it prepares it for visual display.
[0331] Step 3:
[0332] The device presents age-specific information to the user visually and audibly. The input is age-specific information from a server. The device displays the information on the screen and uses Google Text-to-Speech to provide related questions audibly. The output is the user visually confirming the information and listening to the questions.
[0333] Step 4:
[0334] The user sees or hears the presented information and responds in voice or text. The input is a response based on the user's own memory and experience. The output is a response provided to the device in the form of voice or text.
[0335] Step 5:
[0336] The device converts user responses into text using speech recognition technology. The input is the user's voice, and the data is converted to text using Google Speech Recognition. The output is parseable text.
[0337] Step 6:
[0338] The server analyzes the transcribed response and constructs a prompt using a generative AI model. The input here is the text obtained from step 5. Data analysis determines the user's interests and topics, and then creates a prompt. The output is a prompt containing the next question or information to be presented.
[0339] Step 7:
[0340] The server sends the next question to the terminal based on the generated prompt. The input is the prompt constructed in step 6. The terminal presents this information to the user again and continues the conversation. The output is the provision of new dialogue content.
[0341] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the identification processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform identification processing using the user's emotions.
[0342] This invention is an information processing system that facilitates communication with dementia patients and enables interaction that takes into account the user's emotions. This system consists of an information processing device, a display device, a response processing device, and an emotion engine. The specific functions and operations of each component are described below.
[0343] First, the server uses the user's history information to extract age-based data. This data includes images and videos based on the user's past experiences and interests, as well as related historical data. The server then sends the extracted information to the terminal.
[0344] The device presents the user with age-specific information received from the server. Specifically, it displays images and videos while simultaneously reading the relevant information aloud using speech synthesis technology. At this stage, the device works in conjunction with an emotion engine to collect the user's facial expressions and voice tone through the camera and microphone.
[0345] The user responds to the information presented by the device. This response is sent to the device as text or voice input. The device receives this and performs analysis in the response processing unit. In the analysis process, emphasis is placed not only on the user's response but also on the collected sentiment data.
[0346] The emotion engine determines the user's emotional state based on user input and collected emotional data. For example, if the user makes a nostalgic expression, the emotion engine will determine this to be a positive emotion. The server receives feedback from the emotion engine and adjusts the next questions and information presentation accordingly.
[0347] As a concrete example, consider a scenario where the user feels nostalgic about school life in the 1950s, as presented. If the user shows a satisfied expression, the emotion engine recognizes this as a positive response. Based on this, the server assumes the user wants to continue the conversation and generates a question such as, "Please tell me about your memories with friends from that time." The terminal then presents this question to the user, allowing for a smooth continuation of the dialogue.
[0348] In this way, the present invention realizes interaction that reflects the user's emotions in real time, providing more personalized care. Through this system, caregivers can maintain the mental health of dementia patients and reduce the burden of daily care.
[0349] The following describes the processing flow.
[0350] Step 1:
[0351] The server extracts relevant, age-specific information from the database based on the user's history. This information includes images, videos, and audio files that are likely to be of interest to the user.
[0352] Step 2:
[0353] The server packages the extracted age-based information and related supplementary data and sends it to the terminal. The terminal then prepares to effectively present the received information to the user.
[0354] Step 3:
[0355] The terminal displays information received from the server to the user and simultaneously provides information via audio. It is optimized so that the user can receive this information both visually and aurally.
[0356] Step 4:
[0357] The device activates an emotion engine and acquires data from the camera and microphone to monitor the user's facial expressions and tone of voice. This data is used to recognize the user's emotional state in real time.
[0358] Step 5:
[0359] The user reacts to the presented content and provides verbal responses. The user's responses are sent to the device via voice input.
[0360] Step 6:
[0361] The terminal converts the user's voice into text, which is then analyzed by a response processing unit. Simultaneously, the emotion engine evaluates the user's emotions and determines whether they are positive, negative, or neutral.
[0362] Step 7:
[0363] The server receives the analysis results and sentiment evaluation, and generates information and questions for the next step. The generated content is optimized for the user's interests and emotional state.
[0364] Step 8:
[0365] The terminal presents the user with new information and questions obtained from the server, continuing the conversation. The terminal strives to help the user relax and continue talking by providing information at the appropriate time and in the right tone.
[0366] Through the above process, the system makes interactions with users more natural and emotionally sensitive, providing an effective and comfortable communication environment.
[0367] (Example 2)
[0368] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0369] A challenge in interacting with dementia patients is the lack of interactive information provision that takes into account the user's emotions. Conventional systems have difficulty adjusting conversations to reflect the user's emotional state in real time, resulting in limited provision of individualized care.
[0370] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0371] In this invention, the server includes means for an information processing device to extract age-specific data based on the user's history data, means for a terminal to collect the user's facial expressions and voice tone through a camera and microphone, and means for an emotion engine to determine the user's emotional state and adjust the conversation accordingly. This enables smooth dialogue that takes the user's emotional state into consideration in real time.
[0372] An "information processing device" is a device that analyzes data and extracts and generates necessary information based on a specific purpose.
[0373] "Age-based data" refers to data that includes images, videos, and historical information related to a specific period in the past, based on the user's history and interests.
[0374] A "display device" is a device that presents received information to a user visually or audibly.
[0375] "Speech synthesis technology" is a technology that converts text data into speech and reads it aloud.
[0376] A "terminal" is a device that collects user input and emotional states and presents that information.
[0377] A "camera" is an optical device that captures images or videos and transmits them to a display or other device.
[0378] A "microphone" is a device that receives sound and converts it into an electrical signal.
[0379] An "emotion engine" refers to technology that analyzes a user's facial expressions and voice tone to determine their emotional state.
[0380] A "response processing device" is a device used to analyze user responses and generate the content of the next dialogue.
[0381] A "generative AI model" is an artificial intelligence technology that generates new text or questions based on input prompts.
[0382] This invention is an information processing system that facilitates communication with dementia patients and enables interaction that takes into account the user's emotions. The system mainly consists of a server, a terminal, a display device, a response processing device, and an emotion engine.
[0383] The server uses a database to analyze user history data and extract data by age group. This information includes images, videos, and historical data related to the user's past experiences and interests. The server uses a database management system.
[0384] When the terminal receives age-specific data from the server, it visually presents images and videos using a display device and reads out related information using speech synthesis technology. A general-purpose speech synthesis engine is used for this process. The terminal also records the user's facial expressions and voice tone using a camera and microphone, and an emotion engine analyzes this data.
[0385] The user can respond to the presented information, and this response is sent to the terminal as text or voice input. The terminal immediately sends this input data to the response processing unit.
[0386] The emotion engine analyzes the user's facial expressions and tone of voice to evaluate their emotional state. If the user smiles or displays a satisfied expression, it is recognized as a positive emotion. Based on this evaluation, the server uses a generative AI model to generate the next dialogue, taking the user's emotional state into consideration.
[0387] As a concrete example, consider a case where a user makes a nostalgic expression when presented with data about school life in the 1950s. The emotion engine analyzes this as a positive reaction and provides the result to the server. The server uses a generative AI model and prompt text to generate the next question, such as "Please tell us about your memories with friends from that time," and presents it to the user through the terminal.
[0388] An example of a prompt message is: "Generate a new question based on the user's interests."
[0389] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0390] Step 1:
[0391] The server retrieves user history data from the database. The input is the user's ID, and the output is extracted historical data related to past experiences and interests. Database management software is used to process the historical data and sort it into age-specific data. Specifically, SQL queries are used to extract data categorized by age.
[0392] Step 2:
[0393] The server sends the extracted age-based data to the terminal. The input is the extracted age-based data, and the output is the transmission of data to the terminal. Network communication protocols are used to accurately transmit the data to the terminal.
[0394] Step 3:
[0395] The terminal analyzes the received age-specific data, displays images and videos on a display device, and reads the information aloud using speech synthesis technology. The input is age-specific data received from the server, and the output is a visual and auditory presentation to the user. A speech synthesis engine is used to convert text into speech, which is then presented using the display and speaker.
[0396] Step 4:
[0397] The user reacts to the information presented. Input is the information presented by the device, and output is the information sent to the device as the user's facial expressions, voice responses, or text input.
[0398] Step 5:
[0399] The device collects user responses through its camera and microphone. Input is the user's facial expressions and voice, and output is the transmission of this data to a response processing unit. Image analysis and speech recognition technologies are used to structure the collected data and send it to the emotion engine.
[0400] Step 6:
[0401] The emotion engine analyzes the user's facial expressions and voice tone to determine their emotional state. The input is the user's emotional data, and the output is an evaluation of that emotional state. An analysis algorithm is used to classify emotions as positive, negative, or neutral.
[0402] Step 7:
[0403] The server uses a generative AI model to generate the next dialogue based on feedback from the emotion engine. The input consists of the emotion engine's evaluation results, the current conversation context, and prompts for the generative AI model; the output is the next question or information to be presented. The generative AI model uses natural language generation techniques to create the new dialogue.
[0404] Step 8:
[0405] The terminal presents the user with new dialogue content from the server. The input is the next question or information generated by the server, and the output is the visual or auditory presentation of this information to the user. This iterative process ensures a smooth and continuous dialogue with the user.
[0406] (Application Example 2)
[0407] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0408] In interacting with dementia patients, the challenge lies in achieving smooth interactions that reflect the user's emotions in real time, and providing individualized care based on the user's past experiences and interests. Furthermore, there is a need to build a system that reduces the burden on caregivers and makes daily care more effective.
[0409] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0410] In this invention, the server includes an information processing device that extracts age-specific information based on the user's history information, a means for transmitting the age-specific information to a display device, and a means for the display device to present the received age-specific information to the user. This makes it possible to provide personalized care through interaction that takes into account the user's emotions, thereby reducing the burden on caregivers.
[0411] An "information processing device" is an electronic device that extracts age-based information based on the user's history and transmits it to a display device.
[0412] "Information by age group" refers to data organized by age group based on users' past experiences and interests.
[0413] A "display device" is a device that visually presents age-based information received from an information processing device to the user.
[0414] A "response processing device" is a device that analyzes responses from users, generates the next question, and adjusts the conversation accordingly.
[0415] An "emotion engine" is software or hardware that collects a user's facial expressions or voice tone and determines their emotional state.
[0416] "Emotional feedback" refers to information used to reflect the results of the emotional engine's assessment of the emotional state in the interaction.
[0417] "Means of adjusting interaction" refer to a system that appropriately modifies conversations and information presentations based on emotional feedback from an emotion engine.
[0418] A "smart device" is a portable electronic device used to improve visual or auditory interaction, and it has the ability to run applications.
[0419] This invention is a system for facilitating smooth communication with dementia patients. The system primarily consists of a server, a terminal, and user interaction. The server is responsible for extracting age-specific information based on the user's history and transmitting it to the terminal. Age-specific information is managed in the form of images, videos, and related historical data.
[0420] The terminal presents age-specific information received from the server to the user via a display device. Specifically, it displays images and videos on the screen while simultaneously using speech synthesis technology to read the information aloud and provide it to the user as audio information. The terminal also works in conjunction with an emotion engine to collect the user's facial expressions and voice tone through the camera and microphone. This information is used to understand the user's emotional state. Based on the collected data, the emotion engine analyzes emotions in real time and determines positive and negative responses.
[0421] The user responds to the presented information via voice, and this response is analyzed on the device. Based on the analysis results and emotional feedback from the emotion engine, the server adjusts the next questions and information presentation. For example, if the user smiles, the server can generate questions that evoke positive memories, such as, "Do you remember the first place you traveled to?"
[0422] In implementing the system of this invention, hardware such as smart glasses (e.g., a general portable visual device) or a tablet (e.g., a general personal information terminal) can be used. Software such as a face capture library (e.g., OpenCV), a speech recognition API (e.g., Google Cloud Speech-to-Text), and an emotion analysis engine (e.g., Affectiva) can be applied. With such a system configuration, it becomes possible to maintain the mental health of dementia patients through individualized care while reducing the burden on caregivers.
[0423] As an example of a prompt message, it can be used in server design in the form of, "When the user smiles, ask them verbally, 'Do you remember the first place you traveled to?'"
[0424] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0425] Step 1:
[0426] The server extracts age-specific information based on the user's history. The input is the user's history data, which is used to search for age-specific information based on past experiences and interests, and select relevant information from the database. The output is age-specific information data.
[0427] Step 2:
[0428] The server sends the extracted age-based information to the terminal. The input is the age-based information data generated in step 1, which is sent to the terminal via the network protocol. The output is the age-based information data received by the terminal.
[0429] Step 3:
[0430] The terminal presents age-based information received to the user via a display device. The input is age-based information data transmitted from the server, which is converted into image or video formats for display on the screen. Furthermore, a speech synthesis API is used to convert the visual information into speech. The output is audiovisual information for the user to view.
[0431] Step 4:
[0432] The device uses a camera and microphone to collect the user's facial expressions and voice tone in real time. The input is the user's audiovisual reactions, which are acquired as camera video and microphone audio data. The output is emotion data that is processed in real time.
[0433] Step 5:
[0434] The emotion engine determines the user's emotional state based on collected facial expressions and voice tone. The input is emotional data supplied from the device, which is analyzed using a machine learning algorithm. The output is the user's emotional state as determined by the emotion engine.
[0435] Step 6:
[0436] The response processing unit analyzes the user's response and generates the next question, taking into account the emotional state from the emotion engine. The input consists of response data and emotional state information provided by the user, either as speech or text, which are analyzed using natural language processing and an AI model. The output is the data for the next question presented.
[0437] Step 7:
[0438] The server sends the generated question data to the terminal and adjusts the interaction. The input is the question data generated in step 6, which is sent via network communication to the terminal. The output is a new question presented to the user.
[0439] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing unit 12. In the data processing unit 12, the specific processing unit 290 acquires the audio data.
[0440] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). One example of data generation model 58 is ChatGPT (Internet search<URL: https: / / openai.com / blog / chatgpt> ), Gemini (Internet search) <url: https: gemini.google.com ?hl="ja">Examples of generative AI include the following. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.
[0441] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.
[0442] [Third Embodiment]
[0443] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.
[0444] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.
[0445] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0446] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.
[0447] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0448] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0449] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0450] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0451] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0452] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the identification processing unit 290.
[0453] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0454] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".
[0455] This invention is an information processing system that enables effective communication with dementia patients. The specific operation of each component is described below.
[0456] First, the server is responsible for managing user history information and extracting age-specific information based on this data. The server refers to databases containing user registration information, past activity history, and preferences to identify information, images, and videos related to the user's time period.
[0457] The terminal functions as an interface for providing users with information received from the server. The terminal visually displays chronological information provided by the server and can provide related questions and explanations verbally using speech synthesis technology. This helps users recall past events by viewing and listening to the information.
[0458] The user responds to the presented information and questions using voice or text. The terminal processes these responses using speech recognition and text analysis technologies, and analyzes the content of the responses. Based on the analyzed responses, the server generates the next conversation sequence. For example, if the user responds, "This photo brings back memories. It reminds me of a trip I took when I was younger," the server creates an additional question such as, "Could you tell me more about that trip?" and presents it to the user through the terminal.
[0459] As a concrete example, consider a scenario where a user begins to talk about childhood memories. The server provides the terminal with information related to schools and games in the 1950s, and the terminal presents this information to the user and asks, "What was school life like back then?" Upon receiving the user's response, the terminal sends that information back to the server and generates further information and questions based on the user's response. In this way, the present invention realizes two-way communication in which users can actively participate.
[0460] This embodiment allows dementia patients to safely and comfortably reflect on their past, reducing the burden on caregivers while stabilizing the patient's emotions.
[0461] The following describes the processing flow.
[0462] Step 1:
[0463] The server searches the database based on the user's profile information and extracts relevant age-specific information. This includes the user's date of birth, work history, and favorite music.
[0464] Step 2:
[0465] The server packages the extracted age-based information and sends it to the terminal. The transmitted information includes images, videos, audio files, and related text data.
[0466] Step 3:
[0467] The terminal displays information received from the server on the user's screen. The display method is adjusted so that visual content is presented in an appropriate format.
[0468] Step 4:
[0469] The device uses speech synthesis technology based on received data to present the user with relevant questions in voice. For example, a question like, "Do you remember anything about this era?"
[0470] Step 5:
[0471] The user responds to the device via voice or text. The user can also speak about their memories and impressions of the information presented.
[0472] Step 6:
[0473] The device converts the user's voice responses into text using recognition technology and analyzes the content. Important keywords and themes are then extracted from the analysis results.
[0474] Step 7:
[0475] The server generates the next information and questions based on the analysis results sent from the terminal. The server then appropriately adjusts the received data based on the user's interests and responses.
[0476] Step 8:
[0477] The terminal presents new information and questions from the server, continuing the conversation. The terminal selects topics likely to interest the user, deepening the conversation and further retrieving the user's memory.
[0478] In this way, the system enables meaningful communication for dementia patients through interaction with the user.
[0479] (Example 1)
[0480] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0481] Communicating with dementia patients in modern society presents numerous challenges. In particular, there is a need for effective methods to elicit past memories and stabilize patients' emotions. Furthermore, it is necessary to provide patients with appropriate information while reducing the burden on caregivers.
[0482] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0483] In this invention, the server includes means for extracting age-specific information based on the user's history information, means for transmitting age-specific information to a communication device, and means for analyzing the user's response and generating the next conversation sequence using a generative AI model. This enables dementia patients to reflect on their past while engaging in interactive communication tailored to their needs.
[0484] An "information processing device" is a computer that extracts age-related information relevant to a user based on historical data and performs necessary processing.
[0485] "Information by era" refers to information such as events, photographs, and videos related to a specific period, and is data used to evoke memories in the user.
[0486] A "communication device" is a device that plays the role of sending and receiving information between an information processing device and a display device.
[0487] "Display means" refers to a device that visually presents received information to the user, and includes displays and the like.
[0488] A "sound output means" is a device that provides information in sound using speech synthesis technology.
[0489] A "response processing device" is a device that analyzes the user's voice or text responses and generates the next question based on that content.
[0490] "Speech recognition means" refers to a technology or device for converting speech input into text data.
[0491] "Text analysis technology" refers to the technology used to analyze text data and understand its meaning.
[0492] A "generative AI model" is an algorithm or technology used to generate the next conversation sequence or question based on the response content.
[0493] A "conversation sequence" is a series of questions and answers designed to facilitate smooth interaction with the user.
[0494] This invention is an information processing system that enables effective communication with dementia patients, and is realized through the server, terminal, and user each fulfilling their respective roles.
[0495] First, the server manages the user's history information. Specifically, the server uses a database management system to access a database containing the user's registration information, past activity history, and preferences. This allows the server to extract age-specific information related to the user, for example, identifying photos and videos from a specific period. This information is then transmitted to the terminal via a communication device for further processing.
[0496] The terminal presents the user with age-based information received from the server. The terminal is equipped with visual and audio output devices, which are used to provide information visually and audibly. The visual device uses a display to show images and videos in high resolution. The audio output device utilizes speech synthesis technology to provide relevant questions and explanations in natural-sounding voice. This allows the user to receive information visually and audibly, helping them recall past events.
[0497] The user can respond to information presented by the device using voice or text. The device analyzes this response using speech recognition and text analysis technology and sends the results to the server. The server uses a generative AI model to generate the next conversation sequence based on the user's response. For example, if the user says, "This photo brings back memories," the server generates a question such as, "Could you tell me more about your memories of that time?" and presents it through the device.
[0498] A concrete example would be a scenario where a user seeks information related to their childhood memories. The server provides information about school life in the 1950s, and the terminal can then use that information to ask, "What was school life like back then?" An example of a prompt to input into the generating AI model would be, "Generate questions related to school life in the 1950s. Please consider content that will elicit the user's memories and specific experiences."
[0499] In this way, servers, terminals, and users work together to enable two-way communication with dementia patients, thereby reducing the burden on caregivers.
[0500] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0501] Step 1:
[0502] The server retrieves user history information from the database. User IDs and registration information are used as input, and the output is historical data related to the user. Specifically, the server executes SQL queries to extract the necessary data and organizes it by year.
[0503] Step 2:
[0504] The server identifies chronological information based on the acquired historical data. The historical data obtained in step 1 is used as input, and chronological information such as photos and videos is generated as output. The server applies an information extraction algorithm to select the most relevant information.
[0505] Step 3:
[0506] The server sends the identified age-specific information to the terminal. The input is the age-specific information generated in step 2, and the output is the information converted into a format that can be received by the terminal. The server transmits the information using a communication protocol.
[0507] Step 4:
[0508] The terminal presents age-based information received from the server to the user via a display device and an audio output device. The input is the information transmitted in step 3, and the output is the visual and auditory information presented to the user. The terminal displays the information on a high-resolution display and provides questions and explanations in voice using a speech synthesis engine.
[0509] Step 5:
[0510] The user responds to the presented information using voice or text. The user's voice or text data is sent to the device as input. Specifically, the user provides feedback through voice commands or messages.
[0511] Step 6:
[0512] The device analyzes the user's response using speech recognition and text analysis technologies. The input is the user's response in step 5, and the output is the analyzed response. The device converts the speech to text using speech recognition software and analyzes its meaning using a natural language processing algorithm.
[0513] Step 7:
[0514] The server uses a generative AI model based on the analyzed response to generate the next conversation sequence. The input is the analyzed data obtained in step 6, and the output is the conversation or questions presented next. The server passes a prompt to the generative AI model, which generates a new question.
[0515] Step 8:
[0516] The server sends the generated question to the terminal, which then presents it to the user. The input is the question generated in step 7, and the output is the new information shown to the user. The terminal then presents the information to the user again through the display device and sound output device.
[0517] (Application Example 1)
[0518] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0519] In communicating with dementia patients, it is necessary to support them in reflecting on the past, stabilize their emotions through conversation, and provide effective means to promote memory recall. Furthermore, a challenge is to adjust the dialogue based on the patient's interests and understanding to achieve communication that is appropriate for them.
[0520] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0521] In this invention, the server includes an information processing device that extracts age-specific information based on the user's history information, a means for transmitting the age-specific information to a display means, a display means for presenting the received age-specific information to the user, a response processing device that analyzes the user's response and generates the next question based on the response content, a means for conducting a dialogue based on the presented information using speech synthesis technology, a means for analyzing the user's voice input using speech recognition technology, and a means for creating prompt sentences using a generation AI model. This makes it possible to realize a system that allows dementia patients to reflect on their past, achieve emotional stability, and enable communication tailored to each individual patient.
[0522] An "information processing device" is a device that extracts age-based information from a user's history and transmits that information to other devices.
[0523] "Information by era" refers to a collection of information related to a specific period, extracted based on the user's history.
[0524] A "display means" is a device for presenting age-based information received from an information processing device to the user.
[0525] A "response processing device" is a device that analyzes responses from users and generates the next question based on the content of those responses.
[0526] "Speech synthesis technology" is a technology that enables conversations based on presented information to be conducted using natural-sounding speech.
[0527] "Speech recognition technology" is a technology that converts voice input from users into text data and analyzes its content.
[0528] A "generative AI model" is a form of artificial intelligence used to create prompt statements in response processing.
[0529] A "prompt sentence" is a sentence created using a generative AI model to guide the user into the next conversation.
[0530] At the heart of the system implementing this invention is an information processing device. This device is responsible for extracting age-specific information based on the user's history and transmitting the relevant information to a terminal. The terminal is equipped with a display device and a speech synthesis device, which presents the age-specific information to the user visually and audibly.
[0531] The server is a computer that manages user responses and analyzes their content. This computer uses speech recognition technology to convert the user's voice input into text and then analyzes that text. It then uses a generative AI model to create prompt sentences and determine the next information or question to present.
[0532] For example, if a user says, "I miss school life in the 1950s," the server sends school-related photos from that era to the user's device and uses speech synthesis technology to generate and present a voice message saying, "Could you tell me a little more about school life back then?"
[0533] The hardware used includes server computers and smartphones or tablets capable of display and speech synthesis. The software utilizes Google Speech Recognition for speech recognition, Google Text-to-Speech for speech synthesis, and SQLite for data management. Software for the generative AI model also runs on the server.
[0534] As a concrete example, here is an example of a prompt input to a generative AI model: "The user feels nostalgic for a park from the 1950s. Please generate the following related question." This prompt allows the generative AI model to generate subsequent questions and information for the user, adjusting the conversation to flow smoothly.
[0535] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0536] Step 1:
[0537] The server retrieves user history information from the database. The user's ID or profile information is used as input. Relevant age-specific information is extracted through database queries, and a list of the extracted information is generated as output.
[0538] Step 2:
[0539] The server sends the extracted age-based information to the terminal. The input here is the age-based information obtained in step 1, and the output is the transfer of the data to the terminal. Once the terminal receives the information, it prepares it for visual display.
[0540] Step 3:
[0541] The device presents age-specific information to the user visually and audibly. The input is age-specific information from a server. The device displays the information on the screen and uses Google Text-to-Speech to provide related questions audibly. The output is the user visually confirming the information and listening to the questions.
[0542] Step 4:
[0543] The user sees or hears the presented information and responds in voice or text. The input is a response based on the user's own memory and experience. The output is a response provided to the device in the form of voice or text.
[0544] Step 5:
[0545] The device converts user responses into text using speech recognition technology. The input is the user's voice, and the data is converted to text using Google Speech Recognition. The output is parseable text.
[0546] Step 6:
[0547] The server analyzes the transcribed response and constructs a prompt using a generative AI model. The input here is the text obtained from step 5. Data analysis determines the user's interests and topics, and then creates a prompt. The output is a prompt containing the next question or information to be presented.
[0548] Step 7:
[0549] The server sends the next question to the terminal based on the generated prompt. The input is the prompt constructed in step 6. The terminal presents this information to the user again and continues the conversation. The output is the provision of new dialogue content.
[0550] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the identification processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform identification processing using the user's emotions.
[0551] This invention is an information processing system that facilitates communication with dementia patients and enables interaction that takes into account the user's emotions. This system consists of an information processing device, a display device, a response processing device, and an emotion engine. The specific functions and operations of each component are described below.
[0552] First, the server uses the user's history information to extract age-based data. This data includes images and videos based on the user's past experiences and interests, as well as related historical data. The server then sends the extracted information to the terminal.
[0553] The device presents the user with age-specific information received from the server. Specifically, it displays images and videos while simultaneously reading the relevant information aloud using speech synthesis technology. At this stage, the device works in conjunction with an emotion engine to collect the user's facial expressions and voice tone through the camera and microphone.
[0554] The user responds to the information presented by the device. This response is sent to the device as text or voice input. The device receives this and performs analysis in the response processing unit. In the analysis process, emphasis is placed not only on the user's response but also on the collected sentiment data.
[0555] The emotion engine determines the user's emotional state based on user input and collected emotional data. For example, if the user makes a nostalgic expression, the emotion engine will determine this to be a positive emotion. The server receives feedback from the emotion engine and adjusts the next questions and information presentation accordingly.
[0556] As a concrete example, consider a scenario where the user feels nostalgic about school life in the 1950s, as presented. If the user shows a satisfied expression, the emotion engine recognizes this as a positive response. Based on this, the server assumes the user wants to continue the conversation and generates a question such as, "Please tell me about your memories with friends from that time." The terminal then presents this question to the user, allowing for a smooth continuation of the dialogue.
[0557] In this way, the present invention realizes interaction that reflects the user's emotions in real time, providing more personalized care. Through this system, caregivers can maintain the mental health of dementia patients and reduce the burden of daily care.
[0558] The following describes the processing flow.
[0559] Step 1:
[0560] The server extracts relevant, age-specific information from the database based on the user's history. This information includes images, videos, and audio files that are likely to be of interest to the user.
[0561] Step 2:
[0562] The server packages the extracted age-based information and related supplementary data and sends it to the terminal. The terminal then prepares to effectively present the received information to the user.
[0563] Step 3:
[0564] The terminal displays information received from the server to the user and simultaneously provides information via audio. It is optimized so that the user can receive this information both visually and aurally.
[0565] Step 4:
[0566] The device activates an emotion engine and acquires data from the camera and microphone to monitor the user's facial expressions and tone of voice. This data is used to recognize the user's emotional state in real time.
[0567] Step 5:
[0568] The user reacts to the presented content and provides verbal responses. The user's responses are sent to the device via voice input.
[0569] Step 6:
[0570] The terminal converts the user's voice into text, which is then analyzed by a response processing unit. Simultaneously, the emotion engine evaluates the user's emotions and determines whether they are positive, negative, or neutral.
[0571] Step 7:
[0572] The server receives the analysis results and sentiment evaluation, and generates information and questions for the next step. The generated content is optimized for the user's interests and emotional state.
[0573] Step 8:
[0574] The terminal presents the user with new information and questions obtained from the server, continuing the conversation. The terminal strives to help the user relax and continue talking by providing information at the appropriate time and in the right tone.
[0575] Through the above process, the system makes interactions with users more natural and emotionally sensitive, providing an effective and comfortable communication environment.
[0576] (Example 2)
[0577] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0578] A challenge in interacting with dementia patients is the lack of interactive information provision that takes into account the user's emotions. Conventional systems have difficulty adjusting conversations to reflect the user's emotional state in real time, resulting in limited provision of individualized care.
[0579] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0580] In this invention, the server includes means for an information processing device to extract age-specific data based on the user's history data, means for a terminal to collect the user's facial expressions and voice tone through a camera and microphone, and means for an emotion engine to determine the user's emotional state and adjust the conversation accordingly. This enables smooth dialogue that takes the user's emotional state into consideration in real time.
[0581] An "information processing device" is a device that analyzes data and extracts and generates necessary information based on a specific purpose.
[0582] "Age-based data" refers to data that includes images, videos, and historical information related to a specific period in the past, based on the user's history and interests.
[0583] A "display device" is a device that presents received information to a user visually or audibly.
[0584] "Speech synthesis technology" is a technology that converts text data into speech and reads it aloud.
[0585] A "terminal" is a device that collects user input and emotional states and presents that information.
[0586] A "camera" is an optical device that captures images or videos and transmits them to a display or other device.
[0587] A "microphone" is a device that receives sound and converts it into an electrical signal.
[0588] An "emotion engine" refers to technology that analyzes a user's facial expressions and voice tone to determine their emotional state.
[0589] A "response processing device" is a device used to analyze user responses and generate the content of the next dialogue.
[0590] A "generative AI model" is an artificial intelligence technology that generates new text or questions based on input prompts.
[0591] This invention is an information processing system that facilitates communication with dementia patients and enables interaction that takes into account the user's emotions. The system mainly consists of a server, a terminal, a display device, a response processing device, and an emotion engine.
[0592] The server uses a database to analyze user history data and extract data by age group. This information includes images, videos, and historical data related to the user's past experiences and interests. The server uses a database management system.
[0593] When the terminal receives age-specific data from the server, it visually presents images and videos using a display device and reads out related information using speech synthesis technology. A general-purpose speech synthesis engine is used for this process. The terminal also records the user's facial expressions and voice tone using a camera and microphone, and an emotion engine analyzes this data.
[0594] The user can respond to the presented information, and this response is sent to the terminal as text or voice input. The terminal immediately sends this input data to the response processing unit.
[0595] The emotion engine analyzes the user's facial expressions and tone of voice to evaluate their emotional state. If the user smiles or displays a satisfied expression, it is recognized as a positive emotion. Based on this evaluation, the server uses a generative AI model to generate the next dialogue, taking the user's emotional state into consideration.
[0596] As a concrete example, consider a case where a user makes a nostalgic expression when presented with data about school life in the 1950s. The emotion engine analyzes this as a positive reaction and provides the result to the server. The server uses a generative AI model and prompt text to generate the next question, such as "Please tell us about your memories with friends from that time," and presents it to the user through the terminal.
[0597] An example of a prompt message is: "Generate a new question based on the user's interests."
[0598] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0599] Step 1:
[0600] The server retrieves user history data from the database. The input is the user's ID, and the output is extracted historical data related to past experiences and interests. Database management software is used to process the historical data and sort it into age-specific data. Specifically, SQL queries are used to extract data categorized by age.
[0601] Step 2:
[0602] The server sends the extracted age-based data to the terminal. The input is the extracted age-based data, and the output is the transmission of data to the terminal. Network communication protocols are used to accurately transmit the data to the terminal.
[0603] Step 3:
[0604] The terminal analyzes the received age-specific data, displays images and videos on a display device, and reads the information aloud using speech synthesis technology. The input is age-specific data received from the server, and the output is a visual and auditory presentation to the user. A speech synthesis engine is used to convert text into speech, which is then presented using the display and speaker.
[0605] Step 4:
[0606] The user reacts to the information presented. Input is the information presented by the device, and output is the information sent to the device as the user's facial expressions, voice responses, or text input.
[0607] Step 5:
[0608] The device collects user responses through its camera and microphone. Input is the user's facial expressions and voice, and output is the transmission of this data to a response processing unit. Image analysis and speech recognition technologies are used to structure the collected data and send it to the emotion engine.
[0609] Step 6:
[0610] The emotion engine analyzes the user's facial expressions and voice tone to determine their emotional state. The input is the user's emotional data, and the output is an evaluation of that emotional state. An analysis algorithm is used to classify emotions as positive, negative, or neutral.
[0611] Step 7:
[0612] The server uses a generative AI model to generate the next dialogue based on feedback from the emotion engine. The input consists of the emotion engine's evaluation results, the current conversation context, and prompts for the generative AI model; the output is the next question or information to be presented. The generative AI model uses natural language generation techniques to create the new dialogue.
[0613] Step 8:
[0614] The terminal presents the user with new dialogue content from the server. The input is the next question or information generated by the server, and the output is the visual or auditory presentation of this information to the user. This iterative process ensures a smooth and continuous dialogue with the user.
[0615] (Application Example 2)
[0616] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0617] In interacting with dementia patients, the challenge lies in achieving smooth interactions that reflect the user's emotions in real time, and providing individualized care based on the user's past experiences and interests. Furthermore, there is a need to build a system that reduces the burden on caregivers and makes daily care more effective.
[0618] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0619] In this invention, the server includes an information processing device that extracts age-specific information based on the user's history information, a means for transmitting the age-specific information to a display device, and a means for the display device to present the received age-specific information to the user. This makes it possible to provide personalized care through interaction that takes into account the user's emotions, thereby reducing the burden on caregivers.
[0620] An "information processing device" is an electronic device that extracts age-based information based on the user's history and transmits it to a display device.
[0621] "Information by age group" refers to data organized by age group based on users' past experiences and interests.
[0622] A "display device" is a device that visually presents age-based information received from an information processing device to the user.
[0623] A "response processing device" is a device that analyzes responses from users, generates the next question, and adjusts the conversation accordingly.
[0624] An "emotion engine" is software or hardware that collects a user's facial expressions or voice tone and determines their emotional state.
[0625] "Emotional feedback" refers to information used to reflect the results of the emotional engine's assessment of the emotional state in the interaction.
[0626] "Means of adjusting interaction" refer to a system that appropriately modifies conversations and information presentations based on emotional feedback from an emotion engine.
[0627] A "smart device" is a portable electronic device used to improve visual or auditory interaction, and it has the ability to run applications.
[0628] This invention is a system for facilitating smooth communication with dementia patients. The system primarily consists of a server, a terminal, and user interaction. The server is responsible for extracting age-specific information based on the user's history and transmitting it to the terminal. Age-specific information is managed in the form of images, videos, and related historical data.
[0629] The terminal presents age-specific information received from the server to the user via a display device. Specifically, it displays images and videos on the screen while simultaneously using speech synthesis technology to read the information aloud and provide it to the user as audio information. The terminal also works in conjunction with an emotion engine to collect the user's facial expressions and voice tone through the camera and microphone. This information is used to understand the user's emotional state. Based on the collected data, the emotion engine analyzes emotions in real time and determines positive and negative responses.
[0630] The user responds to the presented information via voice, and this response is analyzed on the device. Based on the analysis results and emotional feedback from the emotion engine, the server adjusts the next questions and information presentation. For example, if the user smiles, the server can generate questions that evoke positive memories, such as, "Do you remember the first place you traveled to?"
[0631] In implementing the system of this invention, hardware such as smart glasses (e.g., a general portable visual device) or a tablet (e.g., a general personal information terminal) can be used. Software such as a face capture library (e.g., OpenCV), a speech recognition API (e.g., Google Cloud Speech-to-Text), and an emotion analysis engine (e.g., Affectiva) can be applied. With such a system configuration, it becomes possible to maintain the mental health of dementia patients through individualized care while reducing the burden on caregivers.
[0632] As an example of a prompt message, it can be used in server design in the form of, "When the user smiles, ask them verbally, 'Do you remember the first place you traveled to?'"
[0633] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0634] Step 1:
[0635] The server extracts age-specific information based on the user's history. The input is the user's history data, which is used to search for age-specific information based on past experiences and interests, and select relevant information from the database. The output is age-specific information data.
[0636] Step 2:
[0637] The server sends the extracted age-based information to the terminal. The input is the age-based information data generated in step 1, which is sent to the terminal via the network protocol. The output is the age-based information data received by the terminal.
[0638] Step 3:
[0639] The terminal presents age-based information received to the user via a display device. The input is age-based information data transmitted from the server, which is converted into image or video formats for display on the screen. Furthermore, a speech synthesis API is used to convert the visual information into speech. The output is audiovisual information for the user to view.
[0640] Step 4:
[0641] The device uses a camera and microphone to collect the user's facial expressions and voice tone in real time. The input is the user's audiovisual reactions, which are acquired as camera video and microphone audio data. The output is emotion data that is processed in real time.
[0642] Step 5:
[0643] The emotion engine determines the user's emotional state based on collected facial expressions and voice tone. The input is emotional data supplied from the device, which is analyzed using a machine learning algorithm. The output is the user's emotional state as determined by the emotion engine.
[0644] Step 6:
[0645] The response processing unit analyzes the user's response and generates the next question, taking into account the emotional state from the emotion engine. The input consists of response data and emotional state information provided by the user, either as speech or text, which are analyzed using natural language processing and an AI model. The output is the data for the next question presented.
[0646] Step 7:
[0647] The server sends the generated question data to the terminal and adjusts the interaction. The input is the question data generated in step 6, which is sent via network communication to the terminal. The output is a new question presented to the user.
[0648] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing unit 12. In the data processing unit 12, the specific processing unit 290 acquires the audio data.
[0649] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). One example of data generation model 58 is ChatGPT (Internet search<URL: https: / / openai.com / blog / chatgpt> ), Gemini (Internet search) <url: https: gemini.google.com ?hl="ja">Examples of generative AI include the following. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.
[0650] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.
[0651] [Fourth Embodiment]
[0652] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.
[0653] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.
[0654] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0655] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.
[0656] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0657] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0658] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0659] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
[0660] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0661] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0662] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the identification processing unit 290.
[0663] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0664] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0665] This invention is an information processing system that enables effective communication with dementia patients. The specific operation of each component is described below.
[0666] First, the server is responsible for managing user history information and extracting age-specific information based on this data. The server refers to databases containing user registration information, past activity history, and preferences to identify information, images, and videos related to the user's time period.
[0667] The terminal functions as an interface for providing users with information received from the server. The terminal visually displays chronological information provided by the server and can provide related questions and explanations verbally using speech synthesis technology. This helps users recall past events by viewing and listening to the information.
[0668] The user responds to the presented information and questions using voice or text. The terminal processes these responses using speech recognition and text analysis technologies, and analyzes the content of the responses. Based on the analyzed responses, the server generates the next conversation sequence. For example, if the user responds, "This photo brings back memories. It reminds me of a trip I took when I was younger," the server creates an additional question such as, "Could you tell me more about that trip?" and presents it to the user through the terminal.
[0669] As a concrete example, consider a scenario where a user begins to talk about childhood memories. The server provides the terminal with information related to schools and games in the 1950s, and the terminal presents this information to the user and asks, "What was school life like back then?" Upon receiving the user's response, the terminal sends that information back to the server and generates further information and questions based on the user's response. In this way, the present invention realizes two-way communication in which users can actively participate.
[0670] This embodiment allows dementia patients to safely and comfortably reflect on their past, reducing the burden on caregivers while stabilizing the patient's emotions.
[0671] The following describes the processing flow.
[0672] Step 1:
[0673] The server searches the database based on the user's profile information and extracts relevant age-specific information. This includes the user's date of birth, work history, and favorite music.
[0674] Step 2:
[0675] The server packages the extracted age-based information and sends it to the terminal. The transmitted information includes images, videos, audio files, and related text data.
[0676] Step 3:
[0677] The terminal displays information received from the server on the user's screen. The display method is adjusted so that visual content is presented in an appropriate format.
[0678] Step 4:
[0679] The device uses speech synthesis technology based on received data to present the user with relevant questions in voice. For example, a question like, "Do you remember anything about this era?"
[0680] Step 5:
[0681] The user responds to the device via voice or text. The user can also speak about their memories and impressions of the information presented.
[0682] Step 6:
[0683] The device converts the user's voice responses into text using recognition technology and analyzes the content. Important keywords and themes are then extracted from the analysis results.
[0684] Step 7:
[0685] The server generates the next information and questions based on the analysis results sent from the terminal. The server then appropriately adjusts the received data based on the user's interests and responses.
[0686] Step 8:
[0687] The terminal presents new information and questions from the server, continuing the conversation. The terminal selects topics likely to interest the user, deepening the conversation and further retrieving the user's memory.
[0688] In this way, the system enables meaningful communication for dementia patients through interaction with the user.
[0689] (Example 1)
[0690] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0691] Communicating with dementia patients in modern society presents numerous challenges. In particular, there is a need for effective methods to elicit past memories and stabilize patients' emotions. Furthermore, it is necessary to provide patients with appropriate information while reducing the burden on caregivers.
[0692] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0693] In this invention, the server includes means for extracting age-specific information based on the user's history information, means for transmitting age-specific information to a communication device, and means for analyzing the user's response and generating the next conversation sequence using a generative AI model. This enables dementia patients to reflect on their past while engaging in interactive communication tailored to their needs.
[0694] An "information processing device" is a computer that extracts age-related information relevant to a user based on historical data and performs necessary processing.
[0695] "Information by era" refers to information such as events, photographs, and videos related to a specific period, and is data used to evoke memories in the user.
[0696] A "communication device" is a device that plays the role of sending and receiving information between an information processing device and a display device.
[0697] "Display means" refers to a device that visually presents received information to the user, and includes displays and the like.
[0698] A "sound output means" is a device that provides information in sound using speech synthesis technology.
[0699] A "response processing device" is a device that analyzes the user's voice or text responses and generates the next question based on that content.
[0700] "Speech recognition means" refers to a technology or device for converting speech input into text data.
[0701] "Text analysis technology" refers to the technology used to analyze text data and understand its meaning.
[0702] A "generative AI model" is an algorithm or technology used to generate the next conversation sequence or question based on the response content.
[0703] A "conversation sequence" is a series of questions and answers designed to facilitate smooth interaction with the user.
[0704] This invention is an information processing system that enables effective communication with dementia patients, and is realized through the server, terminal, and user each fulfilling their respective roles.
[0705] First, the server manages the user's history information. Specifically, the server uses a database management system to access a database containing the user's registration information, past activity history, and preferences. This allows the server to extract age-specific information related to the user, for example, identifying photos and videos from a specific period. This information is then transmitted to the terminal via a communication device for further processing.
[0706] The terminal presents the user with age-based information received from the server. The terminal is equipped with visual and audio output devices, which are used to provide information visually and audibly. The visual device uses a display to show images and videos in high resolution. The audio output device utilizes speech synthesis technology to provide relevant questions and explanations in natural-sounding voice. This allows the user to receive information visually and audibly, helping them recall past events.
[0707] The user can respond to information presented by the device using voice or text. The device analyzes this response using speech recognition and text analysis technology and sends the results to the server. The server uses a generative AI model to generate the next conversation sequence based on the user's response. For example, if the user says, "This photo brings back memories," the server generates a question such as, "Could you tell me more about your memories of that time?" and presents it through the device.
[0708] A concrete example would be a scenario where a user seeks information related to their childhood memories. The server provides information about school life in the 1950s, and the terminal can then use that information to ask, "What was school life like back then?" An example of a prompt to input into the generating AI model would be, "Generate questions related to school life in the 1950s. Please consider content that will elicit the user's memories and specific experiences."
[0709] In this way, servers, terminals, and users work together to enable two-way communication with dementia patients, thereby reducing the burden on caregivers.
[0710] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0711] Step 1:
[0712] The server retrieves user history information from the database. User IDs and registration information are used as input, and the output is historical data related to the user. Specifically, the server executes SQL queries to extract the necessary data and organizes it by year.
[0713] Step 2:
[0714] The server identifies chronological information based on the acquired historical data. The historical data obtained in step 1 is used as input, and chronological information such as photos and videos is generated as output. The server applies an information extraction algorithm to select the most relevant information.
[0715] Step 3:
[0716] The server sends the identified age-specific information to the terminal. The input is the age-specific information generated in step 2, and the output is the information converted into a format that can be received by the terminal. The server transmits the information using a communication protocol.
[0717] Step 4:
[0718] The terminal presents age-based information received from the server to the user via a display device and an audio output device. The input is the information transmitted in step 3, and the output is the visual and auditory information presented to the user. The terminal displays the information on a high-resolution display and provides questions and explanations in voice using a speech synthesis engine.
[0719] Step 5:
[0720] The user responds to the presented information using voice or text. The user's voice or text data is sent to the device as input. Specifically, the user provides feedback through voice commands or messages.
[0721] Step 6:
[0722] The device analyzes the user's response using speech recognition and text analysis technologies. The input is the user's response in step 5, and the output is the analyzed response. The device converts the speech to text using speech recognition software and analyzes its meaning using a natural language processing algorithm.
[0723] Step 7:
[0724] The server uses a generative AI model based on the analyzed response to generate the next conversation sequence. The input is the analyzed data obtained in step 6, and the output is the conversation or questions presented next. The server passes a prompt to the generative AI model, which generates a new question.
[0725] Step 8:
[0726] The server sends the generated question to the terminal, which then presents it to the user. The input is the question generated in step 7, and the output is the new information shown to the user. The terminal then presents the information to the user again through the display device and sound output device.
[0727] (Application Example 1)
[0728] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0729] In communicating with dementia patients, it is necessary to support them in reflecting on the past, stabilize their emotions through conversation, and provide effective means to promote memory recall. Furthermore, a challenge is to adjust the dialogue based on the patient's interests and understanding to achieve communication that is appropriate for them.
[0730] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0731] In this invention, the server includes an information processing device that extracts age-specific information based on the user's history information, a means for transmitting the age-specific information to a display means, a display means for presenting the received age-specific information to the user, a response processing device that analyzes the user's response and generates the next question based on the response content, a means for conducting a dialogue based on the presented information using speech synthesis technology, a means for analyzing the user's voice input using speech recognition technology, and a means for creating prompt sentences using a generation AI model. This makes it possible to realize a system that allows dementia patients to reflect on their past, achieve emotional stability, and enable communication tailored to each individual patient.
[0732] An "information processing device" is a device that extracts age-based information from a user's history and transmits that information to other devices.
[0733] "Information by era" refers to a collection of information related to a specific period, extracted based on the user's history.
[0734] A "display means" is a device for presenting age-based information received from an information processing device to the user.
[0735] A "response processing device" is a device that analyzes responses from users and generates the next question based on the content of those responses.
[0736] "Speech synthesis technology" is a technology that enables conversations based on presented information to be conducted using natural-sounding speech.
[0737] "Speech recognition technology" is a technology that converts voice input from users into text data and analyzes its content.
[0738] A "generative AI model" is a form of artificial intelligence used to create prompt statements in response processing.
[0739] A "prompt sentence" is a sentence created using a generative AI model to guide the user into the next conversation.
[0740] At the heart of the system implementing this invention is an information processing device. This device is responsible for extracting age-specific information based on the user's history and transmitting the relevant information to a terminal. The terminal is equipped with a display device and a speech synthesis device, which presents the age-specific information to the user visually and audibly.
[0741] The server is a computer that manages user responses and analyzes their content. This computer uses speech recognition technology to convert the user's voice input into text and then analyzes that text. It then uses a generative AI model to create prompt sentences and determine the next information or question to present.
[0742] For example, if a user says, "I miss school life in the 1950s," the server sends school-related photos from that era to the user's device and uses speech synthesis technology to generate and present a voice message saying, "Could you tell me a little more about school life back then?"
[0743] The hardware used includes server computers and smartphones or tablets capable of display and speech synthesis. The software utilizes Google Speech Recognition for speech recognition, Google Text-to-Speech for speech synthesis, and SQLite for data management. Software for the generative AI model also runs on the server.
[0744] As a concrete example, here is an example of a prompt input to a generative AI model: "The user feels nostalgic for a park from the 1950s. Please generate the following related question." This prompt allows the generative AI model to generate subsequent questions and information for the user, adjusting the conversation to flow smoothly.
[0745] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0746] Step 1:
[0747] The server retrieves user history information from the database. The user's ID or profile information is used as input. Relevant age-specific information is extracted through database queries, and a list of the extracted information is generated as output.
[0748] Step 2:
[0749] The server sends the extracted age-based information to the terminal. The input here is the age-based information obtained in step 1, and the output is the transfer of the data to the terminal. Once the terminal receives the information, it prepares it for visual display.
[0750] Step 3:
[0751] The device presents age-specific information to the user visually and audibly. The input is age-specific information from a server. The device displays the information on the screen and uses Google Text-to-Speech to provide related questions audibly. The output is the user visually confirming the information and listening to the questions.
[0752] Step 4:
[0753] The user sees or hears the presented information and responds in voice or text. The input is a response based on the user's own memory and experience. The output is a response provided to the device in the form of voice or text.
[0754] Step 5:
[0755] The device converts user responses into text using speech recognition technology. The input is the user's voice, and the data is converted to text using Google Speech Recognition. The output is parseable text.
[0756] Step 6:
[0757] The server analyzes the transcribed response and constructs a prompt using a generative AI model. The input here is the text obtained from step 5. Data analysis determines the user's interests and topics, and then creates a prompt. The output is a prompt containing the next question or information to be presented.
[0758] Step 7:
[0759] The server sends the next question to the terminal based on the generated prompt. The input is the prompt constructed in step 6. The terminal presents this information to the user again and continues the conversation. The output is the provision of new dialogue content.
[0760] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the identification processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform identification processing using the user's emotions.
[0761] This invention is an information processing system that facilitates communication with dementia patients and enables interaction that takes into account the user's emotions. This system consists of an information processing device, a display device, a response processing device, and an emotion engine. The specific functions and operations of each component are described below.
[0762] First, the server uses the user's history information to extract age-based data. This data includes images and videos based on the user's past experiences and interests, as well as related historical data. The server then sends the extracted information to the terminal.
[0763] The device presents the user with age-specific information received from the server. Specifically, it displays images and videos while simultaneously reading the relevant information aloud using speech synthesis technology. At this stage, the device works in conjunction with an emotion engine to collect the user's facial expressions and voice tone through the camera and microphone.
[0764] The user responds to the information presented by the device. This response is sent to the device as text or voice input. The device receives this and performs analysis in the response processing unit. In the analysis process, emphasis is placed not only on the user's response but also on the collected sentiment data.
[0765] The emotion engine determines the user's emotional state based on user input and collected emotional data. For example, if the user makes a nostalgic expression, the emotion engine will determine this to be a positive emotion. The server receives feedback from the emotion engine and adjusts the next questions and information presentation accordingly.
[0766] As a concrete example, consider a scenario where the user feels nostalgic about school life in the 1950s, as presented. If the user shows a satisfied expression, the emotion engine recognizes this as a positive response. Based on this, the server assumes the user wants to continue the conversation and generates a question such as, "Please tell me about your memories with friends from that time." The terminal then presents this question to the user, allowing for a smooth continuation of the dialogue.
[0767] In this way, the present invention realizes interaction that reflects the user's emotions in real time, providing more personalized care. Through this system, caregivers can maintain the mental health of dementia patients and reduce the burden of daily care.
[0768] The following describes the processing flow.
[0769] Step 1:
[0770] The server extracts relevant, age-specific information from the database based on the user's history. This information includes images, videos, and audio files that are likely to be of interest to the user.
[0771] Step 2:
[0772] The server packages the extracted age-based information and related supplementary data and sends it to the terminal. The terminal then prepares to effectively present the received information to the user.
[0773] Step 3:
[0774] The terminal displays information received from the server to the user and simultaneously provides information via audio. It is optimized so that the user can receive this information both visually and aurally.
[0775] Step 4:
[0776] The device activates an emotion engine and acquires data from the camera and microphone to monitor the user's facial expressions and tone of voice. This data is used to recognize the user's emotional state in real time.
[0777] Step 5:
[0778] The user reacts to the presented content and provides verbal responses. The user's responses are sent to the device via voice input.
[0779] Step 6:
[0780] The terminal converts the user's voice into text, which is then analyzed by a response processing unit. Simultaneously, the emotion engine evaluates the user's emotions and determines whether they are positive, negative, or neutral.
[0781] Step 7:
[0782] The server receives the analysis results and sentiment evaluation, and generates information and questions for the next step. The generated content is optimized for the user's interests and emotional state.
[0783] Step 8:
[0784] The terminal presents the user with new information and questions obtained from the server, continuing the conversation. The terminal strives to help the user relax and continue talking by providing information at the appropriate time and in the right tone.
[0785] Through the above process, the system makes interactions with users more natural and emotionally sensitive, providing an effective and comfortable communication environment.
[0786] (Example 2)
[0787] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0788] A challenge in interacting with dementia patients is the lack of interactive information provision that takes into account the user's emotions. Conventional systems have difficulty adjusting conversations to reflect the user's emotional state in real time, resulting in limited provision of individualized care.
[0789] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0790] In this invention, the server includes means for an information processing device to extract age-specific data based on the user's history data, means for a terminal to collect the user's facial expressions and voice tone through a camera and microphone, and means for an emotion engine to determine the user's emotional state and adjust the conversation accordingly. This enables smooth dialogue that takes the user's emotional state into consideration in real time.
[0791] An "information processing device" is a device that analyzes data and extracts and generates necessary information based on a specific purpose.
[0792] "Age-based data" refers to data that includes images, videos, and historical information related to a specific period in the past, based on the user's history and interests.
[0793] A "display device" is a device that presents received information to a user visually or audibly.
[0794] "Speech synthesis technology" is a technology that converts text data into speech and reads it aloud.
[0795] A "terminal" is a device that collects user input and emotional states and presents that information.
[0796] A "camera" is an optical device that captures images or videos and transmits them to a display or other device.
[0797] A "microphone" is a device that receives sound and converts it into an electrical signal.
[0798] An "emotion engine" refers to technology that analyzes a user's facial expressions and voice tone to determine their emotional state.
[0799] A "response processing device" is a device used to analyze user responses and generate the content of the next dialogue.
[0800] A "generative AI model" is an artificial intelligence technology that generates new text or questions based on input prompts.
[0801] This invention is an information processing system that facilitates communication with dementia patients and enables interaction that takes into account the user's emotions. The system mainly consists of a server, a terminal, a display device, a response processing device, and an emotion engine.
[0802] The server uses a database to analyze user history data and extract data by age group. This information includes images, videos, and historical data related to the user's past experiences and interests. The server uses a database management system.
[0803] When the terminal receives age-specific data from the server, it visually presents images and videos using a display device and reads out related information using speech synthesis technology. A general-purpose speech synthesis engine is used for this process. The terminal also records the user's facial expressions and voice tone using a camera and microphone, and an emotion engine analyzes this data.
[0804] The user can respond to the presented information, and this response is sent to the terminal as text or voice input. The terminal immediately sends this input data to the response processing unit.
[0805] The emotion engine analyzes the user's facial expressions and tone of voice to evaluate their emotional state. If the user smiles or displays a satisfied expression, it is recognized as a positive emotion. Based on this evaluation, the server uses a generative AI model to generate the next dialogue, taking the user's emotional state into consideration.
[0806] As a concrete example, consider a case where a user makes a nostalgic expression when presented with data about school life in the 1950s. The emotion engine analyzes this as a positive reaction and provides the result to the server. The server uses a generative AI model and prompt text to generate the next question, such as "Please tell us about your memories with friends from that time," and presents it to the user through the terminal.
[0807] An example of a prompt message is: "Generate a new question based on the user's interests."
[0808] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0809] Step 1:
[0810] The server retrieves user history data from the database. The input is the user's ID, and the output is extracted historical data related to past experiences and interests. Database management software is used to process the historical data and sort it into age-specific data. Specifically, SQL queries are used to extract data categorized by age.
[0811] Step 2:
[0812] The server sends the extracted age-based data to the terminal. The input is the extracted age-based data, and the output is the transmission of data to the terminal. Network communication protocols are used to accurately transmit the data to the terminal.
[0813] Step 3:
[0814] The terminal analyzes the received age-specific data, displays images and videos on a display device, and reads the information aloud using speech synthesis technology. The input is age-specific data received from the server, and the output is a visual and auditory presentation to the user. A speech synthesis engine is used to convert text into speech, which is then presented using the display and speaker.
[0815] Step 4:
[0816] The user reacts to the information presented. Input is the information presented by the device, and output is the information sent to the device as the user's facial expressions, voice responses, or text input.
[0817] Step 5:
[0818] The device collects user responses through its camera and microphone. Input is the user's facial expressions and voice, and output is the transmission of this data to a response processing unit. Image analysis and speech recognition technologies are used to structure the collected data and send it to the emotion engine.
[0819] Step 6:
[0820] The emotion engine analyzes the user's facial expressions and voice tone to determine their emotional state. The input is the user's emotional data, and the output is an evaluation of that emotional state. An analysis algorithm is used to classify emotions as positive, negative, or neutral.
[0821] Step 7:
[0822] The server uses a generative AI model to generate the next dialogue based on feedback from the emotion engine. The input consists of the emotion engine's evaluation results, the current conversation context, and prompts for the generative AI model; the output is the next question or information to be presented. The generative AI model uses natural language generation techniques to create the new dialogue.
[0823] Step 8:
[0824] The terminal presents the user with new dialogue content from the server. The input is the next question or information generated by the server, and the output is the visual or auditory presentation of this information to the user. This iterative process ensures a smooth and continuous dialogue with the user.
[0825] (Application Example 2)
[0826] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0827] In interacting with dementia patients, the challenge lies in achieving smooth interactions that reflect the user's emotions in real time, and providing individualized care based on the user's past experiences and interests. Furthermore, there is a need to build a system that reduces the burden on caregivers and makes daily care more effective.
[0828] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0829] In this invention, the server includes an information processing device that extracts age-specific information based on the user's history information, a means for transmitting the age-specific information to a display device, and a means for the display device to present the received age-specific information to the user. This makes it possible to provide personalized care through interaction that takes into account the user's emotions, thereby reducing the burden on caregivers.
[0830] An "information processing device" is an electronic device that extracts age-based information based on the user's history and transmits it to a display device.
[0831] "Information by age group" refers to data organized by age group based on users' past experiences and interests.
[0832] A "display device" is a device that visually presents age-based information received from an information processing device to the user.
[0833] A "response processing device" is a device that analyzes responses from users, generates the next question, and adjusts the conversation accordingly.
[0834] An "emotion engine" is software or hardware that collects a user's facial expressions or voice tone and determines their emotional state.
[0835] "Emotional feedback" refers to information used to reflect the results of the emotional engine's assessment of the emotional state in the interaction.
[0836] "Means of adjusting interaction" refer to a system that appropriately modifies conversations and information presentations based on emotional feedback from an emotion engine.
[0837] A "smart device" is a portable electronic device used to improve visual or auditory interaction, and it has the ability to run applications.
[0838] This invention is a system for facilitating smooth communication with dementia patients. The system primarily consists of a server, a terminal, and user interaction. The server is responsible for extracting age-specific information based on the user's history and transmitting it to the terminal. Age-specific information is managed in the form of images, videos, and related historical data.
[0839] The terminal presents age-specific information received from the server to the user via a display device. Specifically, it displays images and videos on the screen while simultaneously using speech synthesis technology to read the information aloud and provide it to the user as audio information. The terminal also works in conjunction with an emotion engine to collect the user's facial expressions and voice tone through the camera and microphone. This information is used to understand the user's emotional state. Based on the collected data, the emotion engine analyzes emotions in real time and determines positive and negative responses.
[0840] The user responds to the presented information via voice, and this response is analyzed on the device. Based on the analysis results and emotional feedback from the emotion engine, the server adjusts the next questions and information presentation. For example, if the user smiles, the server can generate questions that evoke positive memories, such as, "Do you remember the first place you traveled to?"
[0841] In implementing the system of this invention, hardware such as smart glasses (e.g., a general portable visual device) or a tablet (e.g., a general personal information terminal) can be used. Software such as a face capture library (e.g., OpenCV), a speech recognition API (e.g., Google Cloud Speech-to-Text), and an emotion analysis engine (e.g., Affectiva) can be applied. With such a system configuration, it becomes possible to maintain the mental health of dementia patients through individualized care while reducing the burden on caregivers.
[0842] As an example of a prompt message, it can be used in server design in the form of, "When the user smiles, ask them verbally, 'Do you remember the first place you traveled to?'"
[0843] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0844] Step 1:
[0845] The server extracts age-specific information based on the user's history. The input is the user's history data, which is used to search for age-specific information based on past experiences and interests, and select relevant information from the database. The output is age-specific information data.
[0846] Step 2:
[0847] The server sends the extracted age-based information to the terminal. The input is the age-based information data generated in step 1, which is sent to the terminal via the network protocol. The output is the age-based information data received by the terminal.
[0848] Step 3:
[0849] The terminal presents age-based information received to the user via a display device. The input is age-based information data transmitted from the server, which is converted into image or video formats for display on the screen. Furthermore, a speech synthesis API is used to convert the visual information into speech. The output is audiovisual information for the user to view.
[0850] Step 4:
[0851] The device uses a camera and microphone to collect the user's facial expressions and voice tone in real time. The input is the user's audiovisual reactions, which are acquired as camera video and microphone audio data. The output is emotion data that is processed in real time.
[0852] Step 5:
[0853] The emotion engine determines the user's emotional state based on collected facial expressions and voice tone. The input is emotional data supplied from the device, which is analyzed using a machine learning algorithm. The output is the user's emotional state as determined by the emotion engine.
[0854] Step 6:
[0855] The response processing unit analyzes the user's response and generates the next question, taking into account the emotional state from the emotion engine. The input consists of response data and emotional state information provided by the user, either as speech or text, which are analyzed using natural language processing and an AI model. The output is the data for the next question presented.
[0856] Step 7:
[0857] The server sends the generated question data to the terminal and adjusts the interaction. The input is the question data generated in step 6, which is sent via network communication to the terminal. The output is a new question presented to the user.
[0858] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing unit 12. In the data processing unit 12, the specific processing unit 290 acquires the audio data.
[0859] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). One example of data generation model 58 is ChatGPT (Internet search<URL: https: / / openai.com / blog / chatgpt> ), Gemini (Internet search) <url: https: gemini.google.com ?hl="ja">Examples of generative AI include the following. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.
[0860] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.
[0861] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping. Specifically, the emotion identification model 59 may determine the user's emotion according to a specific mapping, which is an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the identification processing unit 290 may perform identification processing using the robot's emotion.
[0862] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
[0863] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.
[0864] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further you go from the outside of the Emotion Map 400, the more visible (expressed in actions) your emotions become.
[0865] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https: / / ci.nii.ac.jp / naid / 500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
[0866] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."
[0867] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
[0868] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.
[0869] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.
[0870] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-temporary storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-temporary storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.
[0871] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.
[0872] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.
[0873] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.
[0874] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.
[0875] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.
[0876] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.
[0877] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.
[0878] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.
[0879] The following is further disclosed regarding the embodiments described above.
[0880] (Claim 1)
[0881] The information processing device has a means for extracting age-specific information based on the user's history information,
[0882] The aforementioned information processing device includes means for transmitting age-specific information to a display device,
[0883] The display device provides a means for presenting the received age-based information to the user,
[0884] The response processing device includes means for analyzing the response from the user and generating the next question based on the content of the response,
[0885] In the aforementioned question generation, a means of adjusting the conversation while considering the user's interests and understanding,
[0886] A system that includes this.
[0887] (Claim 2)
[0888] The system according to claim 1, wherein the information processing device is equipped with means for accumulating the results of response analysis and learning the trends of each user.
[0889] (Claim 3)
[0890] The system according to claim 1, wherein the response processing device is equipped with means for responding to both voice input and text input from a user.
[0891] "Example 1"
[0892] (Claim 1)
[0893] The information processing device has a means for extracting age-specific information based on the user's history information,
[0894] The aforementioned information processing device includes means for transmitting chronological information to a communication device,
[0895] The display means is a means of presenting the received age-based information to the user,
[0896] The sound output means is a means of providing information in voice using speech synthesis technology,
[0897] The response processing device includes means for analyzing the response from the user and generating the next question based on the content of the response,
[0898] In the aforementioned question generation, a means of adjusting the conversation while considering the user's interests and understanding,
[0899] A means for analyzing user responses using speech recognition and text analysis technology,
[0900] A means for generating the next conversation sequence using a generative AI model,
[0901] A system that includes this.
[0902] (Claim 2)
[0903] The system according to claim 1, wherein the information processing device is equipped with means for accumulating the results of response analysis and learning the trends of each user.
[0904] (Claim 3)
[0905] The system according to claim 1, wherein the response processing device is equipped with means for responding to both voice input and text input from a user.
[0906] "Application Example 1"
[0907] (Claim 1)
[0908] The information processing device has a means for extracting age-specific information based on the user's history information,
[0909] The aforementioned information processing device includes means for transmitting age-specific information to a display means,
[0910] The display means is a means of presenting the received age-based information to the user,
[0911] The response processing device includes means for analyzing the response from the user and generating the next question based on the content of the response,
[0912] In the aforementioned question generation, a means of adjusting the conversation while considering the user's interests and understanding,
[0913] A means of conducting a dialogue based on the information presented using speech synthesis technology,
[0914] A means of analyzing voice input from users using speech recognition technology,
[0915] A system that includes this.
[0916] (Claim 2)
[0917] The system according to claim 1, wherein the information processing device is equipped with means for accumulating the results of response analysis and learning the trends of each user.
[0918] (Claim 3)
[0919] The system according to claim 1, wherein the response processing device includes means for creating a prompt sentence using a generation AI model.
[0920] "Example 2 of combining an emotion engine"
[0921] (Claim 1)
[0922] The information processing device provides a means for extracting age-specific data based on the user's history data,
[0923] The aforementioned information processing device includes means for transmitting age-specific data to a display device,
[0924] The display device presents the received age-based data to the user and reads the information aloud using speech synthesis technology.
[0925] The device has means of collecting the user's facial expressions and voice tone through the camera and microphone,
[0926] The response processing device includes means for analyzing the user's response and generating the next question based on the response content and collected sentiment data,
[0927] In the aforementioned question generation, the emotion engine determines the user's emotional state and adjusts the conversation accordingly.
[0928] A system that includes this.
[0929] (Claim 2)
[0930] The system according to claim 1, wherein the information processing device is equipped with means for accumulating response analysis results and sentiment data and learning the tendencies of each user.
[0931] (Claim 3)
[0932] The system according to claim 1, wherein the response processing device is equipped with means for responding to both voice input and text input from a user.
[0933] "Application example 2 when combining with an emotional engine"
[0934] (Claim 1)
[0935] The information processing device has a means for extracting age-specific information based on the user's history information,
[0936] The aforementioned information processing device includes means for transmitting age-specific information to a display device,
[0937] The display device provides a means for presenting the received age-based information to the user,
[0938] The response processing device includes means for analyzing the response from the user and generating the next question based on the content of the response,
[0939] In the aforementioned question generation, a means of adjusting the conversation while considering the user's interests and understanding,
[0940] The emotion engine collects the user's facial expressions or voice tone and has means to determine their emotional state.
[0941] Means for adjusting the interaction based on emotional feedback from the aforementioned emotion engine,
[0942] Means of using smart devices to improve visual or auditory interaction,
[0943] A system that includes this.
[0944] (Claim 2)
[0945] The system according to claim 1, wherein the information processing device is equipped with means for accumulating the results of response analysis and learning the trends of each user.
[0946] (Claim 3)
[0947] The system according to claim 1, wherein the response processing device is equipped with means for responding to both voice input and text input from a user. [Explanation of Symbols]
[0948] 10, 210, 310, 410 Data Processing Systems 12 Data Processing Devices 14 Smart Devices 214 Smart Glasses 314 Headset-type terminal 414 Robots< / url:> < / url:> < / url:> < / url:>
Claims
1. The information processing device has a means for extracting age-specific information based on the user's history information, The aforementioned information processing device includes means for transmitting age-specific information to a display means, The display means is a means of presenting the received age-based information to the user, The response processing device includes means for analyzing the response from the user and generating the next question based on the content of the response, In the aforementioned question generation, a means of adjusting the conversation while considering the user's interests and understanding, A means of conducting a dialogue based on the information presented using speech synthesis technology, A means of analyzing voice input from users using speech recognition technology, A system that includes this.
2. The system according to claim 1, wherein the information processing device is equipped with means for accumulating the results of response analysis and learning the trends of each user.
3. The system according to claim 1, wherein the response processing device includes means for creating a prompt sentence using a generation AI model.