system

The system addresses the challenge of providing personalized advice by collecting and analyzing user data with a generative AI model, offering tailored support through natural language processing, enhancing the quality of life for individuals with diverse lifestyles.

JP2026100630APending Publication Date: 2026-06-19SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-12-09
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Conventional support systems fail to provide personalized advice tailored to individual family compositions and diverse lifestyles, leading to inadequate support in areas like health, economy, and education.

Method used

A system that collects personal information using voice and sensors, analyzes it with a generative AI model, and provides tailored advice in a user-friendly format using natural language processing technology.

Benefits of technology

Enables flexible and advanced support tailored to diverse family structures and lifestyles, improving the quality of life by providing personalized advice in real time.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure 2026100630000001_ABST
    Figure 2026100630000001_ABST
Patent Text Reader

Abstract

We provide the system. [Solution] Means of collecting personal information, A means of performing analysis based on the personal information and generating advice based on individual needs, Means for providing such advice to the user, A system that includes this.
Need to check novelty before this filing date? Find Prior Art

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of the chatbot's character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance as a response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] In modern society, there is a problem that it is difficult to provide appropriate support according to the values and living environments of individual families and individuals. In particular, for people with different family compositions and diverse lifestyles, sufficient countermeasures have not been taken by conventional fixed support means. As a result, necessary support regarding an individual's health, economy, and education is not provided, and many users feel inconvenienced.

Means for Solving the Problems

[0005] This invention provides a system that collects personal information, performs analysis based on that information, and generates and provides advice tailored to the individual needs of users. This system acquires everyday personal information using voice and sensors, and analyzes the data using a generation AI model. The resulting advice is presented in a user-friendly format using natural language processing technology, providing personalized support. This enables flexible and advanced support tailored to diverse family structures and lifestyles.

[0006] "Personal information" refers to data that includes an individual's specific living situation, behavior, preferences, etc., and is information collected in connection with that individual.

[0007] "Analysis" is the process of analyzing collected data to identify individual behavioral patterns and needs.

[0008] "Advice" refers to specific suggestions and guidance tailored to individual needs, derived from the analysis results.

[0009] "Providing" means presenting the generated advice to the user and making it available for use.

[0010] "Voice and sensor" refers to input / output devices for collecting data from users, and is a group of devices that include functions for voice recognition and detection of environmental changes.

[0011] A "generative AI model" is an algorithm or software that uses machine learning techniques to analyze personal data and automatically generate optimal advice.

[0012] "Natural language processing technology" refers to language processing techniques used to convey generated advice to the user in a natural way. [Brief explanation of the drawing]

[0013] [Figure 1]This is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment. [Figure 2] This is a conceptual diagram showing an example of the essential functions of a data processing device and a smart device according to the first embodiment. [Figure 3] This is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment. [Figure 4] This is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment. [Figure 5] This is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment. [Figure 6] This is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment. [Figure 7] This is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment. [Figure 8] This is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment. [Figure 9] This shows an emotion map where multiple emotions are mapped. [Figure 10] This shows an emotion map where multiple emotions are mapped. [Figure 11] This is a sequence diagram showing the processing flow of the data processing system in Example 1. [Figure 12] This is a sequence diagram showing the processing flow of the data processing system in Application Example 1. [Figure 13] This is a sequence diagram showing the processing flow of the data processing system in Example 2, when an emotion engine is combined. [Figure 14] This is a sequence diagram showing the processing flow of the data processing system in Application Example 2, which combines an emotion engine. [Modes for carrying out the invention]

[0014] Hereinafter, an example of an embodiment of the system relating to the technology of this disclosure will be described with reference to the attached drawings.

[0015] First, the terms used in the following description will be explained.

[0016] In the following embodiments, the numbered processor (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.

[0017] In the following embodiments, the numbered RAM (Random Access Memory) is a memory in which information is temporarily stored and is used as a work memory by the processor.

[0018] In the following embodiments, the numbered storage is one or more non-volatile storage devices that store various programs and various parameters, etc. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), or magnetic tapes, and the like.

[0019] In the following embodiments, the numbered communication I / F (Interface) is an interface including a communication processor and an antenna, etc. The communication I / F controls communication between multiple computers. Examples of communication standards applied to the communication I / F include wireless communication standards including 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), or Bluetooth (registered trademark), and the like.

[0020] In the following embodiments, "A and / or B" is synonymous with "at least one of A and B." That is, "A and / or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and / or B" applies when expressing three or more things linked by "and / or."

[0021] [First Embodiment]

[0022] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0023] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0024] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0025] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0026] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0027] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0028] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0029] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0030] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0031] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the identification processing unit 290.

[0032] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0033] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0034] This invention is a system that provides appropriate advice according to an individual's lifestyle and needs. This system consists of three main elements: a terminal, a server, and the user.

[0035] First, the device functions as an interface with the user and is installed in the user's living environment. The device incorporates voice recognition sensors and other sensors to collect the user's daily actions and voice commands. This allows the device to acquire basic data in real time to understand the user's lifestyle patterns and needs.

[0036] Next, data is sent from the terminal to the server. The server converts the received data into a format suitable for analysis and applies a generating AI model. Based on the collected data, this AI model analyzes the characteristics of the user's behavior patterns and generates advice that the user is likely to need. For example, if the user is seeking health improvement, the AI ​​model analyzes past eating habits and exercise history and suggests healthy meal menus and exercise plans.

[0037] The generated advice is converted into a user-friendly format using natural language processing technology. The server then sends the converted advice to the terminal.

[0038] The device provides the user with the advice it receives. The user receives advice from the device via voice and display, and can ask additional questions as needed. This user-friendly interface allows users to receive support tailored to their needs at any time.

[0039] For example, if a user says, "I want to save on my electricity bill this month," the device sends this request to the server, which analyzes the historical data and generates advice on specific energy-saving methods. This kind of dynamic support can improve the user's quality of life.

[0040] The following describes the processing flow.

[0041] Step 1:

[0042] The device collects the user's daily voice commands and actions. It utilizes a voice recognition sensor to convert the user's words into text data. It also acquires environmental data such as temperature, brightness, and movement through built-in sensors and temporarily stores it in local storage.

[0043] Step 2:

[0044] The terminal encrypts the various data it collects to ensure security and then sends it to the server. Transmission takes place via a secure network protocol. This process also requires confirmation of successful data transmission.

[0045] Step 3:

[0046] The server formats the received data for analysis and temporarily stores it in the database. Data cleaning is performed to correct incomplete data and outliers, preparing the data for improved analysis accuracy.

[0047] Step 4:

[0048] The server applies a generative AI model and performs analysis. Based on historical data, it analyzes user behavior patterns and identifies individual needs. From these results, it extracts advice that should be provided to the user.

[0049] Step 5:

[0050] The server uses natural language processing techniques to convert the generated advice into a format that is easy for the user to understand. This makes the advice easier for the user to accept.

[0051] Step 6:

[0052] The server sends the converted advice to the terminal. After sending, it waits for confirmation of receipt from the terminal and immediately resends if necessary.

[0053] Step 7:

[0054] The device provides the user with the received advice via voice output or display. The user can then ask further questions to continue the interaction and prepare to send any necessary feedback information back to the server.

[0055] (Example 1)

[0056] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0057] There is a growing need to provide timely and accurate advice tailored to individual lifestyles and needs. However, conventional systems have faced challenges in accurately understanding user behavior patterns and providing personalized advice based on them.

[0058] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0059] In this invention, the server includes means for collecting data, means for performing analysis based on the data and using a generative model to generate suggestions based on individual needs, and means for converting the generated suggestions into a format understandable to the user using language processing technology and providing them. This makes it possible to provide highly accurate advice in real time that is tailored to the user's behavior patterns.

[0060] "Data" refers to information about user behavior and the environment, which is acquired through sensors and voice recognition.

[0061] A "generative model" refers to an algorithm or technology that generates appropriate suggestions and advice based on a user's past behavior and needs.

[0062] "Language processing technology" is the technology that converts generated suggestions and advice into a natural language format that users can easily understand.

[0063] "Suggestions" refer to advice and instructions generated in a way that addresses the individual needs of the user, and are created using a generative model.

[0064] "Means" refers to the methods or devices used to achieve a specific objective.

[0065] This system provides advice tailored to the user's lifestyle and primarily consists of a terminal and a server. The terminal is installed in the user's living environment and is equipped with a voice recognition sensor and an accelerometer. This terminal records the user's voice commands and daily activities in real time and collects this data. Based on this collected data, the server performs analysis and generates advice based on individual needs using an AI model.

[0066] The generated advice is converted into a user-friendly language format by the server using natural language processing technology. The converted advice is then delivered to the user via the terminal, and the user can receive it visually or audibly. Furthermore, the user can ask additional questions to the terminal.

[0067] For example, if a user speaks to their device saying, "I want to save on my electricity bill this month," the device sends this request to a server. The server analyzes past electricity usage data and uses a generated AI model to suggest specific ways to save energy. Through this process, users can receive useful advice that can lead to improvements in their daily lives.

[0068] Examples of prompt statements include specific questions such as, "What advice can you offer the user if they want to live a healthy lifestyle?" These types of prompts form the basis for the system to provide appropriate information tailored to the user's needs.

[0069] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0070] Step 1:

[0071] The device collects user voice commands and daily activities through sensors. Input consists of real-time user voice commands and environmental activity data. Based on this data, the device generates foundational data to identify user behavior patterns and needs. Specifically, it uses speech recognition technology to convert voice into text data and records environmental sensor data with timestamps.

[0072] Step 2:

[0073] The terminal sends the collected data to the server. The input consists of the voice-text data and sensor data generated in step 1. The server processes this data, formatting it into a specific format and converting it into a format suitable for analysis. Through this data processing, the server prepares the data for analysis and inference.

[0074] Step 3:

[0075] The server uses a generative AI model to analyze the formatted data. The input is the user data transformed in step 2. The generative AI model analyzes the user's past behavioral patterns and generates specific advice based on their needs. For example, if the user's dietary data is input, it will create healthy meal suggestions.

[0076] Step 4:

[0077] The server transforms the generated advice using natural language processing technology. The input is the advice generated in step 3. This transformation makes the advice easier for the user to understand. The server translates it into plain language and organizes the advice into a format that is intuitively meaningful to the user.

[0078] Step 5:

[0079] The server sends the converted advice to the terminal, which then provides it to the user. The input is the advice converted in step 4. As output, the terminal presents the final advice to the user via voice or display. The user receives this advice and can ask the terminal additional questions if necessary. For example, the user can use the voice command "be more specific" to obtain more detailed information.

[0080] (Application Example 1)

[0081] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0082] In recent years, urban life has become increasingly complex, and there is a growing need to provide information and support tailored to the individual lifestyles of each citizen quickly and appropriately. However, existing information provision systems have struggled to flexibly provide advice that meets individual needs. Therefore, this invention aims to solve the problem of providing personalized advice in real time that is tailored to each individual's lifestyle, in order to improve the quality of life for residents in smart cities.

[0083] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0084] In this invention, the server includes means for collecting personal information, means for performing analysis based on said personal information and generating advice based on individual needs, and means for providing said advice to the user using a visual or auditory interface. This makes it possible for the user to quickly receive appropriate advice tailored to their desired environment and circumstances.

[0085] "Personal information" refers to information that can identify a specific individual, including data such as name, address, and behavioral patterns.

[0086] "Analysis" is the process of deriving patterns and trends from collected information and making decisions according to a specific purpose.

[0087] "Advice" refers to suggestions or instructions provided in response to specific situations or needs, and is information that can be used to help the recipient take action.

[0088] A "visual interface" is a means of presenting information to a user visually through displays, projections, and other means.

[0089] An "auditory interface" is a means of conveying information to a user audibly through sound or audio.

[0090] A "detector" is a device that senses environmental data such as sound and physical movement and converts it into an electronic signal.

[0091] To realize this invention, a system consisting of three main elements—a terminal, a server, and a user—is used. The terminal is installed in the user's environment and collects personal information using sensors such as cameras and microphones. Head-mounted displays and smart glasses are often used as terminals. This allows the user's daily activities and voice commands to be obtained in real time, and this information is transmitted to the server as digital data.

[0092] The server processes the received personal information and performs analysis using generative AI models. These AI models utilize Amazon Web Services' AI technology and OpenAI® generative models. The analysis performed on the server is based on the user's past behavioral patterns and generates advice tailored to the user's specific needs. For example, health management advice and public transportation information are generated through this process.

[0093] The advice derived from the analysis is converted into a user-friendly format using natural language processing technology. It is then communicated to the user via a visual or auditory interface through the device. Google® Cloud Speech-to-Text API is used for voice control, and visual information is presented on the smart glasses' display.

[0094] As a concrete example, let's assume a user gives a command such as "Tell me what exercise I should do today" while they are out. In this case, the system will refer to the user's past exercise data and weather information to generate advice such as "Today's recommendation is a 20-minute brisk walk in the park," and provide it through visual display and voice response.

[0095] Examples of prompts for a generative AI model include the following:

[0096] "User voice command: 'I'd like some advice on my schedule for this week.'"

[0097] "Prompt to the AI ​​model: 'The weather will worsen on the candidate dates this week, so what relaxing indoor activities can be done?'"

[0098] This system makes it possible to provide detailed support for the quality of life of residents in smart cities and deliver personalized advice in a timely manner.

[0099] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0100] Step 1:

[0101] The device uses a camera and microphone to collect user voice commands and behavioral data. It takes user voice commands and action information as input and prepares to send this as digital data to the server. For example, if a user says, "Tell me the weather this week," that voice will be captured.

[0102] Step 2:

[0103] The server receives digital data sent from the terminal and converts the audio data into text using speech recognition technology. At this stage, the Google Cloud Speech-to-Text API is used to convert the voice instructions into a parseable format. This converted text becomes the input and is sent to the next data analysis step.

[0104] Step 3:

[0105] The server receives voice instructions in text format and inputs them into a generating AI model. The AI ​​model analyzes the instructions, compares them with past user data, and generates necessary advice. As part of the data processing, it refers to weather information and schedule data to generate corresponding suggestions. This generated advice becomes the output.

[0106] Step 4:

[0107] The server uses natural language processing techniques to convert the outputted advice into language that is easy for the user to understand. For example, the generated advice might become, "It's cloudy today, so we recommend doing yoga indoors." This natural language advice becomes the final output.

[0108] Step 5:

[0109] The device provides users with natural language advice retrieved from a server via voice or display. Specifically, advice is displayed visually on the smart glasses' screen and audibly through a speaker. This process allows users to receive specific and personalized information in real time.

[0110] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the identification processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform identification processing using the user's emotions.

[0111] This invention is a system that combines an emotion engine to recognize user emotions, and aims to provide users with more appropriate support. This system consists of three elements: a terminal, a server, and the user.

[0112] The device functions as a gateway to information in the user's daily life. It is equipped with a voice recognition sensor, a camera sensor for emotion recognition, and other environmental sensors. In addition to capturing voice and behavior in real time, it has a built-in emotion engine that can analyze the user's voice tone and facial expressions to infer their emotional state.

[0113] Data regarding the user's emotions and behavior is sent from the device to the server. The server receives this data and organizes and analyzes it. The server uses a generative AI model to perform a comprehensive data analysis that includes the user's past behavioral patterns and recognized emotional states. This generates optimal advice tailored to the user's current condition. By including emotional analysis in this process, more personalized support is achieved that is not simply based on behavioral patterns, but rather on the user's current psychological state.

[0114] The generated advice is transformed into a form easily understood by the user using natural language processing technology. Furthermore, appropriate tone and phrasing are added to the advice, taking into account the user's emotional state. This transformed advice is then returned to the device and presented to the user from the device.

[0115] For example, if the emotion engine detects that a user is experiencing stress, the server generates advice, including relaxation techniques, and presents it in a gentle tone. In this way, the system, which combines the emotion engine with analysis, can provide users with more appropriate and situation-specific support than ever before.

[0116] The following describes the processing flow.

[0117] Step 1:

[0118] The device begins collecting the user's everyday voice and facial expression data. A voice recognition sensor converts spoken words into text, and a camera sensor equipped with an emotion engine analyzes emotions from facial expressions.

[0119] Step 2:

[0120] The device organizes the collected voice data, text data, sentiment data, and other environmental sensor data, encrypts and secures it, and then sends it to the server. This transmission uses a secure protocol to protect privacy.

[0121] Step 3:

[0122] The server saves the received data to a database and performs data cleaning. By removing noise and filling in missing parts, it prepares the data for more accurate analysis.

[0123] Step 4:

[0124] The server applies a generative AI model to comprehensively analyze the user's behavior patterns, current emotional state, and past history. This analysis identifies the user's needs and areas where personalized support is required.

[0125] Step 5:

[0126] Based on the analysis results, the server generates optimal advice that takes into account individual needs and emotional states. The advice reflects the appropriate tone and expression corresponding to the user's current emotions and is transformed into a form that is easy for the user to understand using natural language processing technology.

[0127] Step 6:

[0128] The server sends the translated advice to the terminal. Once the data transmission to the terminal is complete, the server confirms success and resends the data if necessary.

[0129] Step 7:

[0130] The terminal presents the received advice to the user via voice and display. The user can provide feedback on the information provided, and this feedback is prepared to be sent to the server to help with future analysis.

[0131] (Example 2)

[0132] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0133] Traditional data analysis systems have the drawback of only being able to provide general advice based on user behavior patterns, lacking personalized support tailored to the emotional state of individual users. Furthermore, there is room for improvement in effectively integrating diverse data sources and providing easy-to-understand advice in natural language.

[0134] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0135] In this invention, the server includes means for collecting information, means for analyzing the information and generating advice based on individual needs that take into account behavioral patterns and emotional states, and means for converting the generated advice into a format understandable to the user using natural language processing technology and providing it. This makes it possible to provide personalized advice that is tailored to each user's emotional state.

[0136] "Information" refers to all data acquired through audio, video, and sensors, and includes various data related to the user's behavior and emotional state.

[0137] "Analysis" is the process of processing and analyzing collected information to understand user behavior patterns and emotional states.

[0138] "Behavioral patterns" refer to data that shows trends resulting from an analysis of a user's past behavioral history and habits.

[0139] "Emotional state" refers to the user's subjective mental condition, inferred from data such as voice and facial expressions.

[0140] "Advice" refers to specific instructions or suggestions provided to the user based on the analysis results.

[0141] "Generation" refers to the process of creating new advice or suggestions based on information and analysis results.

[0142] "Natural language processing technology" is a technical means of converting generated advice into words and sentences that are easy for users to understand.

[0143] This invention is a system that provides individually optimized advice based on the user's behavioral patterns and emotional state. The system consists of three elements: a terminal, a server, and a user.

[0144] The device collects information from the user's daily life. Specifically, it captures information using voice recognition sensors and emotion recognition camera sensors. This makes it possible to collect the user's voice tone, facial expressions, and even ambient sounds. For example, if it is suspected that the user is experiencing stress, this information is incorporated into the system.

[0145] The server receives information transmitted from the terminal and performs data analysis. Using a generative AI model, the server analyzes the user's behavior patterns and emotional state in detail. By comprehensively considering past activity history and momentary feelings, it can generate personalized advice for the user. The generated advice is then converted into a form easily understandable to the user using natural language processing technology.

[0146] The user receives advice through the device. This advice is customized according to the user's actions and situation, and is presented in an appropriate tone that matches the user's emotional state. For example, if the user wants to relax, the server might generate a suggestion such as, "How about listening to some quiet, soothing music?" and the device can display this to the user on the screen or via audio.

[0147] The AI ​​model is inputted with prompts such as, "Suggest simple and effective relaxation methods that would be helpful when the user is tired." This allows the system to provide information that is most appropriate to the user's state.

[0148] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0149] Step 1:

[0150] The device collects data from the user. Specifically, a voice recognition sensor captures the user's voice, and an emotion recognition camera sensor captures the user's facial expressions. The input data includes voice tone, facial expressions, and changes in the environment. The device analyzes this data in real time using an emotion engine to make an initial prediction of the user's emotional state.

[0151] Step 2:

[0152] The device sends the collected data to the server. The transmitted data includes voice tone, facial expression detection results, and other sensor information. Based on this data, the server organizes the data and prepares to link various pieces of information.

[0153] Step 3:

[0154] The server analyzes the received data. Specifically, it uses a generative AI model to perform a detailed analysis of the data. The prompt input is "Generate the best advice based on the user's current state." Based on this, advice is generated that takes into account behavioral patterns and emotional states. In this step, past behavioral data is referenced and the data is processed to link it with the current emotional state.

[0155] Step 4:

[0156] This process converts server-generated advice into natural language. The input is a summary of the AI-generated advice, and the output is a text formatted to be easily understood by the user. Specifically, it uses natural language processing techniques to add appropriate tone and phrasing.

[0157] Step 5:

[0158] The server sends formatted advice to the terminal. Based on the received data, the terminal presents advice to the user. For example, a voice assistant might advise, "To relax, why not try taking a few deep breaths?" This output is provided in a user-friendly format, such as a screen display or audio output.

[0159] (Application Example 2)

[0160] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0161] In elderly care settings, there is a challenge in providing appropriate care that is tailored to the emotional state of the service users. Understanding a user's emotions and condition in real time and responding individually requires significant resources and specialized knowledge. Therefore, there is a need for a system that efficiently understands the user's condition and provides the most suitable care methods.

[0162] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0163] In this invention, the server includes means for collecting personal information, means for analyzing the personal information and generating advice tailored to individual needs based on emotional states, and means for adjusting the advice using natural language processing technology and providing it to the user. This makes it possible to grasp the user's emotional state in real time and provide optimal care advice according to the situation at that time.

[0164] "Means of collecting personal information" refers to devices or systems that have the function of acquiring user information using voice or visual sensors.

[0165] "Means of performing analysis and generating advice tailored to individual needs based on emotional state" refers to a process or system that has the function of analyzing the emotional state of users based on collected personal information and formulating appropriate support methods based on the results.

[0166] "Means of adjusting and providing advice to the user using natural language processing technology" refers to a device or software that has the function of converting generated advice into a format that is easy for the user to understand using natural language processing and providing it in appropriate language.

[0167] This invention aims to build a system that provides appropriate care based on the emotional state of users in care settings. The system mainly consists of three elements: a terminal, a server, and a user.

[0168] The devices are portable information terminals such as smartphones and tablets, equipped with voice recognition sensors and vision sensors. This allows the devices to capture the user's voice and facial expressions in real time and collect the data. This collected data is then transmitted from the device to a server.

[0169] The server plays the role of analyzing the received data. Here, a generative AI model (for example, OpenAI's GPT-3®) is used to evaluate the user's emotional state and, taking into account past behavioral patterns, generates personalized advice based on that emotional state and individual needs. This process involves data organization and analysis, enabling a comprehensive approach that takes the user's psychological state into account.

[0170] The generated advice is refined using natural language processing technology to make it easily understandable to the user. Specifically, the content suggested by the generating AI model is expressed in appropriate language and tone and presented to care staff via the device. This allows staff to obtain guidelines for providing appropriate care to users.

[0171] For example, if the emotion engine detects that a user is depressed one day, the server will prepare advice that includes appropriate relaxation methods and topics of conversation. This advice can then be conveyed to the staff in a gentle tone. This allows the staff to provide more effective and considerate care to the user.

[0172] An example of a prompt message might be, "Analyze the user's emotions and advise on the most appropriate care method for their condition." This would allow the system to generate a specific action plan.

[0173] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0174] Step 1:

[0175] The device captures the user's voice and facial expressions in real time. Input consists of the user's voice and video data, recorded by visual and speech recognition sensors. Output is the underlying data used to estimate the user's emotions.

[0176] Step 2:

[0177] The terminal sends the captured data to the server. The input here is the digital audio and video data obtained in step 1. The output is this data transferred to the server.

[0178] Step 3:

[0179] The server analyzes the received data and performs sentiment analysis using a generative AI model. The input consists of audio and video data transmitted in the previous step. The server processes the data using natural language processing and machine learning algorithms, and the output is information about the user's emotional state.

[0180] Step 4:

[0181] The server generates advice tailored to the emotional state based on the analysis results. The input is the result of the emotion analysis obtained in step 3. Based on this data, the generating AI model formulates advice suitable for the user. The output is specific care advice.

[0182] Step 5:

[0183] The server refines the generated advice using natural language processing techniques. The input is the advice created in step 4. The server converts this into human-readable language, and the output is advice in a user-friendly format.

[0184] Step 6:

[0185] The adjusted advice is sent to the terminal and presented to the care staff. The input here is the adjusted advice sent from the server. The output is specific action guidelines displayed on the terminal, which the care staff use to respond to the user.

[0186] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0187] Data generation model 58 is a so-called generative AI (Artificial Intelligence). An example of data generation model 58 is ChatGPT (registered trademark) (Internet search).<URL: https: / / openai.com / blog / chatgpt> ), Gemini (registered trademark) (Internet search) <url: https: gemini.google.com ?hl="ja">Examples of generative AI include the following. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0188] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0189] [Second Embodiment]

[0190] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0191] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0192] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0193] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0194] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0195] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0196] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0197] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0198] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0199] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the identification processing unit 290.

[0200] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0201] Next, the identification processing performed by the identification processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0202] This invention is a system that provides appropriate advice according to an individual's lifestyle and needs. This system consists of three main elements: a terminal, a server, and the user.

[0203] First, the device functions as an interface with the user and is installed in the user's living environment. The device incorporates voice recognition sensors and other sensors to collect the user's daily actions and voice commands. This allows the device to acquire basic data in real time to understand the user's lifestyle patterns and needs.

[0204] Next, data is sent from the terminal to the server. The server converts the received data into a format suitable for analysis and applies a generating AI model. Based on the collected data, this AI model analyzes the characteristics of the user's behavior patterns and generates advice that the user is likely to need. For example, if the user is seeking health improvement, the AI ​​model analyzes past eating habits and exercise history and suggests healthy meal menus and exercise plans.

[0205] The generated advice is converted into a user-friendly format using natural language processing technology. The server then sends the converted advice to the terminal.

[0206] The device provides the user with the advice it receives. The user receives advice from the device via voice and display, and can ask additional questions as needed. This user-friendly interface allows users to receive support tailored to their needs at any time.

[0207] For example, if a user says, "I want to save on my electricity bill this month," the device sends this request to the server, which analyzes the historical data and generates advice on specific energy-saving methods. This kind of dynamic support can improve the user's quality of life.

[0208] The following describes the processing flow.

[0209] Step 1:

[0210] The device collects the user's daily voice commands and actions. It utilizes a voice recognition sensor to convert the user's words into text data. It also acquires environmental data such as temperature, brightness, and movement through built-in sensors and temporarily stores it in local storage.

[0211] Step 2:

[0212] The terminal encrypts the various data it collects to ensure security and then sends it to the server. Transmission takes place via a secure network protocol. This process also requires confirmation of successful data transmission.

[0213] Step 3:

[0214] The server formats the received data for analysis and temporarily stores it in the database. Data cleaning is performed to correct incomplete data and outliers, preparing the data for improved analysis accuracy.

[0215] Step 4:

[0216] The server applies a generative AI model and performs analysis. Based on historical data, it analyzes user behavior patterns and identifies individual needs. From these results, it extracts advice that should be provided to the user.

[0217] Step 5:

[0218] The server uses natural language processing techniques to convert the generated advice into a format that is easy for the user to understand. This makes the advice easier for the user to accept.

[0219] Step 6:

[0220] The server sends the converted advice to the terminal. After sending, it waits for confirmation of receipt from the terminal and immediately resends if necessary.

[0221] Step 7:

[0222] The device provides the user with the received advice via voice output or display. The user can then ask further questions to continue the interaction and prepare to send any necessary feedback information back to the server.

[0223] (Example 1)

[0224] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0225] There is a growing need to provide timely and accurate advice tailored to individual lifestyles and needs. However, conventional systems have faced challenges in accurately understanding user behavior patterns and providing personalized advice based on them.

[0226] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0227] In this invention, the server includes means for collecting data, means for performing analysis based on the data and using a generative model to generate suggestions based on individual needs, and means for converting the generated suggestions into a format understandable to the user using language processing technology and providing them. This makes it possible to provide highly accurate advice in real time that is tailored to the user's behavior patterns.

[0228] "Data" refers to information about user behavior and the environment, which is acquired through sensors and voice recognition.

[0229] A "generative model" refers to an algorithm or technology that generates appropriate suggestions and advice based on a user's past behavior and needs.

[0230] "Language processing technology" is the technology that converts generated suggestions and advice into a natural language format that users can easily understand.

[0231] "Suggestions" refer to advice and instructions generated in a way that addresses the individual needs of the user, and are created using a generative model.

[0232] "Means" refers to the methods or devices used to achieve a specific objective.

[0233] This system provides advice tailored to the user's lifestyle and primarily consists of a terminal and a server. The terminal is installed in the user's living environment and is equipped with a voice recognition sensor and an accelerometer. This terminal records the user's voice commands and daily activities in real time and collects this data. Based on this collected data, the server performs analysis and generates advice based on individual needs using an AI model.

[0234] The generated advice is converted into a user-friendly language format by the server using natural language processing technology. The converted advice is then delivered to the user via the terminal, and the user can receive it visually or audibly. Furthermore, the user can ask additional questions to the terminal.

[0235] For example, if a user speaks to their device saying, "I want to save on my electricity bill this month," the device sends this request to a server. The server analyzes past electricity usage data and uses a generated AI model to suggest specific ways to save energy. Through this process, users can receive useful advice that can lead to improvements in their daily lives.

[0236] Examples of prompt statements include specific questions such as, "What advice can you offer the user if they want to live a healthy lifestyle?" These types of prompts form the basis for the system to provide appropriate information tailored to the user's needs.

[0237] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0238] Step 1:

[0239] The device collects user voice commands and daily activities through sensors. Input consists of real-time user voice commands and environmental activity data. Based on this data, the device generates foundational data to identify user behavior patterns and needs. Specifically, it uses speech recognition technology to convert voice into text data and records environmental sensor data with timestamps.

[0240] Step 2:

[0241] The terminal sends the collected data to the server. The input consists of the voice-text data and sensor data generated in step 1. The server processes this data, formatting it into a specific format and converting it into a format suitable for analysis. Through this data processing, the server prepares the data for analysis and inference.

[0242] Step 3:

[0243] The server uses a generative AI model to analyze the formatted data. The input is the user data transformed in step 2. The generative AI model analyzes the user's past behavioral patterns and generates specific advice based on their needs. For example, if the user's dietary data is input, it will create healthy meal suggestions.

[0244] Step 4:

[0245] The server transforms the generated advice using natural language processing technology. The input is the advice generated in step 3. This transformation makes the advice easier for the user to understand. The server translates it into plain language and organizes the advice into a format that is intuitively meaningful to the user.

[0246] Step 5:

[0247] The server sends the converted advice to the terminal, which then provides it to the user. The input is the advice converted in step 4. As output, the terminal presents the final advice to the user via voice or display. The user receives this advice and can ask the terminal additional questions if necessary. For example, the user can use the voice command "be more specific" to obtain more detailed information.

[0248] (Application Example 1)

[0249] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0250] In recent years, urban life has become increasingly complex, and there is a growing need to provide information and support tailored to the individual lifestyles of each citizen quickly and appropriately. However, existing information provision systems have struggled to flexibly provide advice that meets individual needs. Therefore, this invention aims to solve the problem of providing personalized advice in real time that is tailored to each individual's lifestyle, in order to improve the quality of life for residents in smart cities.

[0251] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0252] In this invention, the server includes means for collecting personal information, means for performing analysis based on said personal information and generating advice based on individual needs, and means for providing said advice to the user using a visual or auditory interface. This makes it possible for the user to quickly receive appropriate advice tailored to their desired environment and circumstances.

[0253] "Personal information" refers to information that can identify a specific individual, including data such as name, address, and behavioral patterns.

[0254] "Analysis" is the process of deriving patterns and trends from collected information and making decisions according to a specific purpose.

[0255] "Advice" refers to suggestions or instructions provided in response to specific situations or needs, and is information that can be used to help the recipient take action.

[0256] A "visual interface" is a means of presenting information to a user visually through displays, projections, and other means.

[0257] An "auditory interface" is a means of conveying information to a user audibly through sound or audio.

[0258] A "detector" is a device that senses environmental data such as sound and physical movement and converts it into an electronic signal.

[0259] To realize this invention, a system consisting of three main elements—a terminal, a server, and a user—is used. The terminal is installed in the user's environment and collects personal information using sensors such as cameras and microphones. Head-mounted displays and smart glasses are often used as terminals. This allows the user's daily activities and voice commands to be obtained in real time, and this information is transmitted to the server as digital data.

[0260] The server processes the received personal information and performs analysis using generative AI models. These AI models utilize AI technologies from Amazon Web Services and generative models from OpenAI. The analysis performed on the server is based on the user's past behavioral patterns and generates advice tailored to the user's specific needs. For example, health management advice and public transportation information are generated through this process.

[0261] The advice derived from the analysis is converted into a user-friendly format using natural language processing technology. It is then communicated to the user via a visual or auditory interface through the device. The Google Cloud Speech-to-Text API is used for voice control, while visual information is presented on the smart glasses' display.

[0262] As a concrete example, let's assume a user gives a command such as "Tell me what exercise I should do today" while they are out. In this case, the system will refer to the user's past exercise data and weather information to generate advice such as "Today's recommendation is a 20-minute brisk walk in the park," and provide it through visual display and voice response.

[0263] Examples of prompts for a generative AI model include the following:

[0264] "User voice command: 'I'd like some advice on my schedule for this week.'"

[0265] "Prompt to the AI ​​model: 'The weather will worsen on the candidate dates this week, so what relaxing indoor activities can be done?'"

[0266] This system makes it possible to provide detailed support for the quality of life of residents in smart cities and deliver personalized advice in a timely manner.

[0267] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0268] Step 1:

[0269] The device uses a camera and microphone to collect user voice commands and behavioral data. It takes user voice commands and action information as input and prepares to send this as digital data to the server. For example, if a user says, "Tell me the weather this week," that voice will be captured.

[0270] Step 2:

[0271] The server receives digital data sent from the terminal and converts the audio data into text using speech recognition technology. At this stage, the Google Cloud Speech-to-Text API is used to convert the voice instructions into a parseable format. This converted text becomes the input and is sent to the next data analysis step.

[0272] Step 3:

[0273] The server receives voice instructions in text format and inputs them into a generating AI model. The AI ​​model analyzes the instructions, compares them with past user data, and generates necessary advice. As part of the data processing, it refers to weather information and schedule data to generate corresponding suggestions. This generated advice becomes the output.

[0274] Step 4:

[0275] The server uses natural language processing techniques to convert the outputted advice into language that is easy for the user to understand. For example, the generated advice might become, "It's cloudy today, so we recommend doing yoga indoors." This natural language advice becomes the final output.

[0276] Step 5:

[0277] The device provides users with natural language advice retrieved from a server via voice or display. Specifically, advice is displayed visually on the smart glasses' screen and audibly through a speaker. This process allows users to receive specific and personalized information in real time.

[0278] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the identification processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform identification processing using the user's emotions.

[0279] The present invention is a system combined with an emotion engine that recognizes the emotions of users, and aims to provide more appropriate support to users. This system consists of three elements: a terminal, a server, and a user.

[0280] The terminal functions as an entrance for information in the daily life of users. The terminal is equipped with a voice recognition sensor, a camera sensor for emotion recognition, and other environmental sensors. In addition to capturing voice and actions in real time, an emotion engine is built in, and it is possible to analyze the user's voice tone and expression to infer the emotional state.

[0281] Data related to the emotions and actions of users is transmitted from the terminal to the server. The server receives this and organizes and analyzes the data. The server uses a generative AI model to perform comprehensive data analysis including the user's past behavior patterns and recognized emotional states. Thereby, optimal advice corresponding to the user's current condition is generated. By including the analysis of emotions in this process, not only advice based on behavior patterns but also more personalized support according to the user's current psychological state is realized.

[0282] The generated advice is converted into a form that is easy for users to understand using natural language processing technology. Furthermore, considering the emotional state, an appropriate tone and expression are added to the expression of the advice. This converted advice is returned to the terminal again and presented to the user from the terminal.

[0283] For example, when the emotion engine detects that the user is feeling stressed, the server generates advice including a relaxation method corresponding thereto and presents it in a gentle tone. In this way, this system that combines the emotion engine and analysis can provide support according to the situation that is more suitable for the user than ever before.

[0284] The following describes the processing flow.

[0285] Step 1:

[0286] The terminal begins to collect the user's daily voice and facial expression data. The voice recognition sensor converts the spoken words into text, and the camera sensor equipped with an emotion engine analyzes the emotion from the facial expression.

[0287] Step 2:

[0288] The terminal organizes the collected voice data, text data, emotion data, and other environmental sensor data, performs encryption and security protection, and then sends it to the server. This transmission uses a secure protocol for privacy protection.

[0289] Step 3:

[0290] The server stores the received data in the database and performs data cleaning. By removing noise and complementing the missing parts, preparations are made for more accurate analysis.

[0291] Step 4:

[0292] The server applies the generated AI model to comprehensively analyze the user's behavior patterns, current emotional state, and past history. Through this analysis, the areas where the user's needs and personal support are required are identified.

[0293] Step 5:

[0294] Based on the analysis results, the server generates optimal advice considering individual needs and emotional states. The advice reflects an appropriate tone and expression corresponding to the user's current emotion and is converted into a form that is easy for the user to understand using natural language processing technology.

[0295] Step 6:

[0296] The server sends the translated advice to the terminal. Once the data transmission to the terminal is complete, the server confirms success and resends the data if necessary.

[0297] Step 7:

[0298] The terminal presents the received advice to the user via voice and display. The user can provide feedback on the information provided, and this feedback is prepared to be sent to the server to help with future analysis.

[0299] (Example 2)

[0300] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0301] Traditional data analysis systems have the drawback of only being able to provide general advice based on user behavior patterns, lacking personalized support tailored to the emotional state of individual users. Furthermore, there is room for improvement in effectively integrating diverse data sources and providing easy-to-understand advice in natural language.

[0302] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0303] In this invention, the server includes means for collecting information, means for analyzing the information and generating advice based on individual needs that take into account behavioral patterns and emotional states, and means for converting the generated advice into a format understandable to the user using natural language processing technology and providing it. This makes it possible to provide personalized advice that is tailored to each user's emotional state.

[0304] "Information" refers to all data acquired through audio, video, and sensors, and includes various data related to the user's behavior and emotional state.

[0305] "Analysis" refers to the procedure of processing and analyzing the collected information to understand the user's behavior patterns and emotional states.

[0306] "Behavior pattern" refers to the data indicating the tendency of the results obtained by analyzing the user's past behavior history and habits.

[0307] "Emotional state" refers to the subjective mental state of the user inferred through data such as voice and expressions.

[0308] "Advice" refers to the specific instructions and suggestions provided to the user based on the analysis results.

[0309] "Generation" refers to the process of creating new advice and suggestions based on information and analysis results.

[0310] "Natural language processing technology" refers to the technical means for converting the generated advice into words and sentences that are easy for the user to understand.

[0311] This invention is a system that provides individually optimized advice based on the user's behavior patterns and emotional states. The system consists of three elements: a terminal, a server, and a user.

[0312] The terminal collects information in the user's daily life. Specifically, it captures information using a voice recognition sensor and an emotion recognition camera sensor. As a result, it is possible to collect the user's voice tone, expressions, and even ambient sounds. For example, when it is inferred that the user is feeling stressed, this information is incorporated into the system.

[0313] The server receives information transmitted from the terminal and performs data analysis. Using a generative AI model, the server analyzes the user's behavior patterns and emotional state in detail. By comprehensively considering past activity history and momentary feelings, it can generate personalized advice for the user. The generated advice is then converted into a form easily understandable to the user using natural language processing technology.

[0314] The user receives advice through the device. This advice is customized according to the user's actions and situation, and is presented in an appropriate tone that matches the user's emotional state. For example, if the user wants to relax, the server might generate a suggestion such as, "How about listening to some quiet, soothing music?" and the device can display this to the user on the screen or via audio.

[0315] The AI ​​model is inputted with prompts such as, "Suggest simple and effective relaxation methods that would be helpful when the user is tired." This allows the system to provide information that is most appropriate to the user's state.

[0316] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0317] Step 1:

[0318] The device collects data from the user. Specifically, a voice recognition sensor captures the user's voice, and an emotion recognition camera sensor captures the user's facial expressions. The input data includes voice tone, facial expressions, and changes in the environment. The device analyzes this data in real time using an emotion engine to make an initial prediction of the user's emotional state.

[0319] Step 2:

[0320] The device sends the collected data to the server. The transmitted data includes voice tone, facial expression detection results, and other sensor information. Based on this data, the server organizes the data and prepares to link various pieces of information.

[0321] Step 3:

[0322] The server analyzes the received data. Specifically, it uses a generative AI model to perform a detailed analysis of the data. The prompt input is "Generate the best advice based on the user's current state." Based on this, advice is generated that takes into account behavioral patterns and emotional states. In this step, past behavioral data is referenced and the data is processed to link it with the current emotional state.

[0323] Step 4:

[0324] This process converts server-generated advice into natural language. The input is a summary of the AI-generated advice, and the output is a text formatted to be easily understood by the user. Specifically, it uses natural language processing techniques to add appropriate tone and phrasing.

[0325] Step 5:

[0326] The server sends formatted advice to the terminal. Based on the received data, the terminal presents advice to the user. For example, a voice assistant might advise, "To relax, why not try taking a few deep breaths?" This output is provided in a user-friendly format, such as a screen display or audio output.

[0327] (Application Example 2)

[0328] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0329] In elderly care settings, there is a challenge in providing appropriate care that is tailored to the emotional state of the service users. Understanding a user's emotions and condition in real time and responding individually requires significant resources and specialized knowledge. Therefore, there is a need for a system that efficiently understands the user's condition and provides the most suitable care methods.

[0330] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0331] In this invention, the server includes means for collecting personal information, means for analyzing the personal information and generating advice tailored to individual needs based on emotional states, and means for adjusting the advice using natural language processing technology and providing it to the user. This makes it possible to grasp the user's emotional state in real time and provide optimal care advice according to the situation at that time.

[0332] "Means of collecting personal information" refers to devices or systems that have the function of acquiring user information using voice or visual sensors.

[0333] "Means of performing analysis and generating advice tailored to individual needs based on emotional state" refers to a process or system that has the function of analyzing the emotional state of users based on collected personal information and formulating appropriate support methods based on the results.

[0334] "Means of adjusting and providing advice to the user using natural language processing technology" refers to a device or software that has the function of converting generated advice into a format that is easy for the user to understand using natural language processing and providing it in appropriate language.

[0335] This invention aims to build a system that provides appropriate care based on the emotional state of users in care settings. The system mainly consists of three elements: a terminal, a server, and a user.

[0336] The devices are portable information terminals such as smartphones and tablets, equipped with voice recognition sensors and vision sensors. This allows the devices to capture the user's voice and facial expressions in real time and collect the data. This collected data is then transmitted from the device to a server.

[0337] The server plays the role of analyzing the received data. Here, a generative AI model (such as OpenAI's GPT-3) is used to evaluate the user's emotional state and, taking into account past behavioral patterns, generates personalized advice based on that emotional state and individual needs. This process involves data organization and analysis, enabling a comprehensive approach that takes the user's psychological state into account.

[0338] The generated advice is refined using natural language processing technology to make it easily understandable to the user. Specifically, the content suggested by the generating AI model is expressed in appropriate language and tone and presented to care staff via the device. This allows staff to obtain guidelines for providing appropriate care to users.

[0339] For example, if the emotion engine detects that a user is depressed one day, the server will prepare advice that includes appropriate relaxation methods and topics of conversation. This advice can then be conveyed to the staff in a gentle tone. This allows the staff to provide more effective and considerate care to the user.

[0340] An example of a prompt message might be, "Analyze the user's emotions and advise on the most appropriate care method for their condition." This would allow the system to generate a specific action plan.

[0341] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0342] Step 1:

[0343] The device captures the user's voice and facial expressions in real time. Input consists of the user's voice and video data, recorded by visual and speech recognition sensors. Output is the underlying data used to estimate the user's emotions.

[0344] Step 2:

[0345] The terminal sends the captured data to the server. The input here is the digital audio and video data obtained in step 1. The output is this data transferred to the server.

[0346] Step 3:

[0347] The server analyzes the received data and performs sentiment analysis using a generative AI model. The input consists of audio and video data transmitted in the previous step. The server processes the data using natural language processing and machine learning algorithms, and the output is information about the user's emotional state.

[0348] Step 4:

[0349] The server generates advice tailored to the emotional state based on the analysis results. The input is the result of the emotion analysis obtained in step 3. Based on this data, the generating AI model formulates advice suitable for the user. The output is specific care advice.

[0350] Step 5:

[0351] The server refines the generated advice using natural language processing techniques. The input is the advice created in step 4. The server converts this into human-readable language, and the output is advice in a user-friendly format.

[0352] Step 6:

[0353] The adjusted advice is sent to the terminal and presented to the care staff. The input here is the adjusted advice sent from the server. The output is specific action guidelines displayed on the terminal, which the care staff use to respond to the user.

[0354] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing unit 12. In the data processing unit 12, the specific processing unit 290 acquires the audio data.

[0355] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). An example of data generation model 58 is ChatGPT (Internet Search<URL: https: / / openai.com / blog / chatgpt> ), Gemini (Internet search) <url: https: gemini.google.com ?hl="ja">Examples of generative AI include the following. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0356] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0357] [Third Embodiment]

[0358] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0359] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0360] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0361] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0362] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0363] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0364] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0365] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0366] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0367] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the identification processing unit 290.

[0368] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0369] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0370] This invention is a system that provides appropriate advice according to an individual's lifestyle and needs. This system consists of three main elements: a terminal, a server, and the user.

[0371] First, the device functions as an interface with the user and is installed in the user's living environment. The device incorporates voice recognition sensors and other sensors to collect the user's daily actions and voice commands. This allows the device to acquire basic data in real time to understand the user's lifestyle patterns and needs.

[0372] Next, data is sent from the terminal to the server. The server converts the received data into a format suitable for analysis and applies a generating AI model. Based on the collected data, this AI model analyzes the characteristics of the user's behavior patterns and generates advice that the user is likely to need. For example, if the user is seeking health improvement, the AI ​​model analyzes past eating habits and exercise history and suggests healthy meal menus and exercise plans.

[0373] The generated advice is converted into a user-friendly format using natural language processing technology. The server then sends the converted advice to the terminal.

[0374] The device provides the user with the advice it receives. The user receives advice from the device via voice and display, and can ask additional questions as needed. This user-friendly interface allows users to receive support tailored to their needs at any time.

[0375] For example, if a user says, "I want to save on my electricity bill this month," the device sends this request to the server, which analyzes the historical data and generates advice on specific energy-saving methods. This kind of dynamic support can improve the user's quality of life.

[0376] The following describes the processing flow.

[0377] Step 1:

[0378] The device collects the user's daily voice commands and actions. It utilizes a voice recognition sensor to convert the user's words into text data. It also acquires environmental data such as temperature, brightness, and movement through built-in sensors and temporarily stores it in local storage.

[0379] Step 2:

[0380] The terminal encrypts the various data it collects to ensure security and then sends it to the server. Transmission takes place via a secure network protocol. This process also requires confirmation of successful data transmission.

[0381] Step 3:

[0382] The server formats the received data for analysis and temporarily stores it in the database. Data cleaning is performed to correct incomplete data and outliers, preparing the data for improved analysis accuracy.

[0383] Step 4:

[0384] The server applies a generative AI model and performs analysis. Based on historical data, it analyzes user behavior patterns and identifies individual needs. From these results, it extracts advice that should be provided to the user.

[0385] Step 5:

[0386] The server uses natural language processing techniques to convert the generated advice into a format that is easy for the user to understand. This makes the advice easier for the user to accept.

[0387] Step 6:

[0388] The server sends the converted advice to the terminal. After sending, it waits for confirmation of receipt from the terminal and immediately resends if necessary.

[0389] Step 7:

[0390] The device provides the user with the received advice via voice output or display. The user can then ask further questions to continue the interaction and prepare to send any necessary feedback information back to the server.

[0391] (Example 1)

[0392] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0393] There is a growing need to provide timely and accurate advice tailored to individual lifestyles and needs. However, conventional systems have faced challenges in accurately understanding user behavior patterns and providing personalized advice based on them.

[0394] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0395] In this invention, the server includes means for collecting data, means for performing analysis based on the data and using a generative model to generate suggestions based on individual needs, and means for converting the generated suggestions into a format understandable to the user using language processing technology and providing them. This makes it possible to provide highly accurate advice in real time that is tailored to the user's behavior patterns.

[0396] "Data" refers to information about user behavior and the environment, which is acquired through sensors and voice recognition.

[0397] A "generative model" refers to an algorithm or technology that generates appropriate suggestions and advice based on a user's past behavior and needs.

[0398] "Language processing technology" is the technology that converts generated suggestions and advice into a natural language format that users can easily understand.

[0399] "Suggestions" refer to advice and instructions generated in a way that addresses the individual needs of the user, and are created using a generative model.

[0400] "Means" refers to the methods or devices used to achieve a specific objective.

[0401] This system provides advice tailored to the user's lifestyle and primarily consists of a terminal and a server. The terminal is installed in the user's living environment and is equipped with a voice recognition sensor and an accelerometer. This terminal records the user's voice commands and daily activities in real time and collects this data. Based on this collected data, the server performs analysis and generates advice based on individual needs using an AI model.

[0402] The generated advice is converted into a user-friendly language format by the server using natural language processing technology. The converted advice is then delivered to the user via the terminal, and the user can receive it visually or audibly. Furthermore, the user can ask additional questions to the terminal.

[0403] For example, if a user speaks to their device saying, "I want to save on my electricity bill this month," the device sends this request to a server. The server analyzes past electricity usage data and uses a generated AI model to suggest specific ways to save energy. Through this process, users can receive useful advice that can lead to improvements in their daily lives.

[0404] Examples of prompt statements include specific questions such as, "What advice can you offer the user if they want to live a healthy lifestyle?" These types of prompts form the basis for the system to provide appropriate information tailored to the user's needs.

[0405] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0406] Step 1:

[0407] The device collects user voice commands and daily activities through sensors. Input consists of real-time user voice commands and environmental activity data. Based on this data, the device generates foundational data to identify user behavior patterns and needs. Specifically, it uses speech recognition technology to convert voice into text data and records environmental sensor data with timestamps.

[0408] Step 2:

[0409] The terminal sends the collected data to the server. The input consists of the voice-text data and sensor data generated in step 1. The server processes this data, formatting it into a specific format and converting it into a format suitable for analysis. Through this data processing, the server prepares the data for analysis and inference.

[0410] Step 3:

[0411] The server uses a generative AI model to analyze the formatted data. The input is the user data transformed in step 2. The generative AI model analyzes the user's past behavioral patterns and generates specific advice based on their needs. For example, if the user's dietary data is input, it will create healthy meal suggestions.

[0412] Step 4:

[0413] The server transforms the generated advice using natural language processing technology. The input is the advice generated in step 3. This transformation makes the advice easier for the user to understand. The server translates it into plain language and organizes the advice into a format that is intuitively meaningful to the user.

[0414] Step 5:

[0415] The server sends the converted advice to the terminal, which then provides it to the user. The input is the advice converted in step 4. As output, the terminal presents the final advice to the user via voice or display. The user receives this advice and can ask the terminal additional questions if necessary. For example, the user can use the voice command "be more specific" to obtain more detailed information.

[0416] (Application Example 1)

[0417] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0418] In recent years, urban life has become increasingly complex, and there is a growing need to provide information and support tailored to the individual lifestyles of each citizen quickly and appropriately. However, existing information provision systems have struggled to flexibly provide advice that meets individual needs. Therefore, this invention aims to solve the problem of providing personalized advice in real time that is tailored to each individual's lifestyle, in order to improve the quality of life for residents in smart cities.

[0419] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0420] In this invention, the server includes means for collecting personal information, means for performing analysis based on said personal information and generating advice based on individual needs, and means for providing said advice to the user using a visual or auditory interface. This makes it possible for the user to quickly receive appropriate advice tailored to their desired environment and circumstances.

[0421] "Personal information" refers to information that can identify a specific individual, including data such as name, address, and behavioral patterns.

[0422] "Analysis" is the process of deriving patterns and trends from collected information and making decisions according to a specific purpose.

[0423] "Advice" refers to suggestions or instructions provided in response to specific situations or needs, and is information that can be used to help the recipient take action.

[0424] A "visual interface" is a means of presenting information to a user visually through displays, projections, and other means.

[0425] An "auditory interface" is a means of conveying information to a user audibly through sound or audio.

[0426] A "detector" is a device that senses environmental data such as sound and physical movement and converts it into an electronic signal.

[0427] To realize this invention, a system consisting of three main elements—a terminal, a server, and a user—is used. The terminal is installed in the user's environment and collects personal information using sensors such as cameras and microphones. Head-mounted displays and smart glasses are often used as terminals. This allows the user's daily activities and voice commands to be obtained in real time, and this information is transmitted to the server as digital data.

[0428] The server processes the received personal information and performs analysis using generative AI models. These AI models utilize AI technologies from Amazon Web Services and generative models from OpenAI. The analysis performed on the server is based on the user's past behavioral patterns and generates advice tailored to the user's specific needs. For example, health management advice and public transportation information are generated through this process.

[0429] The advice derived from the analysis is converted into a user-friendly format using natural language processing technology. It is then communicated to the user via a visual or auditory interface through the device. The Google Cloud Speech-to-Text API is used for voice control, while visual information is presented on the smart glasses' display.

[0430] As a concrete example, let's assume a user gives a command such as "Tell me what exercise I should do today" while they are out. In this case, the system will refer to the user's past exercise data and weather information to generate advice such as "Today's recommendation is a 20-minute brisk walk in the park," and provide it through visual display and voice response.

[0431] Examples of prompts for a generative AI model include the following:

[0432] "User voice command: 'I'd like some advice on my schedule for this week.'"

[0433] "Prompt to the AI ​​model: 'The weather will worsen on the candidate dates this week, so what relaxing indoor activities can be done?'"

[0434] This system makes it possible to provide detailed support for the quality of life of residents in smart cities and deliver personalized advice in a timely manner.

[0435] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0436] Step 1:

[0437] The device uses a camera and microphone to collect user voice commands and behavioral data. It takes user voice commands and action information as input and prepares to send this as digital data to the server. For example, if a user says, "Tell me the weather this week," that voice will be captured.

[0438] Step 2:

[0439] The server receives digital data sent from the terminal and converts the audio data into text using speech recognition technology. At this stage, the Google Cloud Speech-to-Text API is used to convert the voice instructions into a parseable format. This converted text becomes the input and is sent to the next data analysis step.

[0440] Step 3:

[0441] The server receives voice instructions in text format and inputs them into a generating AI model. The AI ​​model analyzes the instructions, compares them with past user data, and generates necessary advice. As part of the data processing, it refers to weather information and schedule data to generate corresponding suggestions. This generated advice becomes the output.

[0442] Step 4:

[0443] The server uses natural language processing techniques to convert the outputted advice into language that is easy for the user to understand. For example, the generated advice might become, "It's cloudy today, so we recommend doing yoga indoors." This natural language advice becomes the final output.

[0444] Step 5:

[0445] The device provides users with natural language advice retrieved from a server via voice or display. Specifically, advice is displayed visually on the smart glasses' screen and audibly through a speaker. This process allows users to receive specific and personalized information in real time.

[0446] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the identification processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform identification processing using the user's emotions.

[0447] This invention is a system that combines an emotion engine to recognize user emotions, and aims to provide users with more appropriate support. This system consists of three elements: a terminal, a server, and the user.

[0448] The device functions as a gateway to information in the user's daily life. It is equipped with a voice recognition sensor, a camera sensor for emotion recognition, and other environmental sensors. In addition to capturing voice and behavior in real time, it has a built-in emotion engine that can analyze the user's voice tone and facial expressions to infer their emotional state.

[0449] Data regarding the user's emotions and behavior is sent from the device to the server. The server receives this data and organizes and analyzes it. The server uses a generative AI model to perform a comprehensive data analysis that includes the user's past behavioral patterns and recognized emotional states. This generates optimal advice tailored to the user's current condition. By including emotional analysis in this process, more personalized support is achieved that is not simply based on behavioral patterns, but rather on the user's current psychological state.

[0450] The generated advice is transformed into a form easily understood by the user using natural language processing technology. Furthermore, appropriate tone and phrasing are added to the advice, taking into account the user's emotional state. This transformed advice is then returned to the device and presented to the user from the device.

[0451] For example, if the emotion engine detects that a user is experiencing stress, the server generates advice, including relaxation techniques, and presents it in a gentle tone. In this way, the system, which combines the emotion engine with analysis, can provide users with more appropriate and situation-specific support than ever before.

[0452] The following describes the processing flow.

[0453] Step 1:

[0454] The device begins collecting the user's everyday voice and facial expression data. A voice recognition sensor converts spoken words into text, and a camera sensor equipped with an emotion engine analyzes emotions from facial expressions.

[0455] Step 2:

[0456] The device organizes the collected voice data, text data, sentiment data, and other environmental sensor data, encrypts and secures it, and then sends it to the server. This transmission uses a secure protocol to protect privacy.

[0457] Step 3:

[0458] The server saves the received data to a database and performs data cleaning. By removing noise and filling in missing parts, it prepares the data for more accurate analysis.

[0459] Step 4:

[0460] The server applies a generative AI model to comprehensively analyze the user's behavior patterns, current emotional state, and past history. This analysis identifies the user's needs and areas where personalized support is required.

[0461] Step 5:

[0462] Based on the analysis results, the server generates optimal advice that takes into account individual needs and emotional states. The advice reflects the appropriate tone and expression corresponding to the user's current emotions and is transformed into a form that is easy for the user to understand using natural language processing technology.

[0463] Step 6:

[0464] The server sends the translated advice to the terminal. Once the data transmission to the terminal is complete, the server confirms success and resends the data if necessary.

[0465] Step 7:

[0466] The terminal presents the received advice to the user via voice and display. The user can provide feedback on the information provided, and this feedback is prepared to be sent to the server to help with future analysis.

[0467] (Example 2)

[0468] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0469] Traditional data analysis systems have the drawback of only being able to provide general advice based on user behavior patterns, lacking personalized support tailored to the emotional state of individual users. Furthermore, there is room for improvement in effectively integrating diverse data sources and providing easy-to-understand advice in natural language.

[0470] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0471] In this invention, the server includes means for collecting information, means for analyzing the information and generating advice based on individual needs that take into account behavioral patterns and emotional states, and means for converting the generated advice into a format understandable to the user using natural language processing technology and providing it. This makes it possible to provide personalized advice that is tailored to each user's emotional state.

[0472] "Information" refers to all data acquired through audio, video, and sensors, and includes various data related to the user's behavior and emotional state.

[0473] "Analysis" is the process of processing and analyzing collected information to understand user behavior patterns and emotional states.

[0474] "Behavioral patterns" refer to data that shows trends resulting from an analysis of a user's past behavioral history and habits.

[0475] "Emotional state" refers to the user's subjective mental condition, inferred from data such as voice and facial expressions.

[0476] "Advice" refers to specific instructions or suggestions provided to the user based on the analysis results.

[0477] "Generation" refers to the process of creating new advice or suggestions based on information and analysis results.

[0478] "Natural language processing technology" is a technical means of converting generated advice into words and sentences that are easy for users to understand.

[0479] This invention is a system that provides individually optimized advice based on the user's behavioral patterns and emotional state. The system consists of three elements: a terminal, a server, and a user.

[0480] The device collects information from the user's daily life. Specifically, it captures information using voice recognition sensors and emotion recognition camera sensors. This makes it possible to collect the user's voice tone, facial expressions, and even ambient sounds. For example, if it is suspected that the user is experiencing stress, this information is incorporated into the system.

[0481] The server receives information transmitted from the terminal and performs data analysis. Using a generative AI model, the server analyzes the user's behavior patterns and emotional state in detail. By comprehensively considering past activity history and momentary feelings, it can generate personalized advice for the user. The generated advice is then converted into a form easily understandable to the user using natural language processing technology.

[0482] The user receives advice through the device. This advice is customized according to the user's actions and situation, and is presented in an appropriate tone that matches the user's emotional state. For example, if the user wants to relax, the server might generate a suggestion such as, "How about listening to some quiet, soothing music?" and the device can display this to the user on the screen or via audio.

[0483] The AI ​​model is inputted with prompts such as, "Suggest simple and effective relaxation methods that would be helpful when the user is tired." This allows the system to provide information that is most appropriate to the user's state.

[0484] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0485] Step 1:

[0486] The device collects data from the user. Specifically, a voice recognition sensor captures the user's voice, and an emotion recognition camera sensor captures the user's facial expressions. The input data includes voice tone, facial expressions, and changes in the environment. The device analyzes this data in real time using an emotion engine to make an initial prediction of the user's emotional state.

[0487] Step 2:

[0488] The device sends the collected data to the server. The transmitted data includes voice tone, facial expression detection results, and other sensor information. Based on this data, the server organizes the data and prepares to link various pieces of information.

[0489] Step 3:

[0490] The server analyzes the received data. Specifically, it uses a generative AI model to perform a detailed analysis of the data. The prompt input is "Generate the best advice based on the user's current state." Based on this, advice is generated that takes into account behavioral patterns and emotional states. In this step, past behavioral data is referenced and the data is processed to link it with the current emotional state.

[0491] Step 4:

[0492] This process converts server-generated advice into natural language. The input is a summary of the AI-generated advice, and the output is a text formatted to be easily understood by the user. Specifically, it uses natural language processing techniques to add appropriate tone and phrasing.

[0493] Step 5:

[0494] The server sends formatted advice to the terminal. Based on the received data, the terminal presents advice to the user. For example, a voice assistant might advise, "To relax, why not try taking a few deep breaths?" This output is provided in a user-friendly format, such as a screen display or audio output.

[0495] (Application Example 2)

[0496] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0497] In elderly care settings, there is a challenge in providing appropriate care that is tailored to the emotional state of the service users. Understanding a user's emotions and condition in real time and responding individually requires significant resources and specialized knowledge. Therefore, there is a need for a system that efficiently understands the user's condition and provides the most suitable care methods.

[0498] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0499] In this invention, the server includes means for collecting personal information, means for analyzing the personal information and generating advice tailored to individual needs based on emotional states, and means for adjusting the advice using natural language processing technology and providing it to the user. This makes it possible to grasp the user's emotional state in real time and provide optimal care advice according to the situation at that time.

[0500] "Means of collecting personal information" refers to devices or systems that have the function of acquiring user information using voice or visual sensors.

[0501] "Means of performing analysis and generating advice tailored to individual needs based on emotional state" refers to a process or system that has the function of analyzing the emotional state of users based on collected personal information and formulating appropriate support methods based on the results.

[0502] "Means of adjusting and providing advice to the user using natural language processing technology" refers to a device or software that has the function of converting generated advice into a format that is easy for the user to understand using natural language processing and providing it in appropriate language.

[0503] This invention aims to build a system that provides appropriate care based on the emotional state of users in care settings. The system mainly consists of three elements: a terminal, a server, and a user.

[0504] The devices are portable information terminals such as smartphones and tablets, equipped with voice recognition sensors and vision sensors. This allows the devices to capture the user's voice and facial expressions in real time and collect the data. This collected data is then transmitted from the device to a server.

[0505] The server plays the role of analyzing the received data. Here, a generative AI model (such as OpenAI's GPT-3) is used to evaluate the user's emotional state and, taking into account past behavioral patterns, generates personalized advice based on that emotional state and individual needs. This process involves data organization and analysis, enabling a comprehensive approach that takes the user's psychological state into account.

[0506] The generated advice is refined using natural language processing technology to make it easily understandable to the user. Specifically, the content suggested by the generating AI model is expressed in appropriate language and tone and presented to care staff via the device. This allows staff to obtain guidelines for providing appropriate care to users.

[0507] For example, if the emotion engine detects that a user is depressed one day, the server will prepare advice that includes appropriate relaxation methods and topics of conversation. This advice can then be conveyed to the staff in a gentle tone. This allows the staff to provide more effective and considerate care to the user.

[0508] An example of a prompt message might be, "Analyze the user's emotions and advise on the most appropriate care method for their condition." This would allow the system to generate a specific action plan.

[0509] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0510] Step 1:

[0511] The device captures the user's voice and facial expressions in real time. Input consists of the user's voice and video data, recorded by visual and speech recognition sensors. Output is the underlying data used to estimate the user's emotions.

[0512] Step 2:

[0513] The terminal sends the captured data to the server. The input here is the digital audio and video data obtained in step 1. The output is this data transferred to the server.

[0514] Step 3:

[0515] The server analyzes the received data and performs sentiment analysis using a generative AI model. The input consists of audio and video data transmitted in the previous step. The server processes the data using natural language processing and machine learning algorithms, and the output is information about the user's emotional state.

[0516] Step 4:

[0517] The server generates advice tailored to the emotional state based on the analysis results. The input is the result of the emotion analysis obtained in step 3. Based on this data, the generating AI model formulates advice suitable for the user. The output is specific care advice.

[0518] Step 5:

[0519] The server refines the generated advice using natural language processing techniques. The input is the advice created in step 4. The server converts this into human-readable language, and the output is advice in a user-friendly format.

[0520] Step 6:

[0521] The adjusted advice is sent to the terminal and presented to the care staff. The input here is the adjusted advice sent from the server. The output is specific action guidelines displayed on the terminal, which the care staff use to respond to the user.

[0522] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing unit 12. In the data processing unit 12, the specific processing unit 290 acquires the audio data.

[0523] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). An example of data generation model 58 is ChatGPT (Internet Search<URL: https: / / openai.com / blog / chatgpt> ), Gemini (Internet search) <url: https: gemini.google.com ?hl="ja">Examples of generative AI include the following. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0524] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0525] [Fourth Embodiment]

[0526] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0527] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0528] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0529] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0530] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0531] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0532] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0533] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0534] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0535] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0536] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the identification processing unit 290.

[0537] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0538] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0539] This invention is a system that provides appropriate advice according to an individual's lifestyle and needs. This system consists of three main elements: a terminal, a server, and the user.

[0540] First, the device functions as an interface with the user and is installed in the user's living environment. The device incorporates voice recognition sensors and other sensors to collect the user's daily actions and voice commands. This allows the device to acquire basic data in real time to understand the user's lifestyle patterns and needs.

[0541] Next, data is sent from the terminal to the server. The server converts the received data into a format suitable for analysis and applies a generating AI model. Based on the collected data, this AI model analyzes the characteristics of the user's behavior patterns and generates advice that the user is likely to need. For example, if the user is seeking health improvement, the AI ​​model analyzes past eating habits and exercise history and suggests healthy meal menus and exercise plans.

[0542] The generated advice is converted into a user-friendly format using natural language processing technology. The server then sends the converted advice to the terminal.

[0543] The device provides the user with the advice it receives. The user receives advice from the device via voice and display, and can ask additional questions as needed. This user-friendly interface allows users to receive support tailored to their needs at any time.

[0544] For example, if a user says, "I want to save on my electricity bill this month," the device sends this request to the server, which analyzes the historical data and generates advice on specific energy-saving methods. This kind of dynamic support can improve the user's quality of life.

[0545] The following describes the processing flow.

[0546] Step 1:

[0547] The device collects the user's daily voice commands and actions. It utilizes a voice recognition sensor to convert the user's words into text data. It also acquires environmental data such as temperature, brightness, and movement through built-in sensors and temporarily stores it in local storage.

[0548] Step 2:

[0549] The terminal encrypts the various data it collects to ensure security and then sends it to the server. Transmission takes place via a secure network protocol. This process also requires confirmation of successful data transmission.

[0550] Step 3:

[0551] The server formats the received data for analysis and temporarily stores it in the database. Data cleaning is performed to correct incomplete data and outliers, preparing the data for improved analysis accuracy.

[0552] Step 4:

[0553] The server applies a generative AI model and performs analysis. Based on historical data, it analyzes user behavior patterns and identifies individual needs. From these results, it extracts advice that should be provided to the user.

[0554] Step 5:

[0555] The server uses natural language processing techniques to convert the generated advice into a format that is easy for the user to understand. This makes the advice easier for the user to accept.

[0556] Step 6:

[0557] The server sends the converted advice to the terminal. After sending, it waits for confirmation of receipt from the terminal and immediately resends if necessary.

[0558] Step 7:

[0559] The device provides the user with the received advice via voice output or display. The user can then ask further questions to continue the interaction and prepare to send any necessary feedback information back to the server.

[0560] (Example 1)

[0561] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0562] There is a growing need to provide timely and accurate advice tailored to individual lifestyles and needs. However, conventional systems have faced challenges in accurately understanding user behavior patterns and providing personalized advice based on them.

[0563] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0564] In this invention, the server includes means for collecting data, means for performing analysis based on the data and using a generative model to generate suggestions based on individual needs, and means for converting the generated suggestions into a format understandable to the user using language processing technology and providing them. This makes it possible to provide highly accurate advice in real time that is tailored to the user's behavior patterns.

[0565] "Data" refers to information about user behavior and the environment, which is acquired through sensors and voice recognition.

[0566] A "generative model" refers to an algorithm or technology that generates appropriate suggestions and advice based on a user's past behavior and needs.

[0567] "Language processing technology" is the technology that converts generated suggestions and advice into a natural language format that users can easily understand.

[0568] "Suggestions" refer to advice and instructions generated in a way that addresses the individual needs of the user, and are created using a generative model.

[0569] "Means" refers to the methods or devices used to achieve a specific objective.

[0570] This system provides advice tailored to the user's lifestyle and primarily consists of a terminal and a server. The terminal is installed in the user's living environment and is equipped with a voice recognition sensor and an accelerometer. This terminal records the user's voice commands and daily activities in real time and collects this data. Based on this collected data, the server performs analysis and generates advice based on individual needs using an AI model.

[0571] The generated advice is converted into a user-friendly language format by the server using natural language processing technology. The converted advice is then delivered to the user via the terminal, and the user can receive it visually or audibly. Furthermore, the user can ask additional questions to the terminal.

[0572] For example, if a user speaks to their device saying, "I want to save on my electricity bill this month," the device sends this request to a server. The server analyzes past electricity usage data and uses a generated AI model to suggest specific ways to save energy. Through this process, users can receive useful advice that can lead to improvements in their daily lives.

[0573] Examples of prompt statements include specific questions such as, "What advice can you offer the user if they want to live a healthy lifestyle?" These types of prompts form the basis for the system to provide appropriate information tailored to the user's needs.

[0574] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0575] Step 1:

[0576] The device collects user voice commands and daily activities through sensors. Input consists of real-time user voice commands and environmental activity data. Based on this data, the device generates foundational data to identify user behavior patterns and needs. Specifically, it uses speech recognition technology to convert voice into text data and records environmental sensor data with timestamps.

[0577] Step 2:

[0578] The terminal sends the collected data to the server. The input consists of the voice-text data and sensor data generated in step 1. The server processes this data, formatting it into a specific format and converting it into a format suitable for analysis. Through this data processing, the server prepares the data for analysis and inference.

[0579] Step 3:

[0580] The server uses a generative AI model to analyze the formatted data. The input is the user data transformed in step 2. The generative AI model analyzes the user's past behavioral patterns and generates specific advice based on their needs. For example, if the user's dietary data is input, it will create healthy meal suggestions.

[0581] Step 4:

[0582] The server transforms the generated advice using natural language processing technology. The input is the advice generated in step 3. This transformation makes the advice easier for the user to understand. The server translates it into plain language and organizes the advice into a format that is intuitively meaningful to the user.

[0583] Step 5:

[0584] The server sends the converted advice to the terminal, which then provides it to the user. The input is the advice converted in step 4. As output, the terminal presents the final advice to the user via voice or display. The user receives this advice and can ask the terminal additional questions if necessary. For example, the user can use the voice command "be more specific" to obtain more detailed information.

[0585] (Application Example 1)

[0586] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0587] In recent years, urban life has become increasingly complex, and there is a growing need to provide information and support tailored to the individual lifestyles of each citizen quickly and appropriately. However, existing information provision systems have struggled to flexibly provide advice that meets individual needs. Therefore, this invention aims to solve the problem of providing personalized advice in real time that is tailored to each individual's lifestyle, in order to improve the quality of life for residents in smart cities.

[0588] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0589] In this invention, the server includes means for collecting personal information, means for performing analysis based on said personal information and generating advice based on individual needs, and means for providing said advice to the user using a visual or auditory interface. This makes it possible for the user to quickly receive appropriate advice tailored to their desired environment and circumstances.

[0590] "Personal information" refers to information that can identify a specific individual, including data such as name, address, and behavioral patterns.

[0591] "Analysis" is the process of deriving patterns and trends from collected information and making decisions according to a specific purpose.

[0592] "Advice" refers to suggestions or instructions provided in response to specific situations or needs, and is information that can be used to help the recipient take action.

[0593] A "visual interface" is a means of presenting information to a user visually through displays, projections, and other means.

[0594] An "auditory interface" is a means of conveying information to a user audibly through sound or audio.

[0595] A "detector" is a device that senses environmental data such as sound and physical movement and converts it into an electronic signal.

[0596] To realize this invention, a system consisting of three main elements—a terminal, a server, and a user—is used. The terminal is installed in the user's environment and collects personal information using sensors such as cameras and microphones. Head-mounted displays and smart glasses are often used as terminals. This allows the user's daily activities and voice commands to be obtained in real time, and this information is transmitted to the server as digital data.

[0597] The server processes the received personal information and performs analysis using generative AI models. These AI models utilize AI technologies from Amazon Web Services and generative models from OpenAI. The analysis performed on the server is based on the user's past behavioral patterns and generates advice tailored to the user's specific needs. For example, health management advice and public transportation information are generated through this process.

[0598] The advice derived from the analysis is converted into a user-friendly format using natural language processing technology. It is then communicated to the user via a visual or auditory interface through the device. The Google Cloud Speech-to-Text API is used for voice control, while visual information is presented on the smart glasses' display.

[0599] As a concrete example, let's assume a user gives a command such as "Tell me what exercise I should do today" while they are out. In this case, the system will refer to the user's past exercise data and weather information to generate advice such as "Today's recommendation is a 20-minute brisk walk in the park," and provide it through visual display and voice response.

[0600] Examples of prompts for a generative AI model include the following:

[0601] "User voice command: 'I'd like some advice on my schedule for this week.'"

[0602] "Prompt to the AI ​​model: 'The weather will worsen on the candidate dates this week, so what relaxing indoor activities can be done?'"

[0603] This system makes it possible to provide detailed support for the quality of life of residents in smart cities and deliver personalized advice in a timely manner.

[0604] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0605] Step 1:

[0606] The device uses a camera and microphone to collect user voice commands and behavioral data. It takes user voice commands and action information as input and prepares to send this as digital data to the server. For example, if a user says, "Tell me the weather this week," that voice will be captured.

[0607] Step 2:

[0608] The server receives digital data sent from the terminal and converts the audio data into text using speech recognition technology. At this stage, the Google Cloud Speech-to-Text API is used to convert the voice instructions into a parseable format. This converted text becomes the input and is sent to the next data analysis step.

[0609] Step 3:

[0610] The server receives voice instructions in text format and inputs them into a generating AI model. The AI ​​model analyzes the instructions, compares them with past user data, and generates necessary advice. As part of the data processing, it refers to weather information and schedule data to generate corresponding suggestions. This generated advice becomes the output.

[0611] Step 4:

[0612] The server uses natural language processing techniques to convert the outputted advice into language that is easy for the user to understand. For example, the generated advice might become, "It's cloudy today, so we recommend doing yoga indoors." This natural language advice becomes the final output.

[0613] Step 5:

[0614] The device provides users with natural language advice retrieved from a server via voice or display. Specifically, advice is displayed visually on the smart glasses' screen and audibly through a speaker. This process allows users to receive specific and personalized information in real time.

[0615] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the identification processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform identification processing using the user's emotions.

[0616] This invention is a system that combines an emotion engine to recognize user emotions, and aims to provide users with more appropriate support. This system consists of three elements: a terminal, a server, and the user.

[0617] The device functions as a gateway to information in the user's daily life. It is equipped with a voice recognition sensor, a camera sensor for emotion recognition, and other environmental sensors. In addition to capturing voice and behavior in real time, it has a built-in emotion engine that can analyze the user's voice tone and facial expressions to infer their emotional state.

[0618] Data regarding the user's emotions and behavior is sent from the device to the server. The server receives this data and organizes and analyzes it. The server uses a generative AI model to perform a comprehensive data analysis that includes the user's past behavioral patterns and recognized emotional states. This generates optimal advice tailored to the user's current condition. By including emotional analysis in this process, more personalized support is achieved that is not simply based on behavioral patterns, but rather on the user's current psychological state.

[0619] The generated advice is transformed into a form easily understood by the user using natural language processing technology. Furthermore, appropriate tone and phrasing are added to the advice, taking into account the user's emotional state. This transformed advice is then returned to the device and presented to the user from the device.

[0620] For example, if the emotion engine detects that a user is experiencing stress, the server generates advice, including relaxation techniques, and presents it in a gentle tone. In this way, the system, which combines the emotion engine with analysis, can provide users with more appropriate and situation-specific support than ever before.

[0621] The following describes the processing flow.

[0622] Step 1:

[0623] The device begins collecting the user's everyday voice and facial expression data. A voice recognition sensor converts spoken words into text, and a camera sensor equipped with an emotion engine analyzes emotions from facial expressions.

[0624] Step 2:

[0625] The device organizes the collected voice data, text data, sentiment data, and other environmental sensor data, encrypts and secures it, and then sends it to the server. This transmission uses a secure protocol to protect privacy.

[0626] Step 3:

[0627] The server saves the received data to a database and performs data cleaning. By removing noise and filling in missing parts, it prepares the data for more accurate analysis.

[0628] Step 4:

[0629] The server applies a generative AI model to comprehensively analyze the user's behavior patterns, current emotional state, and past history. This analysis identifies the user's needs and areas where personalized support is required.

[0630] Step 5:

[0631] Based on the analysis results, the server generates optimal advice that takes into account individual needs and emotional states. The advice reflects the appropriate tone and expression corresponding to the user's current emotions and is transformed into a form that is easy for the user to understand using natural language processing technology.

[0632] Step 6:

[0633] The server sends the translated advice to the terminal. Once the data transmission to the terminal is complete, the server confirms success and resends the data if necessary.

[0634] Step 7:

[0635] The terminal presents the received advice to the user via voice and display. The user can provide feedback on the information provided, and this feedback is prepared to be sent to the server to help with future analysis.

[0636] (Example 2)

[0637] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0638] Traditional data analysis systems have the drawback of only being able to provide general advice based on user behavior patterns, lacking personalized support tailored to the emotional state of individual users. Furthermore, there is room for improvement in effectively integrating diverse data sources and providing easy-to-understand advice in natural language.

[0639] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0640] In this invention, the server includes means for collecting information, means for analyzing the information and generating advice based on individual needs that take into account behavioral patterns and emotional states, and means for converting the generated advice into a format understandable to the user using natural language processing technology and providing it. This makes it possible to provide personalized advice that is tailored to each user's emotional state.

[0641] "Information" refers to all data acquired through audio, video, and sensors, and includes various data related to the user's behavior and emotional state.

[0642] "Analysis" is the process of processing and analyzing collected information to understand user behavior patterns and emotional states.

[0643] "Behavioral patterns" refer to data that shows trends resulting from an analysis of a user's past behavioral history and habits.

[0644] "Emotional state" refers to the user's subjective mental condition, inferred from data such as voice and facial expressions.

[0645] "Advice" refers to specific instructions or suggestions provided to the user based on the analysis results.

[0646] "Generation" refers to the process of creating new advice or suggestions based on information and analysis results.

[0647] "Natural language processing technology" is a technical means of converting generated advice into words and sentences that are easy for users to understand.

[0648] This invention is a system that provides individually optimized advice based on the user's behavioral patterns and emotional state. The system consists of three elements: a terminal, a server, and a user.

[0649] The device collects information from the user's daily life. Specifically, it captures information using voice recognition sensors and emotion recognition camera sensors. This makes it possible to collect the user's voice tone, facial expressions, and even ambient sounds. For example, if it is suspected that the user is experiencing stress, this information is incorporated into the system.

[0650] The server receives information transmitted from the terminal and performs data analysis. Using a generative AI model, the server analyzes the user's behavior patterns and emotional state in detail. By comprehensively considering past activity history and momentary feelings, it can generate personalized advice for the user. The generated advice is then converted into a form easily understandable to the user using natural language processing technology.

[0651] The user receives advice through the device. This advice is customized according to the user's actions and situation, and is presented in an appropriate tone that matches the user's emotional state. For example, if the user wants to relax, the server might generate a suggestion such as, "How about listening to some quiet, soothing music?" and the device can display this to the user on the screen or via audio.

[0652] The AI ​​model is inputted with prompts such as, "Suggest simple and effective relaxation methods that would be helpful when the user is tired." This allows the system to provide information that is most appropriate to the user's state.

[0653] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0654] Step 1:

[0655] The device collects data from the user. Specifically, a voice recognition sensor captures the user's voice, and an emotion recognition camera sensor captures the user's facial expressions. The input data includes voice tone, facial expressions, and changes in the environment. The device analyzes this data in real time using an emotion engine to make an initial prediction of the user's emotional state.

[0656] Step 2:

[0657] The device sends the collected data to the server. The transmitted data includes voice tone, facial expression detection results, and other sensor information. Based on this data, the server organizes the data and prepares to link various pieces of information.

[0658] Step 3:

[0659] The server analyzes the received data. Specifically, it uses a generative AI model to perform a detailed analysis of the data. The prompt input is "Generate the best advice based on the user's current state." Based on this, advice is generated that takes into account behavioral patterns and emotional states. In this step, past behavioral data is referenced and the data is processed to link it with the current emotional state.

[0660] Step 4:

[0661] This process converts server-generated advice into natural language. The input is a summary of the AI-generated advice, and the output is a text formatted to be easily understood by the user. Specifically, it uses natural language processing techniques to add appropriate tone and phrasing.

[0662] Step 5:

[0663] The server sends formatted advice to the terminal. Based on the received data, the terminal presents advice to the user. For example, a voice assistant might advise, "To relax, why not try taking a few deep breaths?" This output is provided in a user-friendly format, such as a screen display or audio output.

[0664] (Application Example 2)

[0665] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0666] In elderly care settings, there is a challenge in providing appropriate care that is tailored to the emotional state of the service users. Understanding a user's emotions and condition in real time and responding individually requires significant resources and specialized knowledge. Therefore, there is a need for a system that efficiently understands the user's condition and provides the most suitable care methods.

[0667] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0668] In this invention, the server includes means for collecting personal information, means for analyzing the personal information and generating advice tailored to individual needs based on emotional states, and means for adjusting the advice using natural language processing technology and providing it to the user. This makes it possible to grasp the user's emotional state in real time and provide optimal care advice according to the situation at that time.

[0669] "Means of collecting personal information" refers to devices or systems that have the function of acquiring user information using voice or visual sensors.

[0670] "Means of performing analysis and generating advice tailored to individual needs based on emotional state" refers to a process or system that has the function of analyzing the emotional state of users based on collected personal information and formulating appropriate support methods based on the results.

[0671] "Means of adjusting and providing advice to the user using natural language processing technology" refers to a device or software that has the function of converting generated advice into a format that is easy for the user to understand using natural language processing and providing it in appropriate language.

[0672] This invention aims to build a system that provides appropriate care based on the emotional state of users in care settings. The system mainly consists of three elements: a terminal, a server, and a user.

[0673] The devices are portable information terminals such as smartphones and tablets, equipped with voice recognition sensors and vision sensors. This allows the devices to capture the user's voice and facial expressions in real time and collect the data. This collected data is then transmitted from the device to a server.

[0674] The server plays the role of analyzing the received data. Here, a generative AI model (such as OpenAI's GPT-3) is used to evaluate the user's emotional state and, taking into account past behavioral patterns, generates personalized advice based on that emotional state and individual needs. This process involves data organization and analysis, enabling a comprehensive approach that takes the user's psychological state into account.

[0675] The generated advice is refined using natural language processing technology to make it easily understandable to the user. Specifically, the content suggested by the generating AI model is expressed in appropriate language and tone and presented to care staff via the device. This allows staff to obtain guidelines for providing appropriate care to users.

[0676] For example, if the emotion engine detects that a user is depressed one day, the server will prepare advice that includes appropriate relaxation methods and topics of conversation. This advice can then be conveyed to the staff in a gentle tone. This allows the staff to provide more effective and considerate care to the user.

[0677] An example of a prompt message might be, "Analyze the user's emotions and advise on the most appropriate care method for their condition." This would allow the system to generate a specific action plan.

[0678] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0679] Step 1:

[0680] The device captures the user's voice and facial expressions in real time. Input consists of the user's voice and video data, recorded by visual and speech recognition sensors. Output is the underlying data used to estimate the user's emotions.

[0681] Step 2:

[0682] The terminal sends the captured data to the server. The input here is the digital audio and video data obtained in step 1. The output is this data transferred to the server.

[0683] Step 3:

[0684] The server analyzes the received data and performs sentiment analysis using a generative AI model. The input consists of audio and video data transmitted in the previous step. The server processes the data using natural language processing and machine learning algorithms, and the output is information about the user's emotional state.

[0685] Step 4:

[0686] The server generates advice tailored to the emotional state based on the analysis results. The input is the result of the emotion analysis obtained in step 3. Based on this data, the generating AI model formulates advice suitable for the user. The output is specific care advice.

[0687] Step 5:

[0688] The server refines the generated advice using natural language processing techniques. The input is the advice created in step 4. The server converts this into human-readable language, and the output is advice in a user-friendly format.

[0689] Step 6:

[0690] The adjusted advice is sent to the terminal and presented to the care staff. The input here is the adjusted advice sent from the server. The output is specific action guidelines displayed on the terminal, which the care staff use to respond to the user.

[0691] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing unit 12. In the data processing unit 12, the specific processing unit 290 acquires the audio data.

[0692] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). An example of data generation model 58 is ChatGPT (Internet Search<URL: https: / / openai.com / blog / chatgpt> ), Gemini (Internet search) <url: https: gemini.google.com ?hl="ja">Examples of generative AI include the following. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0693] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0694] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping. Specifically, the emotion identification model 59 may determine the user's emotion according to a specific mapping, which is an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the identification processing unit 290 may perform identification processing using the robot's emotion.

[0695] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0696] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0697] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further you go from the outside of the Emotion Map 400, the more visible (expressed in actions) your emotions become.

[0698] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https: / / ci.nii.ac.jp / naid / 500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.

[0699] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0700] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values ​​representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values ​​representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.

[0701] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0702] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0703] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-temporary storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-temporary storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0704] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0705] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0706] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0707] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0708] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0709] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0710] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0711] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0712] The following is further disclosed regarding the embodiments described above.

[0713] (Claim 1)

[0714] Means of collecting personal information,

[0715] A means of performing analysis based on the personal information and generating advice based on individual needs,

[0716] Means for providing such advice to the user,

[0717] A system that includes this.

[0718] (Claim 2)

[0719] The system according to claim 1, comprising means for collecting personal information using voice and sensors.

[0720] (Claim 3)

[0721] The system according to claim 1, comprising means for converting the generated advice into a format understandable to the user using natural language processing technology and providing it to the user.

[0722] "Example 1"

[0723] (Claim 1)

[0724] Means of collecting data,

[0725] A means of using a generative model that performs analysis based on the data and generates proposals based on individual needs,

[0726] A means of converting the generated proposals into a format understandable to the user using language processing technology and providing it to the user,

[0727] A system that includes this.

[0728] (Claim 2)

[0729] The system according to claim 1, comprising means for collecting data using speech recognition and an accelerometer.

[0730] (Claim 3)

[0731] The system according to claim 1, comprising means for analyzing user behavior patterns and generating specific suggestions based on that history.

[0732] "Application Example 1"

[0733] (Claim 1)

[0734] Means of collecting personal information,

[0735] A means of performing analysis based on the personal information and generating advice based on individual needs,

[0736] Means for providing the advice to the user using a visual or auditory interface,

[0737] A system that includes this.

[0738] (Claim 2)

[0739] The system according to claim 1, comprising means for collecting personal information using voice and various detectors.

[0740] (Claim 3)

[0741] The system according to claim 1, comprising means for converting the generated advice into a format understandable to the user using natural language processing technology and providing it to the user.

[0742] "Example 2 of combining an emotion engine"

[0743] (Claim 1)

[0744] Means of collecting information,

[0745] A means for analyzing the information and generating personalized advice that takes into account behavioral patterns and emotional states,

[0746] A means for converting the generated advice into a format understandable to the user using natural language processing technology and providing it,

[0747] A system that includes this.

[0748] (Claim 2)

[0749] The system according to claim 1, comprising means for collecting information using voice and multiple sensors.

[0750] (Claim 3)

[0751] The system according to claim 1, comprising means for adjusting the expression of generated advice according to the emotional state.

[0752] "Application example 2 when combining with an emotional engine"

[0753] (Claim 1)

[0754] Means of collecting personal information,

[0755] A means of performing analysis based on the personal information and generating advice tailored to individual needs based on emotional state,

[0756] A means of adjusting the advice using natural language processing technology and providing it to the user,

[0757] A system that includes this.

[0758] (Claim 2)

[0759] The system according to claim 1, comprising means for collecting personal information using voice and visual sensors.

[0760] (Claim 3)

[0761] The system according to claim 1, comprising means for adjusting advice generated based on an analyzed emotional state to suit the user's specific situation. [Explanation of Symbols]

[0762] 10, 210, 310, 410 Data Processing Systems 12 Data Processing Devices 14 Smart Devices 214 Smart Glasses 314 Headset-type terminal 414 Robots< / url:> < / url:> < / url:> < / url:>

Claims

1. Means of collecting personal information, A means of performing analysis based on the personal information and generating advice based on individual needs, Means for providing such advice to the user, A system that includes this.

2. The system according to claim 1, comprising means for collecting personal information using voice and sensors.

3. The system according to claim 1, comprising means for converting the generated advice into a format understandable to the user using natural language processing technology and providing it to the user.