System

AI Technical Title
A data processing system analyzes user input data to provide personalized coaching, addressing the challenge of costly and non-tailored feedback by offering continuous, efficient, and cost-effective guidance for business skill improvement.

JP2026096659A · Pending · Publication Date: 2026-06-15 · SOFTBANK GROUP CORP

Cites: 1 · Cited by: 0

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: SOFTBANK GROUP CORP
Filing Date: 2024-12-03
Publication Date: 2026-06-15

AI Technical Summary

Technical Problem

Modern businesspersons face challenges in receiving personalized and cost-effective feedback and coaching for self-growth and skill improvement, as conventional methods are often costly and unable to provide tailored guidance.

Method used

A system that uses analytical tools to analyze user input data, including voice, image, and video, to generate personalized feedback and monitor progress, updating user profiles for continuous coaching based on real-time data analysis.

Benefits of technology

Enables efficient, personalized coaching at a reasonable cost, providing users with continuous feedback and support for their goals and challenges, enhancing their business skills and performance.

✦ Generated by Eureka AI based on patent content.

[Abstract drawing 2026096659000001_ABST not reproduced]


Abstract

[Problem] To provide a system. [Solution] A system including: analysis means for analyzing information about goals and challenges received from a user; profile adjustment means for updating the user's personality and preference profile based on the analyzed information; generation means for generating personalized feedback and advice based on the updated profile; provision means for providing the generated feedback to a terminal; and monitoring means for monitoring the user's progress based on the provided feedback and analyzing new data.


Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] Modern businesspersons find it difficult to receive appropriate feedback and coaching as they pursue self-growth and skill improvement. In particular, conventional methods are costly and often fail to provide personalized guidance tailored to individual needs. Therefore, there is a need for a low-cost system that can efficiently and objectively support users in achieving their goals and monitor their progress in real time.

Means for Solving the Problems

[0005] By using analytical tools to analyze goal and challenge information received from users, the system generates feedback tailored to user needs. Furthermore, based on the analysis results, it updates the user's personality and preference profile, providing personalized advice. The generated feedback is then delivered to the user's device, and daily progress is monitored. This system analyzes new data as needed, ensuring that coaching is always up-to-date. This system processes diverse information using voice, image, and video data recognition to comprehensively evaluate and support user performance.

[0006] A "user" refers to a business person who uses information systems to improve their own performance.

[0007] "Goals and challenges" refer to the areas of skill development or improvement that users want to achieve.

[0008] "Analysis means" refers to a mechanism that analyzes information received from users and executes processing to generate feedback and advice based on that information.

[0009] "Profile adjustment means" refers to methods and techniques for updating a profile of the user's personality and preferences.

[0010] "Generation means" refers to a function that creates optimal feedback and advice for the user based on the analyzed data.

[0011] "Delivery means" refers to a mechanism that accurately transmits the generated feedback and advice to the user's device and makes it available for use.

[0012] "Monitoring means" refers to a mechanism for continuously monitoring user activity and analyzing progress and new data.

[0013] "Speech recognition means" refers to technology that converts voice data provided by a user into text and then formats that text into an analyzable format.

[0014] "Image recognition means" refers to technology that analyzes image and video data sent by users and extracts important features and information from them.

Brief Description of Drawings

[0015]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map in which multiple emotions are mapped.
[Figure 10] An emotion map in which multiple emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Embodiment 2 when the emotion engine is combined.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2 when the emotion engine is combined.

Mode for Carrying Out the Invention

[0016] Hereinafter, an example of an embodiment of the system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0017] First, the terms used in the following description will be explained.

[0018] In the following embodiments, a processor with a reference numeral (hereinafter simply referred to as "processor") may be one arithmetic unit or a combination of a plurality of arithmetic units. Also, the processor may be one type of arithmetic unit or a combination of a plurality of types of arithmetic units. Examples of the arithmetic unit include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.

[0019] In the following embodiments, a RAM (Random Access Memory) with a reference numeral is a memory in which information is temporarily stored and is used as a work memory by the processor.

[0020] In the following embodiments, a storage with a reference numeral is one or more non-volatile storage devices that store various programs and various parameters, etc. Examples of the non-volatile storage device include a flash memory (SSD (Solid State Drive)), a magnetic disk (e.g., a hard disk), or a magnetic tape, etc.

[0021] In the following embodiments, a communication interface (I/F) with a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), or Bluetooth (registered trademark).

[0022] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when expressing three or more things linked by "and/or."

[0023] [First Embodiment]

[0024] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0025] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0026] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0027] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0028] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0029] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0030] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0031] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0032] As shown in Figure 2, in the data processing device 12, the processor 28 performs specific processing. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0033] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0034] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0035] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0036] This invention provides a system that offers a personalized coaching application using multimodal data to improve users' business performance. This system deeply understands the user's goals and current challenges and provides optimal feedback in real time based on that understanding.

[0037] Users input their goals and current challenges through a dedicated terminal application. The terminal accepts not only text but also multimodal data such as voice memos, images, and video data. This information is transmitted to a server via the internet.

[0038] On the server, an AI-powered analysis platform processes the data. First, text data is processed with natural language processing techniques to identify the user's goals and challenges and extract the core information. Audio data is transcribed using speech recognition technology, and the transcript is then analyzed. For image and video data, computer vision technology is used to evaluate the user's behavior patterns and situation.

[0039] Based on the analysis results, the server generates feedback in a communication style best suited to the user's personality and preferences. This feedback includes specific advice and suggestions for improvement to help the user achieve their goals. The server then sends this feedback to the terminal for the user to review.

[0040] For example, if a user enters "I want to improve my leadership skills," the AI analyzes past presentation videos of the user and points out areas for improvement in their speaking style and body language. The server also periodically monitors the user's progress and updates the coaching content whenever new data is submitted.

[0041] This system allows users to receive personalized feedback around the clock, every day of the year, enabling efficient self-improvement. This approach provides individualized advice at a reasonable cost, strongly supporting users' career advancement and the improvement of their business skills.

[0042] The following describes the processing flow.

[0043] Step 1:

[0044] Users input their goals and challenges through a dedicated terminal application. This can include not only text data but also multimodal data such as voice memos, images, and videos. The terminal collects this data and prepares it for transmission.

[0045] Step 2:

[0046] The terminal sends the collected data to the server. The server first integrates the received data and prepares it so that each data type can be properly parsed.
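The integration and dispatch in Step 2 can be pictured as routing each received item to the analysis step that handles its type. The following is a minimal Python sketch. It assumes the terminal labels each item with a MIME type. The `Upload` class and the `route_by_modality` function are hypothetical names, not part of the disclosure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Upload:
    """One item received from the terminal (hypothetical structure)."""
    mime_type: str  # e.g. "text/plain", "audio/wav", "video/mp4"
    payload: bytes


def route_by_modality(uploads):
    """Group received items into the queues consumed by Steps 3 to 5."""
    queues = defaultdict(list)
    for item in uploads:
        major_type = item.mime_type.split("/", 1)[0]
        if major_type == "text":
            queues["nlp"].append(item)        # Step 3: natural language processing
        elif major_type == "audio":
            queues["speech"].append(item)     # Step 4: speech recognition
        elif major_type in ("image", "video"):
            queues["vision"].append(item)     # Step 5: computer vision
        else:
            queues["unsupported"].append(item)
    return dict(queues)
```

Keying on the declared MIME type keeps the router independent of file names. A production server might also inspect file headers before trusting the label.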

[0047] Step 3:

[0048] The server analyzes text data using natural language processing (NLP) techniques to extract keywords and context related to the goals and challenges set by the user. This process aims to accurately understand the user's intent.
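The disclosure does not say which NLP technique Step 3 uses. As a rough stand-in, keyword extraction can be approximated by counting content words after stop words are removed. This frequency-based sketch is deliberately simple. The stop-word list and the `extract_keywords` function are illustrative assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); real systems use larger lists.
STOP_WORDS = {
    "i", "want", "to", "my", "the", "a", "an", "and", "of", "in",
    "is", "for", "on", "me", "it", "be", "more", "with",
}


def extract_keywords(text, top_k=3):
    """Return the top_k most frequent non-stop-words in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    # most_common keeps first-seen order among words with equal counts.
    return [word for word, _ in counts.most_common(top_k)]
```

For example, `extract_keywords("I want to improve my leadership skills. Leadership in meetings matters.")` returns `["leadership", "improve", "skills"]`.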

[0049] Step 4:

[0050] The server converts the audio data into text using speech recognition technology and then analyzes the result. It identifies how the audio content relates to the user's goals and challenges and extracts the necessary information.

[0051] Step 5:

[0052] For visual data (images and videos), the server uses computer vision technology to analyze the user's visual behavior and facial expressions. This data is used to understand the user's performance and areas for improvement.

[0053] Step 6:

[0054] The server integrates the analysis results and updates the user's personality and preference profile. This makes the feedback generated by the system more personalized and better optimized for the user.
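The disclosure does not specify how the profile is stored or adjusted. One simple approach lets the profile follow the user without overreacting to a single session: take an exponential moving average over trait scores. This sketch assumes the profile is a dictionary of traits scored in [0, 1], with a smoothing factor `alpha`. Both that representation and the `update_profile` name are hypothetical.

```python
def update_profile(profile, observations, alpha=0.3):
    """Blend newly observed trait scores into a stored profile.

    profile and observations map trait names to scores in [0, 1];
    alpha is the weight given to the new observation.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    updated = dict(profile)  # leave the caller's profile untouched
    for trait, value in observations.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{trait} must be in [0, 1], got {value}")
        # A trait seen for the first time starts at its observed value.
        previous = updated.get(trait, value)
        updated[trait] = (1 - alpha) * previous + alpha * value
    return updated
```

With `alpha=0.3`, a trait at 0.5 that is observed at 1.0 moves to 0.65. A single unusual session therefore shifts the profile only partway.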

[0055] Step 7:

[0056] The feedback generated by the server includes specific improvement suggestions for the user. For example, it may include specific steps to enhance eye contact during presentations or improve leadership skills.

[0057] Step 8:

[0058] The device receives feedback from the server and displays it through the user interface. The user then uses this feedback to work on self-improvement.

[0059] Step 9:

[0060] Whenever new user activity data is generated, the device sends it to the server. The server analyzes this new data and uses it to generate new progress evaluations and feedback. This ensures that users always receive appropriate and continuous coaching.
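Step 9 could, for example, turn a metric tracked across sessions (such as an eye-contact ratio) into a trend that later feedback can refer to. This sketch fits a least-squares slope to the session scores. The metric, the tolerance, and the function names are assumptions, not details from the disclosure.

```python
from statistics import mean


def progress_trend(scores):
    """Least-squares slope of per-session scores against session index."""
    if len(scores) < 2:
        return 0.0
    xs = range(len(scores))
    x_bar, y_bar = mean(xs), mean(scores)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, scores))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


def progress_label(scores, tolerance=0.01):
    """Turn the slope into a coarse label that feedback generation can use."""
    slope = progress_trend(scores)
    if slope > tolerance:
        return "improving"
    if slope < -tolerance:
        return "declining"
    return "flat"
```

For example, `progress_label([0.4, 0.5, 0.6])` returns `"improving"`, because the fitted slope is 0.1 per session.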

[0061] (Example 1)

[0062] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0063] In today's business environment, individuals need personalized feedback and guidance to continuously improve their performance. However, traditional methods have struggled to provide specific advice tailored to individual characteristics and preferences in real time. Furthermore, there has been a lack of systems that can comprehensively analyze various forms of data (voice, images, text, etc.) to effectively evaluate user behavior and progress.

[0064] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0065] In this invention, the server includes information analysis means, data adjustment means, generation means, information transmission means, monitoring means, and data analysis means. This enables real-time personalized feedback based on user characteristics, and by integrating and analyzing diverse data formats, user behavior evaluation and progress monitoring can be performed efficiently.

[0066] "Information analysis means" is a mechanism that analyzes information about goals and challenges received from users.

[0067] "Data adjustment means" is a mechanism that updates the user's characteristics and preference profile based on the analyzed information.

[0068] "Generation means" is a mechanism that generates personalized suggestions and guidance based on the updated profile.

[0069] "Information transmission means" is a mechanism that provides the generated suggestions to the terminal.

[0070] "Monitoring means" is a mechanism that monitors the user's progress based on the provided suggestions and analyzes new information.

[0071] "Data analysis means" is a mechanism that evaluates the user's behavior patterns and environment using multimodal data.

[0072] "Voice conversion means" is a mechanism that converts the user's voice data into text and analyzes the content of the user's speech from that text.

[0073] "Video analysis means" is a mechanism that analyzes image and video information received from the user and uses the analysis results to evaluate the user's activity.

[0074] This invention is a personalized coaching system aimed at improving user performance. The system includes a series of processes that receive and analyze information regarding the user's goals and challenges, and provide personalized feedback.

[0075] Users input their goals and challenges using a dedicated terminal application. This terminal has the capability to accept multimodal data such as text, audio, images, and videos, and can flexibly handle various data input formats. For example, a user who wants to "improve their leadership skills" can upload past presentation videos.

[0076] The device sends information collected from the user to the server via the internet. This information is encrypted using a secure protocol before reaching the server.

[0077] The server passes the received data to the AI analysis platform. Text data is analyzed using natural language processing (NLP) to extract user intent and emotions. Audio data is converted to text through speech recognition technology, and its content is further analyzed. Image and video data is analyzed using computer vision technology to evaluate user behavior patterns and situations.

[0078] Based on the analysis results, the server uses a generative AI model to generate personalized suggestions in a format best suited to the user's characteristics. These suggestions include specific advice aimed at achieving the user's set goals. The generated suggestions are sent to the terminal, where the user can access and review them.

[0079] For example, consider a case where a user enters "I want to improve my leadership skills." In this case, the AI analyzes the uploaded presentation video and provides feedback pointing out areas for improvement in speaking style and gestures.

[0080] An example of a prompt is, "I want to improve my leadership skills. Please analyze my past presentation videos and tell me what I can improve."

[0081] This system provides users with appropriate feedback 24/7, enabling efficient self-improvement.

[0082] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0083] Step 1:

[0084] Users create input data via a dedicated terminal application. This includes entering goals and tasks in text, recording voice memos, and uploading images and videos. For example, a user might input the text "I want to improve my leadership skills" along with a related video. The entered data is collected on the terminal and sent to the server using a reliable transmission protocol.
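The disclosure states only that the terminal uses a reliable transmission protocol. Below is a minimal sketch of what the terminal might package. It assumes a JSON body with base64-encoded attachments and a SHA-256 digest per attachment, which lets the server detect corrupted files. The field names and functions are hypothetical. Transport encryption (for example TLS) is assumed to be handled separately and is not shown.

```python
import base64
import hashlib
import json


def build_payload(user_id, goal_text, attachments):
    """Package the user's input; attachments is a list of (filename, mime_type, bytes)."""
    items = []
    for filename, mime_type, data in attachments:
        items.append({
            "filename": filename,
            "mime_type": mime_type,
            "sha256": hashlib.sha256(data).hexdigest(),           # integrity check value
            "data_b64": base64.b64encode(data).decode("ascii"),  # binary-safe for JSON
        })
    return json.dumps({"user_id": user_id, "goal_text": goal_text, "attachments": items})


def verify_payload(raw):
    """Server side: decode the body and confirm every attachment's digest."""
    body = json.loads(raw)
    for item in body["attachments"]:
        data = base64.b64decode(item["data_b64"])
        if hashlib.sha256(data).hexdigest() != item["sha256"]:
            return False
    return True
```

Base64 encoding grows the payload by about a third. Large videos might therefore be uploaded separately and referenced by digest.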

[0085] Step 2:

[0086] The server passes the received data to the analysis platform and begins data processing. Text data is processed by a natural language processing (NLP) engine to extract the user's intentions and emotions; the output is keywords and context related to the user's goals. Audio data is transcribed into text by a speech recognition system, and that text is further analyzed; the output is the transcribed content and the results of its analysis.

[0087] Step 3:

[0088] The server processes received image and video data using computer vision algorithms. It analyzes user behavior patterns, location information, and eye movements to assess the user's current state. For example, in the case of a presentation video, the output might include the frequency of gestures and eye movements, along with areas for improvement.
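Step 3 names gesture frequency and eye movements as outputs, and both can be derived once per-frame labels exist. This sketch assumes an upstream vision model (not shown) has already labeled each frame with two facts: whether the user is gesturing, and whether the user is looking at the audience. The `FrameObservation` type and the metric definitions are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class FrameObservation:
    gesturing: bool             # assumed output of an upstream pose model
    looking_at_audience: bool   # assumed output of an upstream gaze model


def presentation_metrics(frames, fps):
    """Compute gestures per minute and the share of frames with eye contact."""
    if not frames or fps <= 0:
        raise ValueError("need at least one frame and a positive frame rate")
    minutes = len(frames) / fps / 60
    # Count gesture onsets (a gesturing frame not preceded by one), not gesture frames.
    onsets = sum(
        1
        for i, frame in enumerate(frames)
        if frame.gesturing and (i == 0 or not frames[i - 1].gesturing)
    )
    eye_contact = sum(frame.looking_at_audience for frame in frames) / len(frames)
    return {"gestures_per_minute": onsets / minutes, "eye_contact_ratio": eye_contact}
```

Counting onsets rather than gesturing frames keeps one long gesture from being reported as many gestures.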

[0089] Step 4:

[0090] The server uses a generative AI model to generate personalized exercise suggestions based on data analysis. This generation process considers the user's profile, analysis results, and historical data trends to provide optimal feedback. The output consists of a file containing specific advice and improvement suggestions for the user.
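Step 4 combines the user's profile, the analysis results, and historical trends. With a text-based generative model, one plausible approach is to assemble them into a single prompt, as sketched below. The profile key `directness`, the style mapping, and the prompt wording are assumptions. The call to an actual model is deliberately left out.

```python
def build_feedback_prompt(goal, profile, metrics, trend):
    """Assemble a text prompt for a generative model (model call not shown)."""
    # Hypothetical mapping from a profile trait to a communication style.
    if profile.get("directness", 0.5) >= 0.5:
        style = "direct and concise"
    else:
        style = "gentle and encouraging"
    lines = [
        f"User goal: {goal}",
        f"Preferred communication style: {style}",
        "Latest analysis results:",
    ]
    lines += [f"- {name}: {value:.2f}" for name, value in sorted(metrics.items())]
    lines.append(f"Trend across previous sessions: {trend}")
    lines.append("Give three specific, actionable improvement suggestions.")
    return "\n".join(lines)
```

Sorting the metrics makes the prompt deterministic for the same inputs, which simplifies comparing feedback across sessions.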

[0091] Step 5:

[0092] The server sends the generated feedback to the device and presents it in a format that is easy for the user to understand. The device displays the received feedback and notifies the user. This allows the user to review the feedback and use it to improve their daily actions.

[0093] Step 6:

[0094] Users work on self-improvement based on the feedback they receive. New data is periodically sent from their devices to the server so that progress can be continuously monitored. The server re-analyzes this new data and updates the feedback. This ensures that the most up-to-date and relevant advice is always provided.

[0095] (Application Example 1)

[0096] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0097] In today's business environment, there is a demand for timely and efficient personalized feedback and advice. However, current technology makes it difficult to provide real-time feedback based on user personality and preferences, hindering effective self-improvement. Furthermore, obtaining appropriate skill-building advice in daily life is also challenging.

[0098] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0099] In this invention, the server includes analysis means for analyzing information on goals and tasks received from the user; profile update means for updating the user's personality and preference profile based on the analyzed information; generation means for generating personalized feedback and advice based on the updated profile; and activity observation means for observing the user's daily activities through a consumer robot and providing advice in real time. This enables the user to effectively improve their business skills even in their daily life.

[0100] "Analysis means" refers to methods and devices for processing information received from users and understanding goals and challenges.

[0101] A "profile update means" is a method or device for modifying and updating data related to a user's personality and preferences based on analyzed information.

[0102] "Generation means" refers to methods and devices for generating personalized feedback and advice based on updated profiles.

[0103] "Progress tracking means" refers to a technique or device for monitoring user progress based on generated feedback and analyzing the associated data.

[0104] "Activity observation means" refers to methods and devices that use consumer robots to record a user's daily actions and provide real-time advice based on the observation results.

[0105] The system for realizing this invention utilizes consumer robots that users use on a daily basis and has a mechanism to provide feedback on goals and tasks that the user proactively presents. The user's activities are recorded by the robot, using sensors such as cameras and microphones. The robot transmits the collected data to a cloud server.

[0106] On the server, the analysis means first converts audio data into text using speech recognition software (e.g., Google® Speech-to-Text API). Next, the text data and image/video data are analyzed using natural language processing software (e.g., spaCy or Transformers) and image analysis software (e.g., OpenCV or TensorFlow®), respectively. Based on this analysis, the profile update means updates the user's personality and preference profile.

[0107] The generation means uses a generative AI model to generate personalized feedback and advice tailored to the user's profile. The progress tracking means then monitors the user's performance and growth in response to that feedback, enabling further analysis as needed.

[0108] For example, if a user is cooking, the robot observes their movements and provides real-time advice to improve their cooking skills. An example of a prompt might be, "Analyze the user's body language and speech during cooking to generate feedback for improving their cooking skills." This allows users to effectively improve their skills from the comfort of their homes, and the system is particularly useful for improving business skills.

[0109] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0110] Step 1:

[0111] The terminal (robot) records the user's activities as audio and image data. This is done using microphones and cameras mounted on the robot. The input is the user's real-time audio and video, and the output after processing is recorded audio data files and video data files.

[0112] Step 2:

[0113] The terminal transmits the recorded audio and image data files to the server via the internet. The input is an audio/video data file, and the output is the same data file stored on the server.

[0114] Step 3:

[0115] The server converts the audio data into text for natural language processing. This process utilizes speech recognition technologies such as the "Google Speech-to-Text API." The input is an audio data file, and the output, as a result of the processing, is obtained as text data.

[0116] Step 4:

[0117] The server performs text analysis and processes the data to understand the user's goals and challenges. Natural language processing software such as "spaCy" and "Transformers" are used for this process. The input is text data, and the output is analyzed goal and challenge data.

[0118] Step 5:

[0119] The server analyzes the video data to extract the user's actions and behavioral patterns. This analysis uses image recognition technology based on "OpenCV" and "TensorFlow". The input is a video data file, and the output is behavioral pattern data.

[0120] Step 6:

[0121] The server updates the user's profile based on the analyzed text data and behavioral pattern data. The input is the analyzed text and behavioral pattern data, and the output is the updated profile data.

[0122] Step 7:

[0123] The server uses a generative AI model to generate personalized feedback based on the updated profile. The input is the updated profile data, and the output is the text of the generated feedback.

[0124] Step 8:

[0125] The server sends the generated feedback data back to the terminal. This allows the user to receive feedback through the terminal. The input is the generated feedback data, and the output is the feedback displayed or audibly presented on the terminal.
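Each of Steps 1 to 8 states an input and an output, so the chain can be written as a pipeline of interchangeable stages. This sketch assumes each engine is wrapped in a plain Python callable; examples would be a speech-to-text service, spaCy, or an OpenCV-based analyzer. The `CoachingPipeline` class and the stage signatures are hypothetical, and no real library API is invoked.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Recording:
    """Steps 1-2: audio and video files captured by the robot, as stored on the server."""
    audio: bytes
    video: bytes


@dataclass
class CoachingPipeline:
    transcribe: Callable[[bytes], str]                    # Step 3: audio -> text
    analyze_text: Callable[[str], Dict[str, List[str]]]   # Step 4: text -> goals/challenges
    analyze_video: Callable[[bytes], Dict[str, float]]    # Step 5: video -> behavior patterns
    generate: Callable[[Dict[str, object]], str]          # Step 7: profile -> feedback text
    profile: Dict[str, object] = field(default_factory=dict)

    def run(self, recording: Recording) -> str:
        text = self.transcribe(recording.audio)
        goals = self.analyze_text(text)
        behavior = self.analyze_video(recording.video)
        # Step 6: fold both analyses into the stored profile.
        self.profile.update(goals)
        self.profile.update(behavior)
        # Steps 7-8: the returned text is what the terminal displays or speaks.
        return self.generate(self.profile)
```

Passing the engines in as callables means a stage can be swapped or stubbed (for example in tests) without changing the flow.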

[0126] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0127] This invention is an advanced coaching system aimed at improving user performance in business settings. This system not only receives and analyzes information about the user's goals and challenges, but also recognizes the user's emotional state through an emotion engine and provides personalized feedback.

[0128] Users can record their mood for the day or their thoughts on specific tasks using a dedicated terminal application. This information is entered as text data, voice memos, and, if necessary, as images or videos, and is all sent to the server.

[0129] The server comprehensively processes the received data. Text and audio data are analyzed using natural language processing and speech recognition technologies to extract specific information related to the user's problem. Furthermore, image and video data are analyzed using computer vision technology to evaluate the user's visual behavior and situation.

[0130] In addition, the emotion engine evaluates the user's emotions in real time from image and audio data. It analyzes facial expressions and determines the type and intensity of emotions from voice tone and word choice. This information is added to the user profile so that the generated feedback is tailored to the user's current emotional state.

[0131] For example, if a user who wants to improve their leadership skills feels anxious about giving a presentation, the system provides advice to alleviate that anxiety. Specifically, if the system determines that the user's stress level is high, the emotion engine includes relaxation techniques and specific preparation points in its advice.

[0132] Finally, the device receives the feedback from the server and presents it to the user. Based on this feedback, the user develops an action plan, puts it into practice, and records their progress on the device as they go. The server then analyzes the new data and continues to adjust the coaching accordingly. In this way, the present invention provides more effective support for self-improvement that takes the user's emotions into consideration.

[0133] The following describes the processing flow.

[0134] Step 1:

[0135] Users use a terminal application to input their goals, challenges, and emotional state. The input can include not only text but also spoken words expressing their feelings and videos of their facial expressions.

[0136] Step 2:

[0137] The terminal sends the collected data to the server. The server receives the various types of data and distributes them to the appropriate processing module according to their type.
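The routing in Step 2 can be pictured as a dispatch table keyed by data type. This is a minimal sketch, assuming each received item carries a `kind` label; the handler functions (`analyze_text`, `transcribe_audio`, `analyze_visual`) are hypothetical placeholders for the server's processing modules, not components named in this disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class InputItem:
    kind: str       # assumed modality label: "text", "audio", "image", or "video"
    payload: bytes  # raw content as received from the terminal


# Hypothetical stand-ins for the server's processing modules.
def analyze_text(item: InputItem) -> str:
    return "nlp"


def transcribe_audio(item: InputItem) -> str:
    return "speech"


def analyze_visual(item: InputItem) -> str:
    return "vision"


# Dispatch table: images and videos share the computer-vision module.
HANDLERS: Dict[str, Callable[[InputItem], str]] = {
    "text": analyze_text,
    "audio": transcribe_audio,
    "image": analyze_visual,
    "video": analyze_visual,
}


def route(items: List[InputItem]) -> Dict[str, List[str]]:
    """Send each item to the module for its type and collect results by type."""
    results: Dict[str, List[str]] = {}
    for item in items:
        handler = HANDLERS.get(item.kind)
        if handler is None:
            raise ValueError(f"unsupported data type: {item.kind}")
        results.setdefault(item.kind, []).append(handler(item))
    return results
```

Keeping the mapping in one table means a new modality only needs a new entry, and unsupported types fail loudly instead of being silently dropped.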

[0138] Step 3:

[0139] The server analyzes text data using natural language processing techniques to extract important keywords and context related to the tasks and goals the user has set.
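Keyword extraction of the kind described in Step 3 can be illustrated with plain term frequency over content words. This sketch swaps in a deliberately simple technique with a small, illustrative stop-word list; it stands in for whatever natural language processing the server actually uses and is not the disclosed method.

```python
import re
from collections import Counter
from typing import List

# Illustrative stop-word list; a real system would use a fuller list or an NLP library.
STOP_WORDS = {"i", "a", "an", "the", "to", "my", "and", "of", "in", "want",
              "is", "am", "for", "on", "about", "at"}


def extract_keywords(text: str, top_n: int = 3) -> List[str]:
    """Return the top_n most frequent non-stop-words.

    Words with equal counts keep the order in which they first appear.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]
```

For example, "I want to improve my leadership skills. Leadership in presentations matters; presentations make me anxious." yields `["leadership", "presentations", "improve"]`.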

[0140] Step 4:

[0141] The server converts the audio data into text using speech recognition technology and then analyzes the content of the speech based on the result. This makes the spoken content available as data directly relevant to the user's challenges.

[0142] Step 5:

[0143] The server analyzes image and video data using computer vision technology to extract the user's visual cues and behavioral patterns. This allows the user's actual performance to be evaluated.

[0144] Step 6:

[0145] The emotion engine identifies the user's emotional state from image and audio data. It reads emotions from facial expressions and simultaneously analyzes emotions from voice tone and rhythm, sending the results to the server.
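Combining the facial and vocal readings in Step 6 can be sketched as weighted late fusion: each analyzer produces per-emotion scores, and the engine averages them with fixed weights. The sketch assumes both analyzers return scores in [0, 1] over the same labels; the labels and the default 0.6/0.4 face/voice weighting are illustrative assumptions, not values from this disclosure.

```python
from typing import Dict, Tuple


def fuse_emotions(face: Dict[str, float], voice: Dict[str, float],
                  face_weight: float = 0.6) -> Tuple[str, float, Dict[str, float]]:
    """Weighted late fusion of facial and vocal emotion scores.

    Returns the dominant emotion, its fused score, and all fused scores.
    A label missing from one modality counts as 0.0 for that modality.
    """
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be in [0, 1]")
    labels = set(face) | set(voice)
    if not labels:
        raise ValueError("no emotion scores to fuse")
    voice_weight = 1.0 - face_weight
    fused = {
        label: face_weight * face.get(label, 0.0) + voice_weight * voice.get(label, 0.0)
        for label in labels
    }
    dominant = max(fused, key=fused.get)
    return dominant, fused[dominant], fused
```

With face scores `{"anxiety": 0.7, "calm": 0.2}` and voice scores `{"anxiety": 0.5, "calm": 0.4}`, the fused anxiety score is 0.62, so anxiety is reported as dominant.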

[0146] Step 7:

[0147] The server integrates all the analytical data and updates the user's profile to reflect their personality and emotions. This result is incorporated into the feedback, making it more personalized.
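One way to realize the profile update in Step 7 is to blend each new emotion reading into a running estimate and accumulate the topics the user raises. The sketch below uses an exponential moving average; the `UserProfile` fields and the smoothing factor `alpha` are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UserProfile:
    # Hypothetical fields; the disclosure only says the profile reflects personality and emotions.
    emotion_baseline: Dict[str, float] = field(default_factory=dict)
    topics: List[str] = field(default_factory=list)


def update_profile(profile: UserProfile, emotions: Dict[str, float],
                   keywords: List[str], alpha: float = 0.3) -> UserProfile:
    """Update the profile in place and return it.

    Emotion scores are blended into the baseline with an exponential moving
    average; previously unseen keywords are appended to the topic list.
    """
    for label, score in emotions.items():
        old = profile.emotion_baseline.get(label)
        if old is None:
            # The first observation seeds the baseline.
            profile.emotion_baseline[label] = score
        else:
            profile.emotion_baseline[label] = (1 - alpha) * old + alpha * score
    for kw in keywords:
        if kw not in profile.topics:
            profile.topics.append(kw)
    return profile
```

A moving average keeps one anxious session from overwriting the user's longer-term state, while still letting the profile drift as new data arrives.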

[0148] Step 8:

[0149] The server sends the generated feedback and advice to the device. The device receives this feedback and displays it to the user.

[0150] Step 9:

[0151] Users act on the feedback to improve their actions and situations. Newly generated data (such as new audio recordings or videos) is sent again from the device to the server, and the cycle continues.

[0152] Step 10:

[0153] The server analyzes new data, continuously evaluates progress, and updates feedback as needed. This process provides users with continuous support and opportunities for improvement.
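Steps 9 and 10 form a monitoring loop. As a sketch of what "continuously evaluates progress" could mean, the server might compare the most recent readings of a tracked metric against the earliest ones. The metric (for example, a stress score where lower is better) and the window size are hypothetical choices.

```python
from statistics import mean
from typing import Sequence


def evaluate_progress(history: Sequence[float], window: int = 3,
                      lower_is_better: bool = True) -> str:
    """Compare the mean of the last `window` readings with the first `window`.

    Returns "improving", "worsening", "unchanged", or "insufficient data".
    """
    if len(history) < 2 * window:
        return "insufficient data"
    early = mean(history[:window])
    recent = mean(history[-window:])
    delta = recent - early
    if lower_is_better:
        # A falling score counts as improvement for metrics like stress.
        delta = -delta
    if delta > 0:
        return "improving"
    if delta < 0:
        return "worsening"
    return "unchanged"
```

Each new batch of data from the terminal would append to `history`, and the returned label could steer whether the next feedback reinforces or revises the current plan.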

[0154] (Example 2)

[0155] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0156] In business settings, there is a growing need to provide personalized feedback tailored to the user's emotions and circumstances in order to improve user performance. However, conventional systems have struggled to adequately analyze user input and accurately assess emotional states, making it difficult to effectively generate personalized feedback.

[0157] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0158] In this invention, the server is an information processing device for analyzing information received from a user, and includes means for analyzing the user's input using natural language processing technology and speech recognition technology; emotion estimation means for evaluating the user's emotional state based on the analyzed information; and a generation device for generating personalized feedback and advice based on the emotional state evaluated by the emotion estimation means and an updated profile. This makes it possible to provide highly personalized feedback that is tailored to the user's emotions and situation.

[0159] An "information processing device" is a device that has the function of analyzing various data received from a user and processing it based on that analysis.

[0160] "Natural language processing technology" refers to the technology that enables computers to understand, interpret, and generate language that humans use naturally.

[0161] "Speech recognition technology" is a technology that analyzes audio data and converts it into text data.

[0162] An "emotion estimation means" is a means of identifying a user's emotional state based on user input information and evaluating its intensity and type.

[0163] A "generation device" is a device that uses existing data and algorithms to generate information or advice tailored to a specific purpose.

[0164] "Feedback" refers to information provided for the purpose of improvement and advice, based on analysis results regarding user behavior and circumstances.

[0165] A "profile" is a dataset created based on a user's characteristics and past behavior, and it provides the foundational information necessary to enable personalized responses.

[0166] This invention is an advanced coaching system designed to improve users' business performance. This system comprehensively analyzes user input data and provides personalized feedback that takes into account their emotional state. Specifically, the user first inputs their daily mood and challenges on a device on which a dedicated application is installed. This input takes the form of text, audio, image, and video data, which are then transmitted from the device to the server.

[0167] The server, as an information processing device, performs various data analyses. Text and audio data are analyzed using natural language processing and speech recognition technologies; "Google Cloud Natural Language API" and "Amazon Transcribe" can be used for this purpose. For image and video data, computer vision tools such as "OpenCV" and "Amazon Rekognition" are used to analyze the user's visual information and understand the situation.

[0168] Based on the analysis results, the server uses a generative AI model to create prompts and evaluates the user's emotional state using emotion estimation tools. For example, if a user is feeling anxious about a presentation, the server can generate a specific question as a prompt, such as "How can I overcome my anxiety about the presentation?" This prompt is then input into the generative AI model, which provides optimized feedback.
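The prompt in this paragraph combines the user's goal with the estimated emotion. A minimal template-based sketch follows; the function name, the wording, and the 0.7 intensity threshold are illustrative assumptions, and the resulting string would be passed to whichever generative AI model the server uses.

```python
def build_prompt(goal: str, emotion: str, intensity: float) -> str:
    """Compose a coaching prompt from the user's goal and estimated emotion.

    `intensity` is assumed to be a score in [0, 1]; 0.7 is an arbitrary
    cutoff above which the model is asked to address the emotion first.
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be in [0, 1]")
    lines = [
        "You are a business coach.",
        f"The user's goal: {goal}.",
        f"The user currently feels {emotion} (intensity {intensity:.1f}).",
    ]
    if intensity >= 0.7:
        lines.append(f"First suggest a brief technique to ease the {emotion}.")
    lines.append("Then give three concrete, actionable steps toward the goal.")
    return "\n".join(lines)
```

A template keeps the emotion-dependent part of the prompt explicit and testable, so changes in advice tone can be traced to a specific rule rather than to free-form prompt text.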

[0169] Feedback is sent to the device and displayed to the user. Users utilize this feedback to develop and execute action plans, thereby accelerating goal achievement. The advantage of this system lies in its ability to provide real-time, appropriate feedback tailored to each user's individual circumstances. This allows users to efficiently pursue self-improvement.

[0170] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0171] Step 1:

[0172] Users input emotional data and information about their challenges using a dedicated terminal application.

[0173] The input information can take the form of text, voice memos, images, videos, and other formats. This data is temporarily stored in the device's storage and then prepared for transmission to the server.

[0174] Step 2:

[0175] The device sends information obtained from the user to the server.

[0176] In this process, the device encrypts the data and transfers it via Wi-Fi or a mobile network to the server's cloud storage used for receiving data. Once the transmission is complete, the device notifies the user that the transmission succeeded.

[0177] Step 3:

[0178] The server processes the received data in order to analyze it.

[0179] Audio data is converted to text using speech recognition technology, and natural language processing techniques are then used to analyze the text and transcribed audio, extracting keywords and emotions. Computer vision technology analyzes image and video data to evaluate user behavior and situations. The analysis results are stored in a database.
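Step 3 ends by storing the analysis results in a database. The sketch below uses Python's built-in `sqlite3` module; the table name and columns are hypothetical, and an in-memory database stands in for the server's actual database.

```python
import json
import sqlite3
from typing import Any, Dict, List


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Hypothetical schema: one row per analyzed input item.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS analysis_results ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " user_id TEXT NOT NULL,"
        " modality TEXT NOT NULL,"
        " result_json TEXT NOT NULL)"
    )
    return conn


def save_result(conn: sqlite3.Connection, user_id: str, modality: str,
                result: Dict[str, Any]) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO analysis_results (user_id, modality, result_json) VALUES (?, ?, ?)",
            (user_id, modality, json.dumps(result)),
        )


def load_results(conn: sqlite3.Connection, user_id: str) -> List[Dict[str, Any]]:
    """Return one user's results in insertion order, with the modality merged in."""
    rows = conn.execute(
        "SELECT modality, result_json FROM analysis_results WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()
    return [{"modality": m, **json.loads(r)} for m, r in rows]
```

Storing each modality's result as JSON keeps the schema stable even though text, audio, and video analyses produce differently shaped output.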

[0180] Step 4:

[0181] The server evaluates the user's emotional state based on the analysis results.

[0182] Using emotion estimation tools, the type and intensity of emotions are identified based on information extracted from user input data. This evaluation is added to the user's profile and serves as the basis for generating feedback.
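Identifying "the type and intensity of emotions" can be sketched as picking the highest-scoring emotion label and bucketing its score into a coarse band that the profile can store. The band thresholds and labels are illustrative assumptions, not values from this disclosure.

```python
from typing import Dict, Tuple


def classify_emotion(scores: Dict[str, float]) -> Tuple[str, str]:
    """Return (emotion type, intensity band) from per-label scores in [0, 1]."""
    if not scores:
        raise ValueError("no emotion scores")
    label = max(scores, key=scores.get)
    value = scores[label]
    # Hypothetical thresholds for the intensity bands.
    if value >= 0.7:
        band = "high"
    elif value >= 0.4:
        band = "medium"
    else:
        band = "low"
    return label, band
```

A coarse band is easier to act on than a raw score when choosing feedback, for example adding relaxation advice only when anxiety is "high".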

[0183] Step 5:

[0184] The server uses a generative AI model to create prompt messages and generate personalized feedback.

[0185] Based on the user's emotional state and goals, the server gives the AI model a prompt such as "Please tell me how to overcome presentation anxiety," and the model generates optimal advice. The results are stored as feedback data.

[0186] Step 6:

[0187] The server generates feedback, which is then sent to the terminal and presented to the user.

[0188] The device displays feedback received from the server via notifications or within the app, allowing the user to review it and develop an action plan. Feedback viewing activity on the device is recorded as a log.

[0189] Step 7:

[0190] The user plans and executes actions based on the feedback provided.

[0191] When the user records the results and progress of these actions on the device, the server receives new data and the analysis and feedback process is repeated. In this way, the user's skills and emotional state are continuously improved.

[0192] (Application Example 2)

[0193] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0194] In today's business environment, there is a problem in that approaches to individual users' goals and challenges are uniform, and appropriate support is not provided according to individual circumstances and emotional states. Furthermore, there is a problem in that effective self-improvement cannot be achieved because there are insufficient mechanisms to understand users' emotional states and customize feedback based on them.

[0195] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0196] In this invention, the server includes analysis means for analyzing information about goals and tasks received from the user, profile update means for updating the user's characteristics and preference profile based on the analyzed information, and emotion evaluation means for evaluating the user's emotional state in real time from voice and visual data and generating emotion-appropriate feedback. This makes it possible to provide personalized feedback that is tailored to the user's characteristics and emotions.

[0197] "Analysis means" refers to a means of performing a process to analyze information received from a user and understand its content.

[0198] A "profile update means" is a means of revising data related to user characteristics and preferences based on analysis results and keeping it up to date.

[0199] "Generation means" refers to the means of generating personalized feedback and advice for users by utilizing updated profiles.

[0200] A "feedback provision means" is a means of transmitting the generated feedback to the user's device so that the user can review it.

[0201] "Monitoring means" refers to a means of observing user performance based on the provided feedback and re-analyzing newly obtained information.

[0202] An "emotion evaluation means" is a means of using the user's voice and visual information to determine the user's emotional state in real time and reflect it in the feedback.

[0203] This invention realizes an advanced coaching system aimed at improving user performance in business settings. This system consists of a server, a user terminal, and an environment for receiving data input from the user.

[0204] Server Functions

[0205] The server uses natural language processing and speech recognition technologies as analytical tools to analyze information about goals and challenges received from users. Specifically, it uses Google Cloud Speech-to-Text to convert speech data into text and analyzes it with Google Cloud Natural Language. This provides data for updating the user's characteristics and preference profile. The updated information is managed by a profile update tool, and personalized feedback and advice are generated using a generation tool.

[0206] The server also includes emotion assessment tools to evaluate the user's emotional state. Using the Affectiva SDK and computer vision technologies with OpenCV and TensorFlow, it analyzes emotions in real time from audio and visual data and incorporates them into the feedback.

Terminal Role

[0208] The device functions as a means of providing feedback, receiving feedback from the server and presenting it to the user. The user can review the feedback received through the device and incorporate it into their daily actions.

User Interaction

[0210] Users provide the system with their mood and goals using a tablet or voice input device. This allows them to receive coaching optimized for their current state. For example, a user who needs to relax after a busy day at work might receive appropriate advice such as, "Take a deep breath and relax. Also, make a quick plan for tomorrow and get some rest early."

[0211] Utilization of Generative AI Models

[0212] The system can use a generative AI model to dynamically generate feedback that matches the user's emotions and characteristics. A concrete example of a prompt would be, "Please suggest ways to relax when the user is tired."

[0213] In this way, the system provides optimal coaching tailored to the user's characteristics and emotions, supporting the improvement of the user's work performance.

[0214] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0215] Step 1:

[0216] Users input voice, text, and image information using their devices. Users use voice input or tablets to record goals, challenges, and current emotional states, and this data is sent from the device to the server.

[0217] Step 2:

[0218] The server converts the received audio data into text using the Google Cloud Speech-to-Text service. Converting the audio data into text makes it easier to handle in the subsequent natural language processing.

[0219] Step 3:

[0220] The server analyzes the text data using Google Cloud Natural Language to extract specific information related to the user's goals and challenges. The output here becomes the data needed to update the user's profile.

[0221] Step 4:

[0222] The server uses OpenCV and TensorFlow to analyze image and video data. Image recognition detects the user's facial expressions and situation, and the Affectiva SDK evaluates the emotion.

[0223] Step 5:

[0224] Based on the evaluated emotion data, the server uses a generative AI model to generate user-optimized feedback. Information extracted during the profile update is combined with the emotion evaluation to help create the explanatory text.

[0225] Step 6:

[0226] The generated feedback is sent to the device through the feedback provision mechanism. The user receives the feedback on the device and reflects it in their actions.

[0227] Step 7:

[0228] The user acts based on the feedback received on their device, and their progress and any new inputs are recorded as they occur. This allows the system to obtain new data for the next cycle.

[0229] Step 8:

[0230] Acting as the monitoring means, the server analyzes the newly acquired data and prepares to make the next coaching session with the user more effective.

[0231] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0232] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions are input to the data generation model 58 together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
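The input described for the data generation model 58 (a prompt carrying instructions, plus inference data in one or more formats, with a chosen output format) can be modeled as a simple request object. This is a hypothetical data structure for illustration only; it does not reflect the request format of ChatGPT, Gemini, or any other specific service.

```python
from dataclasses import dataclass, field
from typing import List, Literal

Modality = Literal["audio", "text", "image"]


@dataclass
class InferenceData:
    modality: Modality  # kind of inference data
    content: bytes      # raw audio, UTF-8 text, or encoded image


@dataclass
class GenerationRequest:
    prompt: str  # instructions, e.g. to analyze, classify, predict, or summarize
    inputs: List[InferenceData] = field(default_factory=list)
    output_format: Literal["text", "audio"] = "text"

    def __post_init__(self) -> None:
        # Every request must carry instructions for the model.
        if not self.prompt.strip():
            raise ValueError("a prompt with instructions is required")

    def modalities(self) -> List[str]:
        """Distinct input modalities, in first-seen order."""
        seen: List[str] = []
        for item in self.inputs:
            if item.modality not in seen:
                seen.append(item.modality)
        return seen
```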

[0233] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0234] [Second Embodiment]

[0235] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0236] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0237] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0238] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0239] The microphone 238 receives the voice of the user 20 and thereby accepts instructions from the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0240] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0241] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0242] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0243] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0244] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0245] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0246] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0247] This invention provides a system that offers a personalized coaching application using multimodal data to improve users' business performance. This system deeply understands the user's goals and current challenges and provides optimal feedback in real time based on that understanding.

[0248] Users input their goals and current challenges through a dedicated terminal application. The terminal accepts not only text but also multimodal data such as voice memos, images, and video data. This information is transmitted to a server via the internet.

[0249] On the server, an AI-powered analysis platform processes the data. First, text data is analyzed using natural language processing techniques to understand the user's goals and challenges and extract the core information. Audio data is transcribed using speech recognition technology, and the transcript is further analyzed. For image and video data, computer vision technology is used to evaluate the user's behavior patterns and situations.

[0250] Based on the analysis results, the server generates feedback in a communication style best suited to the user's personality and preferences. This feedback includes specific advice and suggestions for improvement to help the user achieve their goals. The server then sends this feedback to the terminal for the user to review.

[0251] For example, if a user enters "I want to improve my leadership skills," the AI analyzes past presentation videos of the user and points out areas for improvement in their speaking style and body language. The server also periodically monitors the user's progress and updates the coaching content whenever new data is submitted.

[0252] This system allows users to receive personalized feedback 24/7, 365 days a year, enabling efficient self-improvement. This approach provides individualized advice at a reasonable cost, strongly supporting users' career advancement and the improvement of their business skills.

[0253] The following describes the processing flow.

[0254] Step 1:

[0255] Users input their goals and challenges through a dedicated terminal application. This can include not only text data but also multimodal data such as voice memos, images, and videos. The terminal collects this data and prepares it for transmission.

[0256] Step 2:

[0257] The terminal sends the collected data to the server. The server first integrates the received data and prepares it so that each data type can be properly parsed.

[0258] Step 3:

[0259] The server analyzes text data using natural language processing (NLP) techniques to extract keywords and context related to the goals and challenges set by the user. This process aims to accurately understand the user's intent.

[0260] Step 4:

[0261] The server converts the audio data into text using speech recognition technology and then performs analysis based on the results. It identifies how the audio content relates to the user's goals and objectives and extracts the necessary information.

[0262] Step 5:

[0263] For visual data (images and videos), the server uses computer vision technology to analyze the user's visual behavior and facial expressions. This data is used to understand the user's performance and areas for improvement.

[0264] Step 6:

[0265] The server integrates the analysis results and updates the user's profile based on their personality and preferences. This makes the feedback generated by the system more personalized and optimized for the user.

[0266] Step 7:

[0267] The feedback generated by the server includes specific improvement suggestions for the user. For example, it may include specific steps to enhance eye contact during presentations or improve leadership skills.

[0268] Step 8:

[0269] The device receives feedback from the server and displays it through the user interface. The user then uses this feedback to work on self-improvement.

[0270] Step 9:

[0271] Whenever new user activity data is generated, the device sends it to the server. The server analyzes this new data and uses it to generate new progress evaluations and feedback. This ensures that users always receive appropriate and continuous coaching.

[0272] (Example 1)

[0273] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0274] In today's business environment, individuals need personalized feedback and guidance to continuously improve their performance. However, traditional methods have struggled to provide specific advice tailored to individual characteristics and preferences in real time. Furthermore, there has been a lack of systems that can comprehensively analyze various forms of data (voice, images, text, etc.) to effectively evaluate user behavior and progress.

[0275] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0276] In this invention, the server includes information analysis means, data adjustment means, generation means, information transmission means, monitoring means, and data analysis means. This enables real-time personalized feedback based on user characteristics, and by integrating and analyzing diverse data formats, user behavior evaluation and progress monitoring can be performed efficiently.

[0277] The "information analysis means" is a mechanism having a function of analyzing information on goals and issues received from a user.

[0278] The "data adjustment means" is a mechanism having a function of updating user characteristics and preference profiles based on the analyzed information.

[0279] The "generation means" is a mechanism having a function of generating individualized proposals and guidance based on the updated profiles.

[0280] The "information transmission means" is a mechanism having a function of providing the generated proposals to a terminal.

[0281] The "monitoring means" is a mechanism having a function of monitoring the progress of a user based on the provided proposals and analyzing new information.

[0282] The "data analysis means" is a mechanism having a function of evaluating a user's behavior pattern and environment using multimodal data.

[0283] The "voice conversion means" is a mechanism having a function of converting user voice data into text and analyzing the speech content using the resulting text.

[0284] The "video analysis means" is a mechanism having a function of analyzing image information and video information received from a user and using the analysis results for evaluating the user's activities.
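The means listed above can be read as stages of one coaching cycle. The sketch below wires hypothetical stand-ins for them into a single function so the data flow is visible; every callable is a placeholder assumption, and a real server would invoke speech recognition, computer vision, and a generative AI model at those points.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Profile:
    traits: Dict[str, str] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)


def coaching_cycle(
    inputs: Dict[str, str],
    profile: Profile,
    transcribe: Callable[[str], str],         # stand-in for the voice conversion means
    analyze_video: Callable[[str], str],      # stand-in for the video analysis means
    generate: Callable[[Profile, str], str],  # stand-in for the generation means
    send: Callable[[str], None],              # stand-in for the information transmission means
) -> str:
    # Information analysis: gather text, transcribed speech, and video findings.
    parts: List[str] = []
    if "text" in inputs:
        parts.append(inputs["text"])
    if "voice" in inputs:
        parts.append(transcribe(inputs["voice"]))
    if "video" in inputs:
        parts.append(analyze_video(inputs["video"]))
    analysis = " | ".join(parts)
    # Data adjustment: record the analysis in the profile before generating.
    profile.history.append(analysis)
    # Generation, then transmission to the terminal.
    proposal = generate(profile, analysis)
    send(proposal)
    return proposal
```

In this reading, the monitoring means amounts to calling `coaching_cycle` again whenever the terminal sends new data, with the same `profile` carrying state between cycles.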

[0285] The present invention is an individualized coaching system aimed at improving a user's performance. This system performs a series of processes: receiving and analyzing information on a user's goals and issues, and providing individualized feedback.

[0286] The user inputs their goals and challenges using a dedicated terminal application. This terminal has the function of accepting multi-modal data such as text, voice, images, videos, etc., and can flexibly respond to various forms of data input. For example, the user can upload a past presentation video if they want to improve their leadership skills.

[0287] The terminal sends the information collected from the user to the server via the Internet. This information is encrypted using a secure protocol and reaches the server.

[0288] The server passes the received data to the AI analysis platform. Text data is analyzed using natural language processing technology (Natural Language Processing, NLP) to extract the user's intentions and emotions. Voice data is converted into text through voice recognition technology, and its content is further analyzed. Image and video data are analyzed using computer vision technology to evaluate the user's behavior patterns and situations.

[0289] Based on the analysis results, the server uses a generative AI model to generate individualized proposals in the format most suitable for the user's characteristics. Each proposal includes specific advice for achieving the goals set by the user. The generated proposal is sent to the terminal, where the user can access and view it.

[0290] As an example, consider the case where the user inputs "I want to polish my leadership skills." In this case, the AI analyzes the uploaded presentation video and provides feedback pointing out areas for improvement in speaking style and gestures.

[0291] An example of a prompt sentence is "I want to polish my leadership skills. Please analyze my past presentation videos and tell me the areas for improvement."

[0292] This system provides users with appropriate feedback 24/7, enabling efficient self-improvement.

[0293] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0294] Step 1:

[0295] Users create input data via a dedicated terminal application. This includes entering goals and tasks in text, recording voice memos, and uploading images and videos. For example, a user might input the text "I want to improve my leadership skills" along with a related video. The entered data is collected on the terminal and sent to the server using a reliable transmission protocol.

[0296] Step 2:

[0297] The server passes the received data to the analysis platform and begins data processing. Text data is analyzed by a natural language processing (NLP) engine to extract the user's intentions and emotions. The output is the keywords and context related to the user's goals. Audio data is transcribed into text by a speech recognition system, and that text is further analyzed. The output is the transcribed content of the audio and the results of its analysis.

[0298] Step 3:

[0299] The server processes received image and video data using computer vision algorithms. It analyzes user behavior patterns, location information, and eye movements to assess the user's current state. For example, in the case of a presentation video, the output might include the frequency of gestures and eye movements, along with areas for improvement.
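The presentation-video outputs mentioned in Step 3 (gesture frequency, eye movement, and areas for improvement) could be summarized from per-frame annotations produced by a vision model. The sketch assumes such annotations already exist, one per sampled frame with `gesturing` and `looking_at_audience` flags; the field names, the sampling rate, and the 50% eye-contact target are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FrameAnnotation:
    gesturing: bool            # hypothetical output of a pose/gesture detector
    looking_at_audience: bool  # hypothetical output of a gaze estimator


def presentation_metrics(frames: List[FrameAnnotation], fps: float) -> Dict[str, float]:
    """Gestures per minute (counted as gesture onsets) and eye-contact ratio."""
    if not frames or fps <= 0:
        raise ValueError("need at least one frame and a positive fps")
    minutes = len(frames) / fps / 60.0
    # A gesture starts when `gesturing` turns on after being off.
    starts = sum(
        1 for i, f in enumerate(frames)
        if f.gesturing and (i == 0 or not frames[i - 1].gesturing)
    )
    eye_ratio = sum(f.looking_at_audience for f in frames) / len(frames)
    return {"gestures_per_minute": starts / minutes, "eye_contact_ratio": eye_ratio}


def improvement_points(metrics: Dict[str, float], eye_target: float = 0.5) -> List[str]:
    """Turn metrics into advice; only an eye-contact rule is sketched here."""
    points: List[str] = []
    if metrics["eye_contact_ratio"] < eye_target:
        points.append("Increase eye contact with the audience.")
    return points
```

Counting onsets rather than gesturing frames keeps one long gesture from looking like many short ones.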

[0300] Step 4:

[0301] The server uses a generative AI model to generate individualized exercise proposals based on the data analysis. In this generation process, the user's profile, the analysis results, and past data trends are taken into account to prepare the feedback best suited to the user. The output is specific advice and improvement measures for the user, compiled into a file.

[0302] Step 5:

[0303] The server sends the generated feedback to the terminal and presents it in a form that is easy for the user to view. The terminal displays the received feedback and notifies the user. This allows the user to view the feedback and apply it to their daily activities.

[0304] Step 6:

[0305] The user works on self-improvement based on the provided feedback. The terminal periodically sends new data to the server so that progress can be monitored continuously. The server re-analyzes this new data and updates the feedback, ensuring that the advice provided remains up to date and appropriate.
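The monitoring loop in Step 6 can be sketched as follows, assuming each re-analysis yields a single 0-to-1 skill score. Exponential smoothing is a swapped-in technique chosen for illustration, and `update_progress`, `progress_stalled`, `alpha`, and `min_gain` are hypothetical names and values; the description does not specify how progress is quantified.

```python
def update_progress(history, new_score, alpha=0.3):
    """Append an exponentially smoothed progress score and return it.

    Assumption: new_score is a 0-1 skill rating produced by re-analyzing new data.
    """
    smoothed = new_score if not history else alpha * new_score + (1 - alpha) * history[-1]
    history.append(smoothed)
    return smoothed


def progress_stalled(history, min_gain=0.05):
    """True when the smoothed score rose by less than min_gain since the last check-in."""
    return len(history) >= 2 and history[-1] - history[-2] < min_gain


history = []
for score in (0.5, 0.8, 0.6):  # three periodic re-analyses of new data
    update_progress(history, score)
    print(round(history[-1], 3), progress_stalled(history))
```

A stalled result could serve as a trigger for regenerating the feedback; smoothing keeps a single unusually good or bad session from dominating that decision.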

[0306] (Application Example 1)

[0307] Next, Application Example 1 will be described. In the following description, the data processing device 12 is referred to as the "server", and the smart glasses 214 are referred to as the "terminal".

[0308] In the modern business environment, people need to receive individualized feedback and advice in a timely and efficient manner. However, with current technologies it is difficult to provide real-time feedback based on the user's personality and preferences, so effective self-improvement cannot be achieved. It is also difficult to obtain appropriate advice for skill improvement in daily life.

[0309] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0310] In this invention, the server includes analysis means for analyzing information on goals and tasks received from the user; profile update means for updating the user's personality and preference profile based on the analyzed information; generation means for generating personalized feedback and advice based on the updated profile; and activity observation means for observing the user's daily activities through a consumer robot and providing advice in real time. This enables the user to effectively improve their business skills even in their daily life.

[0311] "Analysis means" refers to methods and devices for processing information received from users and understanding goals and challenges.

[0312] A "profile update means" is a method or device for modifying and updating data related to a user's personality and preferences based on analyzed information.

[0313] "Generation means" refers to methods and devices for generating personalized feedback and advice based on the updated profile.

[0314] A "progress tracking method" refers to a technique or device for monitoring user progress based on generated feedback and analyzing the associated data.

[0315] "Activity observation means" refers to methods and devices that use consumer robots to record a user's daily actions and provide real-time advice based on the observation results.

[0316] The system for realizing this invention uses consumer robots that users use on a daily basis and provides feedback on the goals and tasks that the user proactively presents. The robot records the user's activities using sensors such as cameras and microphones and transmits the collected data to a cloud server.

[0317] On the server, the analysis tools first convert audio data into text using speech recognition software (e.g., "Google Speech-to-Text API"). The text data and the image/video data are then analyzed using natural language processing software (e.g., "spaCy" or "Transformers") and image analysis software (e.g., "OpenCV" or "TensorFlow"), respectively. Based on this analysis, a profile update tool updates the user's personality and preference profile.

[0318] The generation means uses a generative AI model to generate personalized feedback and advice tailored to the user's profile. After the feedback is provided, the progress tracking means monitors the user's performance and growth, enabling further analysis as needed.

[0319] For example, if a user is cooking, the robot observes their movements and provides real-time advice to improve their cooking skills. An example of a prompt might be, "Analyze the user's body language and speech during cooking to generate feedback for improving their cooking skills." This allows users to effectively improve their skills from the comfort of their homes, and the system is particularly useful for improving business skills.

[0320] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0321] Step 1:

[0322] The terminal (robot) records the user's activities as audio and image data. This is done using microphones and cameras mounted on the robot. The input is the user's real-time audio and video, and the output after processing is recorded audio data files and video data files.

[0323] Step 2:

[0324] The terminal transmits the recorded audio and image data files to the server via the internet. The input is an audio/video data file, and the output is the corresponding data file stored on the server.
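The description does not specify the transfer protocol. As one hedged illustration of transferring recorded files reliably, the sketch below splits a file into chunks with SHA-256 checksums and verifies them on the server side; the chunking scheme and the function names `make_chunks` and `reassemble` are assumptions.

```python
import hashlib

CHUNK_SIZE = 8  # tiny for demonstration; real uploads would use much larger chunks


def make_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split a payload into (index, bytes, sha256 hex digest) tuples (assumed scheme)."""
    chunks = []
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        piece = data[offset:offset + chunk_size]
        chunks.append((index, piece, hashlib.sha256(piece).hexdigest()))
    return chunks


def reassemble(chunks, expected_digest: str) -> bytes:
    """Verify every chunk, restore the original order, and check the whole-file digest."""
    parts = []
    for index, piece, digest in sorted(chunks, key=lambda c: c[0]):
        if hashlib.sha256(piece).hexdigest() != digest:
            raise ValueError(f"chunk {index} was corrupted in transit")
        parts.append(piece)
    data = b"".join(parts)
    if hashlib.sha256(data).hexdigest() != expected_digest:
        raise ValueError("file digest mismatch: a chunk is missing or duplicated")
    return data


video = b"recorded audio/video bytes"
digest = hashlib.sha256(video).hexdigest()
received = list(reversed(make_chunks(video)))  # chunks may arrive out of order
print(reassemble(received, digest) == video)
```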

[0325] Step 3:

[0326] The server converts the audio data into text for natural language processing, using speech recognition technology such as the "Google Speech-to-Text API." The input is an audio data file, and the output is text data.

[0327] Step 4:

[0328] The server analyzes the text data to understand the user's goals and challenges, using natural language processing software such as "spaCy" and "Transformers." The input is text data, and the output is data describing the analyzed goals and challenges.

[0329] Step 5:

[0330] The server analyzes the video data to extract the user's actions and behavioral patterns, using image recognition technology based on "OpenCV" and "TensorFlow." The input is a video data file, and the output is behavioral pattern data.

[0331] Step 6:

[0332] The server updates the user's profile based on the analyzed text data and behavioral pattern data. The input is this analysis data, and the output is the updated profile data.
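One way to picture the profile update in Step 6 is the sketch below. The `UserProfile` schema, the blending rate, and the rule that a new trait starts at its first observation are all hypothetical; the description states only that the profile is adjusted from the analyzed text and behavioral pattern data.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    # Hypothetical schema: trait scores in [0, 1], stated goals, observed patterns.
    traits: dict = field(default_factory=dict)
    goals: set = field(default_factory=set)
    behavior_patterns: list = field(default_factory=list)


def update_profile(profile, goals, trait_observations, patterns, rate=0.2):
    """Merge newly analyzed goals/patterns and nudge trait scores toward new observations.

    rate controls how strongly one session shifts a stored trait (an assumed value).
    """
    profile.goals.update(goals)
    for trait, observed in trait_observations.items():
        previous = profile.traits.get(trait, observed)  # unseen traits start at the observation
        profile.traits[trait] = (1 - rate) * previous + rate * observed
    for pattern in patterns:
        if pattern not in profile.behavior_patterns:
            profile.behavior_patterns.append(pattern)
    return profile
```

Blending instead of overwriting lets the profile change gradually, so one atypical session does not redefine the user's personality profile.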

[0333] Step 7:

[0334] The server uses a generative AI model to generate personalized feedback based on the updated profile. The input is the updated profile data, and the output is the text of the generated feedback.

[0335] Step 8:

[0336] The server sends the generated feedback data back to the terminal. This allows the user to receive feedback through the terminal. The input is the generated feedback data, and the output is the feedback displayed or audibly presented on the terminal.

[0337] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0338] This invention is an advanced coaching system aimed at improving user performance in business settings. This system not only receives and analyzes information about the user's goals and challenges, but also recognizes the user's emotional state through an emotion engine and provides personalized feedback.

[0339] Users can record their mood for the day or their thoughts on specific tasks using a dedicated terminal application. This information is entered as text data, voice memos, and, if necessary, as images or videos, and is all sent to the server.

[0340] The server comprehensively processes the received data. Text and audio data are analyzed using natural language processing and speech recognition technologies to extract specific information related to the user's problem. Furthermore, image and video data are analyzed using computer vision technology to evaluate the user's visual behavior and situation.

[0341] In addition, the emotion engine evaluates the user's emotions in real time from image and audio data. It analyzes facial expressions and determines the type and intensity of emotions from voice tone and word choice. This information is added to the user profile so that the generated feedback is tailored to the user's current emotional state.
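The description says the emotion engine combines facial expression analysis with voice tone and word choice, but not how the two are merged. A minimal sketch assuming weighted late fusion of two per-emotion probability maps (hypothetical outputs of separate face and voice models) is shown below; the dominant emotion gives the type, and its fused probability serves as the intensity.

```python
def fuse_emotions(face_probs, voice_probs, face_weight=0.5):
    """Weighted late fusion of per-emotion probabilities from face and voice models.

    Assumption: both inputs map emotion labels to probabilities in [0, 1].
    Returns (dominant_emotion, intensity, fused_distribution).
    """
    labels = sorted(set(face_probs) | set(voice_probs))  # sorted for deterministic ties
    fused = {
        label: face_weight * face_probs.get(label, 0.0)
        + (1 - face_weight) * voice_probs.get(label, 0.0)
        for label in labels
    }
    dominant = max(fused, key=fused.get)
    return dominant, fused[dominant], fused


face = {"anxious": 0.6, "calm": 0.4}
voice = {"anxious": 0.8, "calm": 0.2}
print(fuse_emotions(face, voice)[:2])
```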

[0342] For example, if a user who wants to improve their leadership skills feels anxious about giving a presentation, the system provides advice to alleviate that anxiety. Specifically, if the system determines that the user's stress level is high, the emotion engine includes relaxation techniques and specific preparation points in its advice.
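The stress-dependent behavior in this anxiety example can be sketched as a simple rule, assuming the emotion type and a 0-to-1 intensity are already available. The tip texts, `STRESS_EMOTIONS`, and the 0.6 threshold are hypothetical placeholders; in the described system the advice itself would be produced by the generative AI model.

```python
# Hypothetical advice snippets; in the described system these would come from the generative AI model.
RELAXATION_TIPS = ["Take three slow, deep breaths before you start."]
PREPARATION_POINTS = ["Rehearse your opening two minutes out loud."]
STRESS_EMOTIONS = {"anxious", "stressed"}


def tailor_advice(base_advice, emotion, intensity, stress_threshold=0.6):
    """Prepend relaxation tips and append preparation points when stress is high.

    The threshold is an assumed value on a 0-1 intensity scale.
    """
    if emotion in STRESS_EMOTIONS and intensity >= stress_threshold:
        return RELAXATION_TIPS + list(base_advice) + PREPARATION_POINTS
    return list(base_advice)


print(tailor_advice(["Open with a clear goal for the team."], "anxious", 0.7))
```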

[0343] Finally, the device receives the feedback from the server and presents it to the user. Based on this information, the user develops an action plan, puts it into action, and records their progress on the device. The server analyzes this new data and continues to adjust the coaching accordingly. In this way, the present invention provides more effective support for self-improvement that takes the user's emotions into consideration.

[0344] The following describes the processing flow.

[0345] Step 1:

[0346] Users use a terminal application to input their goals, challenges, and emotional status. This can include not only text but also spoken words expressing their feelings and videos of their facial expressions.

[0347] Step 2:

[0348] The terminal sends the collected data to the server. The server receives the various types of data and distributes them to the appropriate processing module according to their type.
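Distributing received data to processing modules by type can be sketched as a dispatch table. The handler functions below are placeholders standing in for the NLP, speech recognition, and computer vision modules of Steps 3 to 5; the type labels and return values are assumptions.

```python
def analyze_text(payload):
    # Placeholder for the natural language processing module (Step 3).
    return {"module": "nlp", "size": len(payload)}


def analyze_audio(payload):
    # Placeholder for speech recognition followed by content analysis (Step 4).
    return {"module": "speech", "size": len(payload)}


def analyze_visual(payload):
    # Placeholder for the computer vision module for images and video (Step 5).
    return {"module": "vision", "size": len(payload)}


# Hypothetical type labels mapped to their processing modules.
HANDLERS = {"text": analyze_text, "audio": analyze_audio, "image": analyze_visual, "video": analyze_visual}


def route(items):
    """Send each (data_type, payload) pair to its module and group results by type."""
    results = {}
    for data_type, payload in items:
        handler = HANDLERS.get(data_type)
        if handler is None:
            raise ValueError(f"unsupported data type: {data_type}")
        results.setdefault(data_type, []).append(handler(payload))
    return results
```

Rejecting unknown types explicitly avoids silently dropping input that no module can analyze.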

[0349] Step 3:

[0350] The server analyzes text data using natural language processing techniques to extract important keywords and contexts related to the user's set tasks and goals.

[0351] Step 4:

[0352] The server converts the audio data into text using speech recognition technology, and then analyzes the content of the speech based on the results. This makes it possible to utilize data that is directly relevant to the user's problems.

[0353] Step 5:

[0354] The server analyzes image and video data using computer vision technology to extract user visual indicators and behavioral patterns. This allows for an evaluation of the user's actual performance.

[0355] Step 6:

[0356] The emotion engine identifies the user's emotional state from image and audio data. It reads emotions from facial expressions while also analyzing voice tone and rhythm, and sends the results to the server.

[0357] Step 7:

[0358] The server integrates all the analytical data and updates the user's profile to reflect their personality and emotions. This result is incorporated into the feedback, making it more personalized.

[0359] Step 8:

[0360] The server sends the generated feedback and advice to the device. The device receives this feedback and displays it to the user.

[0361] Step 9:

[0362] The user applies the feedback to improve their actions and situation. Newly generated data (such as new audio recordings or videos) is sent again from the device to the server, and the cycle continues.

[0363] Step 10:

[0364] The server analyzes new data, continuously evaluates progress, and updates feedback as needed. This process provides users with continuous support and opportunities for improvement.

[0365] (Example 2)

[0366] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0367] In business settings, there is a growing need to provide personalized feedback tailored to the user's emotions and circumstances in order to improve user performance. However, conventional systems have struggled to adequately analyze user input and accurately assess emotional states, making it difficult to effectively generate personalized feedback.

[0368] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0369] In this invention, the server is an information processing device for analyzing information received from a user, and includes means for analyzing the user's input using natural language processing technology and speech recognition technology; emotion estimation means for evaluating the user's emotional state based on the analyzed information; and a generation device for generating personalized feedback and advice based on the emotional state evaluated by the emotion estimation means and an updated profile. This makes it possible to provide highly personalized feedback that is tailored to the user's emotions and situation.

[0370] An "information processing device" is a device that has the function of analyzing various data received from a user and processing it based on that analysis.

[0371] "Natural language processing technology" refers to the technology that enables computers to understand, interpret, and generate language that humans use naturally.

[0372] "Speech recognition technology" is a technology that analyzes audio data and converts it into text data.

[0373] An "emotion estimation means" is a means of identifying a user's emotional state based on user input information and evaluating its intensity and type.

[0374] A "generation device" is a device that uses existing data and algorithms to generate information or advice tailored to a specific purpose.

[0375] "Feedback" refers to information provided for the purpose of improvement and advice, based on analysis results regarding user behavior and circumstances.

[0376] A "profile" is a dataset created based on a user's characteristics and past behavior, and it provides the foundational information necessary to enable personalized responses.

[0377] This invention is an advanced coaching system designed to improve users' business performance. This system comprehensively analyzes user input data and provides personalized feedback that takes into account their emotional state. Specifically, it begins with the user using a device with a dedicated application installed to input their daily mood and challenges. This input is in the form of text, audio, images, and video data, which are then transmitted from the device to the server.

[0378] The server, as an information processing device, performs various data analyses. Text and audio data are analyzed using natural language processing and speech recognition technologies. "Google Cloud Natural Language API" and "Amazon Transcribe" are available for this purpose. Furthermore, for image and video data, computer vision software such as "OpenCV" and "Amazon Rekognition" are used to analyze the user's visual information and understand the situation.

[0379] Based on the analysis results, the server uses a generative AI model to create prompts and evaluates the user's emotional state using emotion estimation tools. For example, if a user is feeling anxious about a presentation, the server can generate a specific question as a prompt, such as "How can I overcome my anxiety about the presentation?" This prompt is then input into the generative AI model, which provides optimized feedback.

[0380] Feedback is sent to the device and displayed to the user. Users utilize this feedback to develop and execute action plans, thereby accelerating goal achievement. The advantage of this system lies in its ability to provide real-time, appropriate feedback tailored to each user's individual circumstances. This allows users to efficiently pursue self-improvement.

[0381] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0382] Step 1:

[0383] Users input emotional data and information about their challenges using a dedicated terminal application.

[0384] The input information can take the form of text, voice memos, images, videos, and other formats. This data is temporarily stored in the device's storage and then prepared for transmission to the server.

[0385] Step 2:

[0386] The device sends information obtained from the user to the server.

[0387] In this process, the device encrypts the data and transfers it via Wi-Fi or a mobile network to the server's cloud storage for incoming data. Once the transmission is complete, the device notifies the user that the transmission succeeded.

[0388] Step 3:

[0389] The server processes the received data in order to analyze it.

[0390] Natural language processing techniques are used to analyze text and audio data, extracting keywords and emotions. Audio data is converted to text using speech recognition technology. Computer vision technology analyzes image and video data to evaluate user behavior and situations. The analysis results are stored in a database.
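The description states only that the analysis results are stored in a database. A minimal sketch using Python's built-in `sqlite3` module, with a hypothetical `analysis_results` table that keeps each result as JSON per user and modality, is shown below.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the results store; the table schema is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS analysis_results (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               user_id TEXT NOT NULL,
               modality TEXT NOT NULL,
               result_json TEXT NOT NULL,
               created_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def save_result(conn, user_id, modality, result):
    """Insert one analysis result; the connection context manager commits it."""
    with conn:
        conn.execute(
            "INSERT INTO analysis_results (user_id, modality, result_json) VALUES (?, ?, ?)",
            (user_id, modality, json.dumps(result)),
        )


def results_for(conn, user_id):
    """Return (modality, result) pairs for one user in insertion order."""
    rows = conn.execute(
        "SELECT modality, result_json FROM analysis_results WHERE user_id = ? ORDER BY id",
        (user_id,),
    )
    return [(modality, json.loads(text)) for modality, text in rows]
```

Storing results as JSON keeps the schema stable even though text, audio, and visual analyses produce differently shaped outputs.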

[0391] Step 4:

[0392] The server evaluates the user's emotional state based on the analysis results.

[0393] Using emotion estimation tools, the type and intensity of emotions are identified based on information extracted from user input data. This evaluation is added to the user's profile and serves as the basis for generating feedback.

[0394] Step 5:

[0395] The server uses a generative AI model to create prompts and generate personalized feedback.

[0396] Based on the user's emotional state and goals, the server inputs a prompt such as "Please tell me how to overcome presentation anxiety" into the AI model and generates the optimal advice. The results are stored as feedback data.
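The prompt construction in Step 5 might look like the sketch below, which combines the user's goal, the estimated emotion, and optional profile traits into one prompt string. The function name, template wording, and 0-to-1 intensity scale are assumptions; the description gives only the example request about presentation anxiety.

```python
def build_coaching_prompt(goal, emotion, intensity, traits=None):
    """Assemble a prompt for the generative AI model from the goal, emotion, and profile.

    The wording and layout of the prompt are illustrative assumptions.
    """
    lines = [
        "You are a supportive business coach.",
        f"User goal: {goal}",
        f"Estimated emotional state: {emotion} (intensity {intensity:.2f} on a 0-1 scale)",
    ]
    if traits:
        summary = ", ".join(f"{name}={score:.1f}" for name, score in sorted(traits.items()))
        lines.append(f"Profile traits: {summary}")
    lines.append("Give concrete, step-by-step advice suited to this emotional state.")
    return "\n".join(lines)


print(build_coaching_prompt("Please tell me how to overcome presentation anxiety", "anxious", 0.7, {"extraversion": 0.6}))
```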

[0397] Step 6:

[0398] The server generates feedback, which is then sent to the terminal and presented to the user.

[0399] The device displays feedback received from the server via notifications or within the app, allowing the user to review it and develop an action plan. Feedback viewing activity on the device is recorded as a log.

[0400] Step 7:

[0401] The user plans and executes actions based on the provided feedback.

[0402] By recording the results and progress of actions on the device again, the server receives new data, and the analysis and feedback process is repeated. In this way, the user's skills and emotional state are continuously improved.

[0403] (Application Example 2)

[0404] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0405] In today's business environment, there is a problem in that approaches to individual users' goals and challenges are uniform, and appropriate support is not provided according to individual circumstances and emotional states. Furthermore, there is a problem in that effective self-improvement cannot be achieved because there are insufficient mechanisms to understand users' emotional states and customize feedback based on them.

[0406] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0407] In this invention, the server includes analysis means for analyzing information about goals and tasks received from the user, profile update means for updating the user's characteristics and preference profile based on the analyzed information, and emotion evaluation means for evaluating the user's emotional state in real time from voice and visual data and generating emotion-appropriate feedback. This makes it possible to provide personalized feedback that is tailored to the user's characteristics and emotions.

[0408] "Analysis means" refers to a means of performing a process to analyze information received from a user and understand its content.

[0409] A "profile update method" is a means of revising data related to user characteristics and preferences based on analysis results and keeping it up-to-date.

[0410] "Generation means" refers to the means of generating personalized feedback and advice for users by utilizing updated profiles.

[0411] A "feedback provision method" is a means of transmitting the generated feedback to the user's device so that the user can review it.

[0412] "Monitoring means" refers to methods for observing user performance based on the provided feedback and for re-analyzing newly obtained information.

[0413] An "emotion evaluation tool" is a means of using the user's voice and visual information to determine the user's emotional state in real time and reflect that in the feedback.

[0414] This invention realizes an advanced coaching system aimed at improving user performance in business settings. This system consists of a server, a user terminal, and an environment for receiving data input from the user.

[0415] Server Functions

[0416] The server uses natural language processing and speech recognition technologies as analytical tools to analyze information about goals and challenges received from users. Specifically, it uses Google Cloud Speech-to-Text to convert speech data into text and analyzes it with Google Cloud Natural Language. This provides data for updating the user's characteristics and preference profile. The updated information is managed by a profile update tool, and personalized feedback and advice are generated using a generation tool.

[0417] The server also includes emotion assessment tools to evaluate the user's emotional state. Using the Affectiva SDK and computer vision technologies with OpenCV and TensorFlow, it analyzes emotions in real time from audio and visual data and incorporates them into the feedback.

[0418] Terminal role

[0419] The device functions as a means of providing feedback, receiving feedback from the server and presenting it to the user. The user can review the feedback received through the device and incorporate it into their daily actions.

[0420] User interaction

[0421] Users provide the system with their mood and goals using a tablet or voice input device. This allows them to receive coaching optimized for their current state. For example, a user who needs to relax after a busy day at work might receive appropriate advice such as, "Take a deep breath and relax. Also, make a quick plan for tomorrow and get some rest early."

[0422] Utilization of Generative AI Models

[0423] The system can use a generative AI model to dynamically generate feedback that matches the user's emotions and characteristics. A concrete example of a prompt would be, "Please suggest ways to relax when the user is tired."

[0424] In this way, the system provides optimal coaching tailored to the user's characteristics and emotions, supporting the improvement of the user's work performance.

[0425] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0426] Step 1:

[0427] Users input voice, text, and image information using their devices. Users use voice input or tablets to record goals, challenges, and current emotional states, and this data is sent from the device to the server.

[0428] Step 2:

[0429] The server converts the received audio data into text using the Google Cloud Speech-to-Text service, which prepares the content for the subsequent natural language processing.

[0430] Step 3:

[0431] The server analyzes the text data using Google Cloud Natural Language to extract specific information related to the user's goals and challenges. The output here becomes the data needed to update the user's profile.

[0432] Step 4:

[0433] The server uses OpenCV and TensorFlow to analyze image and video data. Image recognition detects the user's facial expressions and situation, and the Affectiva SDK evaluates the emotion.

[0434] Step 5:

[0435] Based on the evaluated emotion data, the server uses a generative AI model to generate feedback optimized for the user. Information from the updated profile is combined with the emotion evaluation to produce the feedback text.

[0436] Step 6:

[0437] The generated feedback is sent to the device through the feedback provision mechanism. The user receives the feedback on the device and reflects it in their actions.

[0438] Step 7:

[0439] The user acts based on the feedback received on their device, and their progress and any new inputs are recorded as they occur. This allows the system to obtain new data for the next cycle.

[0440] Step 8:

[0441] The server analyzes the newly acquired data as a monitoring tool and prepares to make the next coaching session with the user more effective.

[0442] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio representing the user's input in response to the result of the specific processing. The control unit 46A transmits the audio data representing the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0443] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. It performs inference on the input inference data according to the instructions in the prompt and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
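The input and output of the data generation model 58 described above (a prompt with instructions, multimodal inference data, and results in a chosen format for tasks such as analysis or summarization) can be summarized as a data structure. The sketch assumes hypothetical class names and uses a stub in place of a real generative AI service such as ChatGPT or Gemini, so no external API is called.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Union


class Task(Enum):
    # The four kinds of inference named in the description.
    ANALYSIS = "analysis"
    CLASSIFICATION = "classification"
    PREDICTION = "prediction"
    SUMMARIZATION = "summarization"


@dataclass
class InferenceData:
    modality: str                # "audio", "text", or "image"
    payload: Union[bytes, str]


@dataclass
class GenerationRequest:
    prompt: str                  # instructions for the model
    task: Task
    inputs: List[InferenceData] = field(default_factory=list)
    output_format: str = "text"  # e.g. "text" or "audio"


class StubGenerationModel:
    """Hypothetical stand-in for data generation model 58 (no real model is called)."""

    def infer(self, request: GenerationRequest) -> dict:
        modalities = ", ".join(item.modality for item in request.inputs)
        content = f"[{request.task.value}] {request.prompt} (inputs: {modalities})"
        return {"format": request.output_format, "content": content}
```

Keeping the request as an explicit structure makes it easy to swap the stub for a real model client without changing the callers.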

[0444] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0445] [Third Embodiment]

[0446] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0447] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0448] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0449] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0450] The microphone 238 receives the voice of the user 20, including instructions from the user 20. The microphone 238 captures the voice of the user 20, converts it into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0451] Camera 42 is a small digital camera equipped with an optical system including a lens, an aperture, and a shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor. The camera 42 captures images of the area around the user 20 (for example, an imaging range defined by an angle of view equivalent to the field of vision of a typical person with healthy eyesight).

[0452] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0453] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0454] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0455] The storage 32 stores the data generation model 58 and the emotion identification model 59, which are used by the specific processing unit 290.

[0456] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0457] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0458] This invention provides a system that offers a personalized coaching application using multimodal data to improve users' business performance. This system deeply understands the user's goals and current challenges and provides optimal feedback in real time based on that understanding.

[0459] Users input their goals and current challenges through a dedicated terminal application. The terminal accepts not only text but also multimodal data such as voice memos, images, and video data. This information is transmitted to a server via the internet.

[0460] On the server, an AI-powered analysis platform processes the data. First, text data is analyzed using natural language processing techniques to analyze the user's goals and challenges and extract the core information. Audio data is transcribed using speech recognition technology, and its content is further analyzed. For image and video data, computer vision technology is used to evaluate the user's behavior patterns and situations.

[0461] Based on the analysis results, the server generates feedback in a communication style best suited to the user's personality and preferences. This feedback includes specific advice and suggestions for improvement to help the user achieve their goals. The server then sends this feedback to the terminal for the user to review.

[0462] For example, if a user enters "I want to improve my leadership skills," the AI analyzes past presentation videos of the user and points out areas for improvement in their speaking style and body language. The server also periodically monitors the user's progress and updates the coaching content whenever new data is submitted.

[0463] This system allows users to receive personalized feedback 24/7, enabling efficient self-improvement. This approach provides individualized advice at a reasonable cost, strongly supporting users' career advancement and the improvement of their business skills.

[0464] The following describes the processing flow.

[0465] Step 1:

[0466] Users input their goals and challenges through a dedicated terminal application. This can include not only text data but also multimodal data such as voice memos, images, and videos. The terminal collects this data and prepares it for transmission.

[0467] Step 2:

[0468] The terminal sends the collected data to the server. The server first integrates the received data and prepares it so that each data type can be properly parsed.
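For illustration only, the type-based preparation in Step 2 can be sketched as a dispatch table that routes each received item to an analyzer for its data type. The `UserInput` class, the type tags, and the analyzer functions are hypothetical placeholders; in the described system these would call the NLP, speech recognition, and computer vision components.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class UserInput:
    kind: str      # hypothetical type tag: "text", "audio", "image", or "video"
    payload: bytes


# Placeholder analyzers; real ones would run NLP, speech recognition, or vision models.
def analyze_text(item: UserInput) -> str:
    return f"text:{len(item.payload)} bytes"


def analyze_audio(item: UserInput) -> str:
    return f"audio:{len(item.payload)} bytes"


def analyze_visual(item: UserInput) -> str:
    return f"visual:{len(item.payload)} bytes"


# Map each data type to the analyzer that should parse it.
ANALYZERS: Dict[str, Callable[[UserInput], str]] = {
    "text": analyze_text,
    "audio": analyze_audio,
    "image": analyze_visual,
    "video": analyze_visual,
}


def route_inputs(items: List[UserInput]) -> Tuple[List[str], List[UserInput]]:
    """Send each received item to its analyzer; collect items of unknown type."""
    results: List[str] = []
    unsupported: List[UserInput] = []
    for item in items:
        handler = ANALYZERS.get(item.kind)
        if handler is None:
            unsupported.append(item)
        else:
            results.append(handler(item))
    return results, unsupported
```

Keeping unsupported items separate, rather than raising, lets the server process the rest of a mixed upload and report what it could not parse.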

[0469] Step 3:

[0470] The server analyzes text data using natural language processing (NLP) techniques to extract keywords and context related to the goals and challenges set by the user. This process aims to accurately understand the user's intent.
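As a much-simplified stand-in for the NLP analysis in Step 3, keyword extraction can be approximated by counting non-stop-word tokens. This frequency heuristic and the stop-word list are assumptions for illustration; the source does not specify the NLP technique used.

```python
import re
from collections import Counter
from typing import List

# A small, illustrative stop-word list (an assumption, not from the source).
STOP_WORDS = {"i", "want", "to", "my", "the", "a", "an", "and", "of", "in", "is", "for", "on", "with"}


def extract_keywords(text: str, top_n: int = 5) -> List[str]:
    """Return the most frequent non-stop-word tokens, ties kept in first-seen order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    # Counter.most_common orders equal counts by first occurrence.
    return [word for word, _ in counts.most_common(top_n)]
```

For example, `extract_keywords("I want to improve my leadership skills. Leadership matters.")` ranks "leadership" first because it appears twice.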

[0471] Step 4:

[0472] The server converts the audio data into text using speech recognition technology and then analyzes the result. It identifies how the audio content relates to the user's goals and challenges and extracts the necessary information.

[0473] Step 5:

[0474] For visual data (images and videos), the server uses computer vision technology to analyze the user's visible behavior and facial expressions. The results are used to understand the user's performance and areas for improvement.

[0475] Step 6:

[0476] The server integrates the analysis results and updates the user's personality and preference profile accordingly. This makes the feedback generated by the system more personalized and better optimized for the user.
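One possible way to realize the profile update in Step 6 is an exponential moving average over numeric trait scores, so that a single session shifts the profile gradually rather than overwriting it. The trait names, the 0 to 1 score scale, and the `alpha` value are assumptions; the source does not say how the profile is represented.

```python
from typing import Dict


def update_profile(profile: Dict[str, float], observed: Dict[str, float],
                   alpha: float = 0.3) -> Dict[str, float]:
    """Blend newly observed trait scores into the stored profile.

    Existing traits move toward the new observation by a factor of alpha
    (an exponential moving average); traits seen for the first time are
    copied in as-is. The input profile is not modified.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    updated = dict(profile)
    for trait, score in observed.items():
        if trait in updated:
            updated[trait] = (1 - alpha) * updated[trait] + alpha * score
        else:
            updated[trait] = score
    return updated
```

A smaller `alpha` makes the profile more stable against one unusual session; a larger one makes it react faster to change.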

[0477] Step 7:

[0478] The server generates feedback that includes specific improvement suggestions for the user. For example, the feedback may include concrete steps for improving eye contact during presentations or for strengthening leadership skills.

[0479] Step 8:

[0480] The device receives feedback from the server and displays it through the user interface. The user then uses this feedback to work on self-improvement.

[0481] Step 9:

[0482] Whenever new user activity data is generated, the device sends it to the server. The server analyzes this new data and uses it to generate new progress evaluations and feedback. This ensures that users always receive appropriate and continuous coaching.
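The progress evaluation in Step 9 can be sketched as a comparison of each metric in the new data against a baseline. The metric names and the "higher is better" convention are hypothetical examples, not metrics defined in the source.

```python
from typing import Dict


def evaluate_progress(baseline: Dict[str, float], latest: Dict[str, float],
                      higher_is_better: Dict[str, bool]) -> Dict[str, str]:
    """Label each metric present in both snapshots as improved, regressed, or unchanged."""
    report: Dict[str, str] = {}
    for metric in baseline.keys() & latest.keys():
        delta = latest[metric] - baseline[metric]
        if not higher_is_better.get(metric, True):
            delta = -delta  # for metrics such as filler-word rate, lower is better
        if delta > 0:
            report[metric] = "improved"
        elif delta < 0:
            report[metric] = "regressed"
        else:
            report[metric] = "unchanged"
    return report
```

The resulting labels could then be passed to the feedback generator so that new advice acknowledges gains and focuses on regressions.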

[0483] (Example 1)

[0484] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0485] In today's business environment, individuals need personalized feedback and guidance to continuously improve their performance. However, traditional methods have struggled to provide specific advice tailored to individual characteristics and preferences in real time. Furthermore, there has been a lack of systems that can comprehensively analyze various forms of data (voice, images, text, etc.) to effectively evaluate user behavior and progress.

[0486] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0487] In this invention, the server includes information analysis means, data adjustment means, generation means, information transmission means, monitoring means, and data analysis means. This enables real-time personalized feedback based on user characteristics, and by integrating and analyzing diverse data formats, user behavior evaluation and progress monitoring can be performed efficiently.

[0488] An "information analysis means" has the function of analyzing information about goals and challenges received from the user.

[0489] A "data adjustment means" has the function of updating the user's characteristics and preference profile based on the analyzed information.

[0490] A "generation means" has the function of generating personalized suggestions and guidance based on the updated profile.

[0491] An "information transmission means" has the function of providing the generated suggestions to the terminal.

[0492] A "monitoring means" has the function of monitoring the user's progress based on the provided suggestions and analyzing new information.

[0493] A "data analysis means" has the function of evaluating the user's behavior patterns and environment using multimodal data.

[0494] A "voice conversion means" converts the user's voice data into text and uses that text to analyze the content of the user's speech.

[0495] A "video analysis means" analyzes image and video information received from the user and uses the analysis results to evaluate the user's activity.

[0496] This invention is a personalized coaching system aimed at improving user performance. The system includes a series of processes that receive and analyze information regarding the user's goals and challenges, and provide personalized feedback.

[0497] Users input their goals and challenges using a dedicated terminal application. This terminal has the capability to accept multimodal data such as text, audio, images, and videos, and can flexibly handle various data input formats. For example, a user who wants to "improve their leadership skills" can upload past presentation videos.

[0498] The device sends the information collected from the user to the server via the internet. The information is encrypted using a secure protocol while in transit to the server.

[0499] The server passes the received data to the AI analysis platform. Text data is analyzed using natural language processing (NLP) to extract user intent and emotions. Audio data is converted to text through speech recognition technology, and its content is further analyzed. Image and video data is analyzed using computer vision technology to evaluate user behavior patterns and situations.

[0500] Based on the analysis results, the server uses a generative AI model to generate personalized suggestions in a format best suited to the user's characteristics. These suggestions include specific advice aimed at achieving the user's set goals. The generated suggestions are sent to the terminal, where the user can access and review them.

[0501] For example, consider a case where a user enters "I want to improve my leadership skills." In this case, the AI analyzes the uploaded presentation video and provides feedback pointing out areas for improvement in speaking style and gestures.

[0502] An example of a prompt is, "I want to improve my leadership skills. Please analyze my past presentation videos and tell me what I can improve."

[0503] This system provides users with appropriate feedback 24/7, enabling efficient self-improvement.

[0504] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0505] Step 1:

[0506] Users create input data via a dedicated terminal application. This includes entering goals and tasks in text, recording voice memos, and uploading images and videos. For example, a user might input the text "I want to improve my leadership skills" along with a related video. The entered data is collected on the terminal and sent to the server using a reliable transmission protocol.

[0507] Step 2:

[0508] The server passes the received data to the analysis platform and begins data processing. Text data is analyzed by a natural language processing (NLP) engine to extract the user's intentions and emotions; the output is the keywords and context related to the user's goals. Audio data is transcribed into text by a speech recognition system, and the transcript is further analyzed; the output is the content of the audio and the results of its analysis.

[0509] Step 3:

[0510] The server processes received image and video data using computer vision algorithms. It analyzes user behavior patterns, location information, and eye movements to assess the user's current state. For example, in the case of a presentation video, the output might include the frequency of gestures and eye movements, along with areas for improvement.
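The presentation-video output described in Step 3 (gesture frequency, eye movement, and areas for improvement) can be sketched as a summary over per-frame annotations. This assumes a vision model has already labeled each frame with hypothetical `gesturing` and `looking_at_audience` flags; the thresholds in `improvement_areas` are illustrative, not values from the source.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FrameAnnotation:
    gesturing: bool            # hypothetical per-frame flag from a vision model
    looking_at_audience: bool  # hypothetical per-frame gaze flag


def presentation_metrics(frames: List[FrameAnnotation], fps: float) -> Dict[str, float]:
    """Summarize per-frame annotations into gesture frequency and eye-contact ratio."""
    if not frames or fps <= 0:
        raise ValueError("need at least one frame and a positive frame rate")
    minutes = len(frames) / fps / 60
    flags = [f.gesturing for f in frames]
    # A gesture starts wherever the flag switches on (or is on in the first frame).
    gesture_starts = sum(1 for i, g in enumerate(flags) if g and (i == 0 or not flags[i - 1]))
    eye_contact = sum(f.looking_at_audience for f in frames) / len(frames)
    return {"gestures_per_min": gesture_starts / minutes, "eye_contact_ratio": eye_contact}


def improvement_areas(metrics: Dict[str, float], min_eye_contact: float = 0.6,
                      max_gestures_per_min: float = 10.0) -> List[str]:
    """Turn metrics into improvement hints; both thresholds are illustrative assumptions."""
    areas: List[str] = []
    if metrics["eye_contact_ratio"] < min_eye_contact:
        areas.append("increase eye contact with the audience")
    if metrics["gestures_per_min"] > max_gestures_per_min:
        areas.append("reduce distracting gestures")
    return areas
```

Counting the start of each gesture, rather than every frame in which a gesture is visible, keeps the frequency independent of the video frame rate.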

[0511] Step 4:

[0512] The server uses a generative AI model to generate personalized exercise suggestions based on data analysis. This generation process considers the user's profile, analysis results, and historical data trends to provide optimal feedback. The output consists of a file containing specific advice and improvement suggestions for the user.
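To make Step 4 concrete, the inputs it names (profile, analysis results, and history) can be assembled into a single prompt for a generative model. The template wording, the three-session history window, and the function name are hypothetical; the source does not specify the prompt format.

```python
from typing import Dict, List


def build_feedback_prompt(goal: str, profile: Dict[str, str], findings: List[str],
                          history: List[str]) -> str:
    """Assemble one instruction prompt for a generative model (hypothetical template)."""
    profile_lines = "\n".join(f"- {k}: {v}" for k, v in sorted(profile.items()))
    finding_lines = "\n".join(f"- {f}" for f in findings) or "- (none)"
    # Keep only the three most recent feedback entries to bound prompt length.
    history_lines = "\n".join(f"- {h}" for h in history[-3:]) or "- (no earlier sessions)"
    return (
        f"User goal: {goal}\n"
        f"User profile:\n{profile_lines}\n"
        f"Latest analysis findings:\n{finding_lines}\n"
        f"Recent feedback history:\n{history_lines}\n"
        "Write specific, actionable improvement suggestions in the communication "
        "style indicated by the profile."
    )
```

Truncating the history is one simple way to keep the prompt short while still letting the model see recent trends.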

[0513] Step 5:

[0514] The server sends the generated feedback to the device and presents it in a format that is easy for the user to understand. The device displays the received feedback and notifies the user. This allows the user to review the feedback and use it to improve their daily actions.

[0515] Step 6:

[0516] Users work on self-improvement based on the feedback they receive. New data is periodically sent from their devices to the server so that progress can be continuously monitored. The server re-analyzes this new data and updates the feedback. This ensures that the most up-to-date and relevant advice is always provided.

[0517] (Application Example 1)

[0518] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0519] In today's business environment, there is a demand for timely and efficient personalized feedback and advice. However, current technology makes it difficult to provide real-time feedback based on user personality and preferences, hindering effective self-improvement. Furthermore, obtaining appropriate skill-building advice in daily life is also challenging.

[0520] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0521] In this invention, the server includes analysis means for analyzing information on goals and tasks received from the user; profile update means for updating the user's personality and preference profile based on the analyzed information; generation means for generating personalized feedback and advice based on the updated profile; and activity observation means for observing the user's daily activities through a consumer robot and providing advice in real time. This enables the user to effectively improve their business skills even in their daily life.

[0522] "Analysis means" refers to methods and devices for processing information received from users and understanding goals and challenges.

[0523] A "profile update means" is a method or device for modifying and updating data related to a user's personality and preferences based on analyzed information.

[0524] "Generation means" refers to methods and devices for generating personalized feedback and advice based on updated profiles.

[0525] A "progress tracking means" refers to a technique or device for monitoring user progress based on generated feedback and analyzing the associated data.

[0526] "Activity observation means" refers to methods and devices that use consumer robots to record a user's daily actions and provide real-time advice based on the observation results.

[0527] The system for realizing this invention utilizes consumer robots that users use on a daily basis and has a mechanism to provide feedback on goals and tasks that the user proactively presents. The user's activities are recorded by the robot, using sensors such as cameras and microphones. The robot transmits the collected data to a cloud server.

[0528] On the server, speech recognition software (e.g., "Google Speech-to-Text API") is first used to convert audio data into text. Next, the text data and the image/video data are analyzed using natural language processing software (e.g., "spaCy" or "Transformers") and image analysis software (e.g., "OpenCV" or "TensorFlow"), respectively. Based on this analysis, the profile update means updates the user's personality and preference profile.

[0529] The generation means uses a generative AI model to generate personalized feedback and advice tailored to the user's profile. The progress tracking means then tracks the user's performance and growth in response to the generated feedback, enabling further analysis as needed.

[0530] For example, if a user is cooking, the robot observes their movements and provides real-time advice to improve their cooking skills. An example of a prompt might be, "Analyze the user's body language and speech during cooking to generate feedback for improving their cooking skills." This allows users to effectively improve their skills from the comfort of their homes, and the system is particularly useful for improving business skills.

[0531] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0532] Step 1:

[0533] The terminal (robot) records the user's activities as audio and image data. This is done using microphones and cameras mounted on the robot. The input is the user's real-time audio and video, and the output after processing is recorded audio data files and video data files.

[0534] Step 2:

[0535] The terminal transmits the recorded audio and image data files to the server via the internet. The input is an audio/video data file, and the output is the same data file stored on the server.

[0536] Step 3:

[0537] The server converts the audio data into text for natural language processing. This process utilizes speech recognition technologies such as the "Google Speech-to-Text API." The input is an audio data file, and the output is text data.

[0538] Step 4:

[0539] The server performs text analysis and processes the data to understand the user's goals and challenges. Natural language processing software such as "spaCy" and "Transformers" is used for this process. The input is text data, and the output is analyzed goal and challenge data.

[0540] Step 5:

[0541] The server analyzes video data to extract the user's actions and behavioral patterns. This analysis uses image recognition technology based on "OpenCV" and "TensorFlow". The input is a video data file, and the output is behavioral pattern data.

[0542] Step 6:

[0543] The server updates the user's profile based on the analyzed text and behavioral pattern data. The input is the analyzed text and behavioral pattern data, and the output is the updated profile data.

[0544] Step 7:

[0545] The server uses a generative AI model to generate personalized feedback based on the updated profile. The input is the updated profile data, and the output is the text of the generated feedback.

[0546] Step 8:

[0547] The server sends the generated feedback data back to the terminal. This allows the user to receive feedback through the terminal. The input is the generated feedback data, and the output is the feedback displayed or audibly presented on the terminal.

[0548] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0549] This invention is an advanced coaching system aimed at improving user performance in business settings. This system not only receives and analyzes information about the user's goals and challenges, but also recognizes the user's emotional state through an emotion engine and provides personalized feedback.

[0550] Users can record their mood for the day or their thoughts on specific tasks using a dedicated terminal application. This information is entered as text data, voice memos, and, if necessary, as images or videos, and is all sent to the server.

[0551] The server comprehensively processes the received data. Text and audio data are analyzed using natural language processing and speech recognition technologies to extract specific information related to the user's problem. Furthermore, image and video data are analyzed using computer vision technology to evaluate the user's visual behavior and situation.

[0552] In addition, the emotion engine evaluates the user's emotions in real time from image and audio data. It performs facial expression analysis and determines the type and intensity of emotions from voice tone and word choice. This information is added to the existing user profile so that the generated feedback is tailored to the user's current emotional state.
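One simple way to combine the facial-expression and voice results into an emotion type and intensity is weighted late fusion of per-emotion scores. The score format (a 0 to 1 value per emotion label), the equal default weighting, and the alphabetical tie-break are assumptions; the source does not describe how the emotion engine merges modalities.

```python
from typing import Dict, Tuple


def fuse_emotions(face: Dict[str, float], voice: Dict[str, float],
                  face_weight: float = 0.5) -> Tuple[str, float]:
    """Late fusion of per-emotion scores (each in [0, 1]) from face and voice analysis.

    Returns the dominant emotion type and its fused intensity.
    """
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be in [0, 1]")
    emotions = face.keys() | voice.keys()
    if not emotions:
        raise ValueError("no emotion scores supplied")
    # An emotion missing from one modality contributes 0 from that modality.
    fused = {
        e: face_weight * face.get(e, 0.0) + (1 - face_weight) * voice.get(e, 0.0)
        for e in emotions
    }
    # Iterate in sorted order so that ties resolve deterministically.
    dominant = max(sorted(fused), key=fused.__getitem__)
    return dominant, fused[dominant]
```

The returned pair could be stored in the user profile alongside the other traits, as the paragraph above describes.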

[0553] For example, if a user who wants to improve their leadership skills feels anxious about giving a presentation, the system will provide advice to alleviate that anxiety. Specifically, if the system determines that the user's stress level is high, the emotion engine will include relaxation techniques and specific preparation points in its advice.
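The stress-dependent behavior in the example above can be sketched as a threshold rule that prepends calming and preparation items to the core advice. The tip texts and the 0.7 threshold are hypothetical placeholders, not values given in the source.

```python
from typing import List

RELAXATION_TIPS = ["Try slow breathing for one minute before you start."]  # hypothetical content
PREPARATION_POINTS = ["Rehearse the opening two minutes out loud."]         # hypothetical content


def compose_advice(core_advice: List[str], stress_level: float,
                   threshold: float = 0.7) -> List[str]:
    """Prepend relaxation techniques and preparation points when stress is high."""
    if stress_level >= threshold:
        return RELAXATION_TIPS + PREPARATION_POINTS + core_advice
    return list(core_advice)
```

Placing the calming items first reflects the example's intent of addressing anxiety before the skill-focused advice.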

[0554] Finally, the device receives the feedback from the server and presents it to the user. Based on this feedback, the user develops an action plan, puts it into practice, and records their progress on the device. This allows the server to analyze the new data and continue adjusting the coaching accordingly. In this way, the present invention provides more effective support for self-improvement that takes the user's emotions into consideration.

[0555] The following describes the processing flow.

[0556] Step 1:

[0557] Users use a terminal application to input their goals, challenges, and emotional status. This can include not only text but also spoken words expressing their feelings and videos of their facial expressions.

[0558] Step 2:

[0559] The terminal sends the collected data to the server. The server receives the various types of data and distributes them to the appropriate processing module according to their type.

[0560] Step 3:

[0561] The server analyzes text data using natural language processing techniques to extract important keywords and contexts related to the user's set tasks and goals.

[0562] Step 4:

[0563] The server converts the audio data into text using speech recognition technology, and then analyzes the content of the speech based on the results. This makes it possible to utilize data that is directly relevant to the user's problems.

[0564] Step 5:

[0565] The server analyzes image and video data using computer vision technology to extract user visual indicators and behavioral patterns. This allows for an evaluation of the user's actual performance.

[0566] Step 6:

[0567] The emotion engine identifies the user's emotional state from image and audio data. It reads emotions from facial expressions and simultaneously analyzes emotions from voice tone and rhythm, sending the results to the server.

[0568] Step 7:

[0569] The server integrates all the analytical data and updates the user's profile to reflect their personality and emotions. This result is incorporated into the feedback, making it more personalized.

[0570] Step 8:

[0571] The server sends the generated feedback and advice to the device. The device receives this feedback and displays it to the user.

[0572] Step 9:

[0573] Users accept feedback and use it to improve their actions and situations. Newly generated data (such as new audio recordings or videos) is sent again from the device to the server, and the cycle continues.

[0574] Step 10:

[0575] The server analyzes new data, continuously evaluates progress, and updates feedback as needed. This process provides users with continuous support and opportunities for improvement.

[0576] (Example 2)

[0577] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0578] In business settings, there is a growing need to provide personalized feedback tailored to the user's emotions and circumstances in order to improve user performance. However, conventional systems have struggled to adequately analyze user input and accurately assess emotional states, making it difficult to effectively generate personalized feedback.

[0579] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0580] In this invention, the server is an information processing device for analyzing information received from a user, and includes means for analyzing the user's input using natural language processing technology and speech recognition technology; emotion estimation means for evaluating the user's emotional state based on the analyzed information; and a generation device for generating personalized feedback and advice based on the emotional state evaluated by the emotion estimation means and an updated profile. This makes it possible to provide highly personalized feedback that is tailored to the user's emotions and situation.

[0581] An "information processing device" is a device that has the function of analyzing various data received from a user and processing it based on that analysis.

[0582] "Natural language processing technology" refers to the technology that enables computers to understand, interpret, and generate language that humans use naturally.

[0583] "Speech recognition technology" is a technology that analyzes audio data and converts it into text data.

[0584] An "emotion estimation means" identifies the user's emotional state based on user input information and evaluates the type and intensity of that state.

[0585] A "generation device" is a device that uses existing data and algorithms to generate information or advice tailored to a specific purpose.

[0586] "Feedback" refers to information provided for the purpose of improvement and advice, based on analysis results regarding user behavior and circumstances.

[0587] A "profile" is a dataset created based on a user's characteristics and past behavior, and it provides the foundational information necessary to enable personalized responses.

[0588] This invention is an advanced coaching system designed to improve users' business performance. This system comprehensively analyzes user input data and provides personalized feedback that takes into account their emotional state. Specifically, it begins with the user using a device with a dedicated application installed to input their daily mood and challenges. This input is in the form of text, audio, images, and video data, which are then transmitted from the device to the server.

[0589] The server, as an information processing device, performs various data analyses. Text and audio data are analyzed using natural language processing and speech recognition technologies; "Google Cloud Natural Language API" and "Amazon Transcribe" are available for this purpose. Furthermore, for image and video data, computer vision software such as "OpenCV" and "Amazon Rekognition" is used to analyze the user's visual information and understand the situation.

[0590] Based on the analysis results, the server evaluates the user's emotional state using the emotion estimation means and creates prompts for a generative AI model. For example, if a user is feeling anxious about a presentation, the server can generate a specific question as a prompt, such as "How can I overcome my anxiety about the presentation?" This prompt is then input into the generative AI model, which returns optimized feedback.

[0591] Feedback is sent to the device and displayed to the user. Users utilize this feedback to develop and execute action plans, thereby accelerating goal achievement. The advantage of this system lies in its ability to provide real-time, appropriate feedback tailored to each user's individual circumstances. This allows users to efficiently pursue self-improvement.

[0592] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0593] Step 1:

[0594] Users input emotional data and information about their challenges using a dedicated terminal application.

[0595] The input information can take the form of text, voice memos, images, videos, and other formats. This data is temporarily stored in the device's storage and then prepared for transmission to the server.

[0596] Step 2:

[0597] The device sends information obtained from the user to the server.

[0598] In this process, the device encrypts the data and transfers it via Wi-Fi or a mobile network to the server's cloud storage for received data. Once the transmission is complete, the device notifies the user that the transmission succeeded.

[0599] Step 3:

[0600] The server processes the received data in order to analyze it.

[0601] Audio data is converted to text using speech recognition technology, and natural language processing techniques are used to analyze the text data and the transcribed audio, extracting keywords and emotions. Computer vision technology analyzes image and video data to evaluate the user's behavior and situation. The analysis results are stored in a database.
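The storage of analysis results in Step 3 could look like the following sketch, which uses Python's built-in `sqlite3` module with results serialized as JSON. The table name, columns, and JSON encoding are assumptions; the source says only that results are stored in a database.

```python
import json
import sqlite3
from typing import Any, Dict, List, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a database with a hypothetical analysis_results table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS analysis_results (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               user_id TEXT NOT NULL,
               modality TEXT NOT NULL,
               result_json TEXT NOT NULL,
               created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def save_result(conn: sqlite3.Connection, user_id: str, modality: str,
                result: Dict[str, Any]) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO analysis_results (user_id, modality, result_json) VALUES (?, ?, ?)",
            (user_id, modality, json.dumps(result)),
        )


def results_for(conn: sqlite3.Connection, user_id: str) -> List[Tuple[str, Dict[str, Any]]]:
    """Return (modality, result) pairs for one user in insertion order."""
    rows = conn.execute(
        "SELECT modality, result_json FROM analysis_results WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()
    return [(modality, json.loads(raw)) for modality, raw in rows]
```

Storing each modality's result as a separate row keeps the later emotion evaluation step free to read only the modalities it needs.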

[0602] Step 4:

[0603] The server evaluates the user's emotional state based on the analysis results.

[0604] Using emotion estimation tools, the type and intensity of emotions are identified based on information extracted from user input data. This evaluation is added to the user's profile and serves as the basis for generating feedback.

[0605] Step 5:

[0606] The server uses a generative AI model to create prompt messages and generate personalized feedback.

[0607] Based on the user's emotional state and goals, the server inputs a prompt such as "Please tell me how to overcome presentation anxiety" into the generative AI model and obtains optimal advice. The results are stored as feedback data.

[0608] Step 6:

[0609] The server sends the generated feedback to the terminal, where it is presented to the user.

[0610] The device displays feedback received from the server via notifications or within the app, allowing the user to review it and develop an action plan. Feedback viewing activity on the device is recorded as a log.
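The viewing log mentioned in Step 6 could be recorded as one JSON line per event, as in the sketch below. The field names and the two channel values are hypothetical; the source states only that feedback viewing activity is logged.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List


def log_feedback_view(log: List[str], user_id: str, feedback_id: str,
                      channel: str) -> Dict[str, Any]:
    """Append one JSON line describing a feedback-viewing event."""
    if channel not in ("notification", "in_app"):
        raise ValueError("channel must be 'notification' or 'in_app'")
    event = {
        "event": "feedback_viewed",
        "user_id": user_id,
        "feedback_id": feedback_id,
        "channel": channel,
        "viewed_at": datetime.now(timezone.utc).isoformat(),  # UTC timestamp
    }
    log.append(json.dumps(event))
    return event
```

A list of strings stands in for the device's log file here; appending one line per event keeps the format easy to upload to the server later.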

[0611] Step 7:

[0612] The user plans and executes actions based on the feedback provided.

[0613] By recording the results and progress of actions on the device again, the server receives new data, and the analysis and feedback process is repeated. In this way, the user's skills and emotional state are continuously improved.

[0614] (Application Example 2)

[0615] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0616] In today's business environment, there is a problem in that approaches to individual users' goals and challenges are uniform, and appropriate support is not provided according to individual circumstances and emotional states. Furthermore, there is a problem in that effective self-improvement cannot be achieved because there are insufficient mechanisms to understand users' emotional states and customize feedback based on them.

[0617] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0618] In this invention, the server includes analysis means for analyzing information about goals and tasks received from the user, profile update means for updating the user's characteristics and preference profile based on the analyzed information, and emotion evaluation means for evaluating the user's emotional state in real time from voice and visual data and generating emotion-appropriate feedback. This makes it possible to provide personalized feedback that is tailored to the user's characteristics and emotions.

[0619] "Analysis means" refers to a means of performing a process to analyze information received from a user and understand its content.

[0620] A "profile update means" revises data related to the user's characteristics and preferences based on the analysis results and keeps it up to date.

[0621] A "generation means" generates personalized feedback and advice for the user by utilizing the updated profile.

[0622] A "feedback provision means" transmits the generated feedback to the user's device so that the user can review it.

[0623] A "monitoring means" observes the user's performance based on the provided feedback and re-analyzes newly obtained information.

[0624] An "emotion evaluation means" uses the user's voice and visual information to determine the user's emotional state in real time and reflects that state in the feedback.

[0625] This invention realizes an advanced coaching system aimed at improving user performance in business settings. This system consists of a server, a user terminal, and an environment for receiving data input from the user.

[0626] Server Functions

[0627] The server uses natural language processing and speech recognition technologies as analysis means to analyze information about goals and challenges received from users. Specifically, it uses Google Cloud Speech-to-Text to convert speech data into text and analyzes the text with Google Cloud Natural Language. This provides the data for updating the user's characteristics and preference profile. The updated information is managed by the profile update means, and personalized feedback and advice are generated by the generation means.

[0628] The server also includes emotion evaluation means for evaluating the user's emotional state. Using the Affectiva SDK together with computer vision based on OpenCV and TensorFlow, it analyzes emotions in real time from audio and visual data and incorporates them into the feedback.

[0629] Terminal Role

[0630] The device functions as a means of providing feedback, receiving feedback from the server and presenting it to the user. The user can review the feedback received through the device and incorporate it into their daily actions.

[0631] User Interaction

[0632] Users provide the system with their mood and goals using a tablet or voice input device. This allows them to receive coaching optimized for their current state. For example, a user who needs to relax after a busy day at work might receive appropriate advice such as, "Take a deep breath and relax. Also, make a quick plan for tomorrow and get some rest early."

[0633] Utilization of Generative AI Models

[0634] The system can use a generative AI model to dynamically generate feedback that matches the user's emotions and characteristics. A concrete example of a prompt would be, "Please suggest ways to relax when the user is tired."

[0635] In this way, the system provides optimal coaching tailored to the user's characteristics and emotions, supporting the improvement of the user's work performance.

[0636] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0637] Step 1:

[0638] Users input voice, text, and image information using their devices. Users use voice input or tablets to record goals, challenges, and current emotional states, and this data is sent from the device to the server.

[0639] Step 2:

[0640] The server converts the received audio data into text using the Google Cloud Speech-to-Text service. Converting the audio data into text makes it available for subsequent natural language processing.

[0641] Step 3:

[0642] The server analyzes the text data using Google Cloud Natural Language to extract specific information related to the user's goals and challenges. The output here becomes the data needed to update the user's profile.

[0643] Step 4:

[0644] The server uses OpenCV and TensorFlow to analyze image and video data. Image recognition detects the user's facial expressions and situation, and the Affectiva SDK evaluates the user's emotions.

[0645] Step 5:

[0646] Based on the evaluated emotion data, the server uses a generative AI model to generate feedback optimized for the user. The information reflected in the updated profile is combined with the emotion evaluation to help compose the feedback text.

[0647] Step 6:

[0648] The generated feedback is sent to the device through the feedback provision mechanism. The user receives the feedback on the device and reflects it in their actions.

[0649] Step 7:

[0650] The user acts based on the feedback received on their device, and their progress and any new inputs are recorded as they occur. This allows the system to obtain new data for the next cycle.

[0651] Step 8:

[0652] Acting as a monitoring tool, the server analyzes the newly acquired data so that the next coaching session with the user can be more effective.
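The monitoring in Steps 7 and 8 can be pictured as comparing each new measurement with the previous one. The sketch below is a hypothetical illustration only: the `ProgressMonitor` class, the metric name, and the assumption that a higher value means better performance are not taken from the document.

```python
from dataclasses import dataclass, field


@dataclass
class ProgressMonitor:
    """Hypothetical monitor that compares each new measurement with the previous one."""
    history: dict[str, list[float]] = field(default_factory=dict)

    def record(self, metric: str, value: float) -> str:
        """Store a new value for `metric` and return a simple progress label.

        Assumes higher values mean better performance for every metric.
        """
        values = self.history.setdefault(metric, [])
        previous = values[-1] if values else None
        values.append(value)
        if previous is None:
            return "baseline"   # first observation: nothing to compare against yet
        if value > previous:
            return "improved"
        if value < previous:
            return "declined"
        return "unchanged"
```

Each label could then feed into the next round of feedback generation, closing the loop described in Steps 7 and 8.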

[0653] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing unit 12. In the data processing unit 12, the specific processing unit 290 acquires the audio data.

[0654] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions in the prompt and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
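The input/output contract described for data generation model 58 (a prompt plus inference data in, results in a chosen data format out) can be written as a small typed interface. This is a hypothetical sketch: `DataGenerationModel`, `InferenceData`, and the toy `EchoSummaryModel` are illustrative names, and the code does not reproduce the ChatGPT or Gemini APIs.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class InferenceData:
    modality: str   # "audio", "text", or "image"
    payload: bytes


@dataclass
class ModelOutput:
    fmt: str        # output data format, e.g. "text" or "audio"
    payload: bytes


class DataGenerationModel(Protocol):
    """Hypothetical interface mirroring the description of data generation model 58."""
    def infer(self, prompt: str, data: list[InferenceData], output_format: str) -> ModelOutput: ...


class EchoSummaryModel:
    """Toy stand-in that 'summarizes' by reporting which modalities it received."""
    def infer(self, prompt: str, data: list[InferenceData], output_format: str) -> ModelOutput:
        kinds = ", ".join(sorted({d.modality for d in data}))
        text = f"{prompt} -> received: {kinds}"
        return ModelOutput(output_format, text.encode("utf-8"))


def run_inference(model: DataGenerationModel, prompt: str, data: list[InferenceData]) -> str:
    """Send a prompt plus inference data to the model and decode a text result."""
    out = model.infer(prompt, data, output_format="text")
    return out.payload.decode("utf-8")
```

Keeping the model behind an interface like this would let a real generative AI client be swapped in without changing the calling code.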

[0655] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0656] [Fourth Embodiment]

[0657] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0658] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0659] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0660] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0661] The microphone 238 receives voice signals from the user 20, including instructions given by the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0662] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0663] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0664] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0665] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0666] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0667] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0668] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0669] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0670] This invention provides a system that offers a personalized coaching application using multimodal data to improve users' business performance. This system deeply understands the user's goals and current challenges and provides optimal feedback in real time based on that understanding.

[0671] Users input their goals and current challenges through a dedicated terminal application. The terminal accepts not only text but also multimodal data such as voice memos, images, and video data. This information is transmitted to a server via the internet.

[0672] On the server, an AI-powered analysis platform processes the data. First, text data is analyzed using natural language processing techniques to identify the user's goals and challenges and extract the core information. Audio data is transcribed using speech recognition technology, and its content is further analyzed. For image and video data, computer vision technology is used to evaluate the user's behavior patterns and situations.
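Core-information extraction from text would normally rely on an NLP library or service. As a dependency-free stand-in, the sketch below simply ranks content words by frequency; the stop-word list and `top_n` default are assumptions, and the approach is far cruder than the natural language processing techniques the document refers to.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption); a real system would use an NLP library.
STOP_WORDS = {"i", "want", "to", "my", "the", "a", "an", "and", "is", "of",
              "in", "on", "for", "with", "but", "it", "me"}


def extract_keywords(text: str, top_n: int = 3) -> list[str]:
    """Return the most frequent non-stop-words as a crude proxy for goal/challenge extraction."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    # most_common orders by count; words with equal counts keep first-seen order.
    return [word for word, _ in counts.most_common(top_n)]
```

For the input "I want to improve my leadership skills. Leadership in meetings is hard, and my presentation skills are weak.", the repeated words "leadership" and "skills" rank highest.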

[0673] Based on the analysis results, the server generates feedback in a communication style best suited to the user's personality and preferences. This feedback includes specific advice and suggestions for improvement to help the user achieve their goals. The server then sends this feedback to the terminal for the user to review.

[0674] For example, if a user enters "I want to improve my leadership skills," the AI analyzes past presentation videos of the user and points out areas for improvement in their speaking style and body language. The server also periodically monitors the user's progress and updates the coaching content whenever new data is submitted.

[0675] This system allows users to receive personalized feedback 24/7, 365 days a year, enabling efficient self-improvement. This approach provides individualized advice at a reasonable cost, strongly supporting users' career advancement and the improvement of their business skills.

[0676] The following describes the processing flow.

[0677] Step 1:

[0678] Users input their goals and challenges through a dedicated terminal application. This can include not only text data but also multimodal data such as voice memos, images, and videos. The terminal collects this data and prepares it for transmission.

[0679] Step 2:

[0680] The terminal sends the collected data to the server. The server first integrates the received data and prepares it so that each data type can be properly parsed.

[0681] Step 3:

[0682] The server analyzes text data using natural language processing (NLP) techniques to extract keywords and context related to the goals and challenges set by the user. This process aims to accurately understand the user's intent.

[0683] Step 4:

[0684] The server converts the audio data into text using speech recognition technology and then performs analysis based on the results. It identifies how the audio content relates to the user's goals and challenges and extracts the necessary information.

[0685] Step 5:

[0686] For graphical data (images and videos), the server uses computer vision technology to analyze the user's visual behavior and facial expressions. This data is used to understand the user's performance and areas for improvement.

[0687] Step 6:

[0688] The server integrates the analysis results and updates the user's personality and preference profile. This makes the feedback generated by the system more personalized and better optimized for the user.

[0689] Step 7:

[0690] The feedback generated by the server includes specific improvement suggestions for the user. For example, it may include specific steps to enhance eye contact during presentations or improve leadership skills.

[0691] Step 8:

[0692] The device receives feedback from the server and displays it through the user interface. The user then uses this feedback to work on self-improvement.

[0693] Step 9:

[0694] Whenever new user activity data is generated, the device sends it to the server. The server analyzes this new data and uses it to generate new progress evaluations and feedback. This ensures that users always receive appropriate and continuous coaching.

[0695] (Example 1)

[0696] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0697] In today's business environment, individuals need personalized feedback and guidance to continuously improve their performance. However, traditional methods have struggled to provide specific advice tailored to individual characteristics and preferences in real time. Furthermore, there has been a lack of systems that can comprehensively analyze various forms of data (voice, images, text, etc.) to effectively evaluate user behavior and progress.

[0698] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0699] In this invention, the server includes information analysis means, data adjustment means, generation means, information transmission means, monitoring means, and data analysis means. This enables real-time personalized feedback based on user characteristics, and by integrating and analyzing diverse data formats, user behavior evaluation and progress monitoring can be performed efficiently.

[0700] An "information analysis means" is a system that has the function of analyzing information about goals and challenges received from users.

[0701] A "data adjustment means" is a system that has the function of updating the user's characteristics and preference profile based on the analyzed information.

[0702] A "generation means" is a system that has the function of generating personalized suggestions and guidance based on the updated profile.

[0703] An "information transmission means" is a mechanism that has the function of providing the generated suggestions to a terminal.

[0704] A "monitoring means" is a mechanism that has the function of monitoring the user's progress based on the provided suggestions and analyzing new information.

[0705] A "data analysis means" is a system that has the function of evaluating user behavior patterns and environments using multimodal data.

[0706] A "voice conversion means" is a system that converts a user's voice data into text and uses that text to analyze the content of their speech.

[0707] A "video analysis means" is a mechanism that analyzes image and video information received from a user and uses the analysis results to evaluate the user's activity.

[0708] This invention is a personalized coaching system aimed at improving user performance. The system includes a series of processes that receive and analyze information regarding the user's goals and challenges, and provide personalized feedback.

[0709] Users input their goals and challenges using a dedicated terminal application. This terminal has the capability to accept multimodal data such as text, audio, images, and videos, and can flexibly handle various data input formats. For example, a user who wants to "improve their leadership skills" can upload past presentation videos.

[0710] The device sends information collected from the user to the server via the internet. This information is encrypted using a secure protocol before reaching the server.

[0711] The server passes the received data to the AI analysis platform. Text data is analyzed using natural language processing (NLP) to extract user intent and emotions. Audio data is converted to text through speech recognition technology, and its content is further analyzed. Image and video data is analyzed using computer vision technology to evaluate user behavior patterns and situations.
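Passing each data type to its own analysis path can be sketched as a dispatch table. The analyzer functions below are hypothetical placeholders for the NLP, speech recognition, and computer vision components; only the routing logic is illustrated.

```python
from typing import Callable


# Hypothetical per-type analyzers; in the described system these would call
# NLP, speech recognition, and computer vision components respectively.
def analyze_text(payload: str) -> str:
    return f"text:{len(payload.split())} words"


def analyze_audio(payload: str) -> str:
    return f"audio:{payload}"


def analyze_visual(payload: str) -> str:
    return f"visual:{payload}"


# Images and videos share the computer vision path.
ANALYZERS: dict[str, Callable[[str], str]] = {
    "text": analyze_text,
    "audio": analyze_audio,
    "image": analyze_visual,
    "video": analyze_visual,
}


def route_inputs(items: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group received items by data type and pass each to the matching analyzer."""
    results: dict[str, list[str]] = {}
    for data_type, payload in items:
        analyzer = ANALYZERS.get(data_type)
        if analyzer is None:
            raise ValueError(f"unsupported data type: {data_type}")
        results.setdefault(data_type, []).append(analyzer(payload))
    return results
```

A table-driven design keeps the routing in one place, so adding a new input format means registering one more analyzer.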

[0712] Based on the analysis results, the server uses a generative AI model to generate personalized suggestions in a format best suited to the user's characteristics. These suggestions include specific advice aimed at achieving the user's set goals. The generated suggestions are sent to the terminal, where the user can access and review them.

[0713] For example, consider a case where a user enters "I want to improve my leadership skills." In this case, the AI analyzes the uploaded presentation video and provides feedback pointing out areas for improvement in speaking style and gestures.
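To make "areas for improvement in speaking style and gestures" concrete, the sketch below summarizes per-frame detections from a presentation video into a gesture rate and an eye-contact ratio. It assumes some upstream vision model has already labeled every frame with `gesture` and `eye_contact` booleans; those labels and the metric definitions are hypothetical.

```python
def presentation_metrics(frames: list[dict[str, bool]], fps: float) -> dict[str, float]:
    """Summarize per-frame detections from a presentation video.

    Each frame dict is assumed to hold two booleans produced by an upstream
    vision model: "gesture" (a hand gesture is in progress) and "eye_contact"
    (the speaker is looking toward the audience or camera).
    """
    if not frames or fps <= 0:
        raise ValueError("need at least one frame and a positive frame rate")
    minutes = len(frames) / fps / 60
    # Count gesture onsets (False -> True transitions) rather than gesture frames,
    # so one long gesture is counted once.
    onsets = sum(
        1
        for prev, cur in zip([{"gesture": False}] + frames, frames)
        if cur["gesture"] and not prev["gesture"]
    )
    eye_contact_ratio = sum(f["eye_contact"] for f in frames) / len(frames)
    return {"gestures_per_minute": onsets / minutes, "eye_contact_ratio": eye_contact_ratio}
```

A low eye-contact ratio or an unusually high gesture rate could then be reported back as a concrete point for improvement.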

[0714] An example of a prompt is, "I want to improve my leadership skills. Please analyze my past presentation videos and tell me what I can improve."

[0715] This system provides users with appropriate feedback 24/7, enabling efficient self-improvement.

[0716] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0717] Step 1:

[0718] Users create input data via a dedicated terminal application. This includes entering goals and tasks in text, recording voice memos, and uploading images and videos. For example, a user might input the text "I want to improve my leadership skills" along with a related video. The entered data is collected on the terminal and sent to the server using a reliable transmission protocol.

[0719] Step 2:

[0720] The server passes the received data to the analysis platform and begins data processing. Text data is analyzed by a natural language processing (NLP) engine to extract the user's intentions and emotions. The output is the main keywords and context related to the user's goals. Audio data is transcribed into text by a speech recognition system, and that text is further analyzed. The output is the content of the audio and the results of its analysis.

[0721] Step 3:

[0722] The server processes received image and video data using computer vision algorithms. It analyzes user behavior patterns, location information, and eye movements to assess the user's current state. For example, in the case of a presentation video, the output might include the frequency of gestures and eye movements, along with areas for improvement.

[0723] Step 4:

[0724] The server uses a generative AI model to generate personalized suggestions based on the analysis results. This generation process considers the user's profile, the analysis results, and historical data trends to provide optimal feedback. The output consists of a file containing specific advice and improvement suggestions for the user.

[0725] Step 5:

[0726] The server sends the generated feedback to the device and presents it in a format that is easy for the user to understand. The device displays the received feedback and notifies the user. This allows the user to review the feedback and use it to improve their daily actions.

[0727] Step 6:

[0728] Users work on self-improvement based on the feedback they receive. New data is periodically sent from their devices to the server so that progress can be continuously monitored. The server re-analyzes this new data and updates the feedback. This ensures that the most up-to-date and relevant advice is always provided.

[0729] (Application Example 1)

[0730] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0731] In today's business environment, there is a demand for timely and efficient personalized feedback and advice. However, current technology makes it difficult to provide real-time feedback based on user personality and preferences, hindering effective self-improvement. Furthermore, obtaining appropriate skill-building advice in daily life is also challenging.

[0732] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0733] In this invention, the server includes analysis means for analyzing information on goals and tasks received from the user; profile update means for updating the user's personality and preference profile based on the analyzed information; generation means for generating personalized feedback and advice based on the updated profile; and activity observation means for observing the user's daily activities through a consumer robot and providing advice in real time. This enables the user to effectively improve their business skills even in their daily life.

[0734] "Analysis means" refers to methods and devices for processing information received from users and understanding goals and challenges.

[0735] A "profile update means" is a method or device for modifying and updating data related to a user's personality and preferences based on analyzed information.

[0736] "Generation means" refers to methods and devices for generating personalized feedback and advice based on updated profiles.

[0737] A "progress tracking method" refers to a technique or device for monitoring user progress based on generated feedback and analyzing the associated data.

[0738] "Activity observation means" refers to methods and devices that use consumer robots to record a user's daily actions and provide real-time advice based on the observation results.

[0739] The system for realizing this invention utilizes consumer robots that users use on a daily basis and has a mechanism to provide feedback on goals and tasks that the user proactively presents. The user's activities are recorded by the robot, using sensors such as cameras and microphones. The robot transmits the collected data to a cloud server.

[0740] On the server, analysis tools are used to first convert audio data into text using speech recognition software (e.g., "Google Speech-to-Text API"). Next, the text data and image / video data are analyzed using natural language processing software (e.g., "spaCy" or "Transformers") and image analysis software (e.g., "OpenCV" or "TensorFlow"). Based on this analysis, the user's personality and preference profile is updated by a profile update tool.
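The document does not specify how the profile update tool revises the personality and preference profile. One simple possibility, shown here purely as an assumption, is an exponential moving average that blends newly analyzed scores into stored ones so that a single session does not overwrite the profile. The trait names and `alpha` value are illustrative.

```python
def update_profile(profile: dict[str, float], observations: dict[str, float],
                   alpha: float = 0.3) -> dict[str, float]:
    """Blend new trait/preference scores into the stored profile.

    Uses an exponential moving average: new = (1 - alpha) * old + alpha * observed.
    Traits not yet in the profile are initialized with the observed value.
    """
    updated = dict(profile)  # copy so the caller's profile is left untouched
    for trait, observed in observations.items():
        if trait in updated:
            updated[trait] = (1 - alpha) * updated[trait] + alpha * observed
        else:
            updated[trait] = observed
    return updated
```

With `alpha = 0.3`, a stored score of 0.5 and a new observation of 1.0 give 0.7 × 0.5 + 0.3 × 1.0 = 0.65, so the profile moves toward the new evidence without jumping to it.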

[0741] The generation means uses a generative AI model to generate personalized feedback and advice tailored to the user's profile. The progress tracking method then monitors the user's performance and growth in response to the generated feedback, enabling further analysis as needed.

[0742] For example, if a user is cooking, the robot observes their movements and provides real-time advice to improve their cooking skills. An example of a prompt might be, "Analyze the user's body language and speech during cooking to generate feedback for improving their cooking skills." This allows users to effectively improve their skills from the comfort of their homes, and the system is particularly useful for improving business skills.

[0743] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0744] Step 1:

[0745] The terminal (robot) records the user's activities as audio and image data. This is done using microphones and cameras mounted on the robot. The input is the user's real-time audio and video, and the output after processing is recorded audio data files and video data files.

[0746] Step 2:

[0747] The terminal transmits the recorded data to a server via the internet. Audio and image data files are transmitted. The input is an audio/video data file, and the output is the corresponding data file stored on the server.

[0748] Step 3:

[0749] The server converts the audio data into text for natural language processing. This process utilizes speech recognition technologies such as the "Google Speech-to-Text API." The input is an audio data file, and the output, as a result of the processing, is obtained as text data.

[0750] Step 4:

[0751] The server performs text analysis and processes the data to understand the user's goals and challenges. Natural language processing software such as "spaCy" and "Transformers" is used for this process. The input is text data, and the output is analyzed goal and challenge data.

[0752] Step 5:

[0753] The server analyzes video data to extract the user's actions and behavioral patterns. This analysis uses image recognition technology based on "OpenCV" and "TensorFlow". The input is a video data file, and the output is behavioral pattern data.

[0754] Step 6:

[0755] The server updates the user's profile based on the analyzed text and behavioral pattern data. This adjusts the profile based on the input, and the output is the updated profile data.

[0756] Step 7:

[0757] The server uses a generative AI model to generate personalized feedback based on the updated profile. The input is the updated profile data, and the output is the text of the generated feedback.

[0758] Step 8:

[0759] The server sends the generated feedback data back to the terminal. This allows the user to receive feedback through the terminal. The input is the generated feedback data, and the output is the feedback displayed or audibly presented on the terminal.
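Steps 3 to 8 each take the previous step's output as their input, so they chain naturally into a pipeline. The sketch below shows that chaining with hypothetical stub stages and made-up values; in the described system the stubs would be replaced by a speech recognition service, NLP and image recognition software, the profile update tool, and a generative AI model.

```python
from typing import Any, Callable

Stage = Callable[[dict[str, Any]], dict[str, Any]]


# Hypothetical stubs standing in for the components named in Steps 3 to 7.
def transcribe(state: dict[str, Any]) -> dict[str, Any]:
    state["text"] = "I want to cook faster"   # would come from a speech-to-text service
    return state


def analyze_text(state: dict[str, Any]) -> dict[str, Any]:
    state["goals"] = [w for w in state["text"].split() if w in {"cook", "faster"}]
    return state


def analyze_video(state: dict[str, Any]) -> dict[str, Any]:
    state["behavior"] = {"idle_seconds": 42}  # would come from an image recognition model
    return state


def update_profile(state: dict[str, Any]) -> dict[str, Any]:
    state["profile"] = {"goals": state["goals"], **state["behavior"]}
    return state


def generate_feedback(state: dict[str, Any]) -> dict[str, Any]:
    p = state["profile"]
    state["feedback"] = f"Goals {p['goals']}: try reducing idle time ({p['idle_seconds']}s)."
    return state


def run_pipeline(stages: list[Stage], recording: dict[str, Any]) -> str:
    """Pass the uploaded recording through each stage in order and return the
    feedback text that would be sent back to the terminal (Step 8)."""
    state = dict(recording)
    for stage in stages:
        state = stage(state)
    return state["feedback"]
```

Because every stage shares one signature, stages can be reordered, replaced, or tested in isolation.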

[0760] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0761] This invention is an advanced coaching system aimed at improving user performance in business settings. This system not only receives and analyzes information about the user's goals and challenges, but also recognizes the user's emotional state through an emotion engine and provides personalized feedback.

[0762] Users can record their mood for the day or their thoughts on specific tasks using a dedicated terminal application. This information is entered as text data, voice memos, and, if necessary, as images or videos, and is all sent to the server.

[0763] The server comprehensively processes the received data. Text and audio data are analyzed using natural language processing and speech recognition technologies to extract specific information related to the user's problem. Furthermore, image and video data are analyzed using computer vision technology to evaluate the user's visual behavior and situation.

[0764] In addition, the emotion engine evaluates the user's emotions in real time from image and audio data. It performs facial expression analysis and determines the type and intensity of emotions from voice tone and word choice. This information is added to the existing user profile so that the generated feedback is tailored to the user's current emotional state.
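The document does not state how the emotion engine combines facial expressions with voice tone and word choice. A common approach, assumed here, is late fusion: each modality produces a probability per emotion label, and the scores are averaged with a weight. The labels, the default weight, and the use of the top fused score as "intensity" are illustrative.

```python
def fuse_emotions(face: dict[str, float], voice: dict[str, float],
                  face_weight: float = 0.5) -> tuple[str, float]:
    """Late fusion of per-modality emotion probabilities.

    Each input maps an emotion label to a probability. The fused score for a
    label is a weighted average of the two modalities (missing labels count as 0).
    Returns the top emotion type and its fused score as a rough intensity.
    """
    labels = set(face) | set(voice)
    fused = {
        label: face_weight * face.get(label, 0.0) + (1 - face_weight) * voice.get(label, 0.0)
        for label in labels
    }
    top = max(fused, key=fused.get)
    return top, fused[top]
```

For example, facial scores of 0.6 anxious / 0.4 calm and vocal scores of 0.8 anxious / 0.2 calm fuse to about 0.7 anxious, which would be recorded in the profile as the current emotional state.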

[0765] For example, if a user who wants to improve their leadership skills feels anxious about giving a presentation, the system will provide advice to alleviate that anxiety. Specifically, if the system determines that the user's stress level is high, the emotion engine will include relaxation techniques and specific preparation points in its advice.
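The rule in this paragraph (add relaxation techniques when stress is high) can be sketched as a threshold check. The 0.7 threshold, the [0, 1] stress scale, and the advice strings are hypothetical.

```python
RELAXATION_TIPS = ["Take three slow, deep breaths before you start."]
PREPARATION_TIPS = [
    "Rehearse your opening two minutes out loud.",
    "List the three key points you want the audience to remember.",
]


def select_advice(stress_level: float, threshold: float = 0.7) -> list[str]:
    """Return preparation advice, prefixed with relaxation techniques when stress is high.

    stress_level is assumed to be normalized to [0, 1] by the emotion engine.
    """
    if not 0.0 <= stress_level <= 1.0:
        raise ValueError("stress_level must be in [0, 1]")
    advice = list(PREPARATION_TIPS)
    if stress_level >= threshold:
        advice = RELAXATION_TIPS + advice  # relaxation first, then preparation points
    return advice
```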

[0766] Ultimately, the device receives feedback from the server and presents it to the user. Based on this information, the user develops an action plan, puts it into action, and records their progress on the device as they go. This allows the server to analyze the new data and continue to adjust the coaching accordingly. In this way, the present invention realizes more effective support for self-improvement that takes the user's emotions into consideration.

[0767] The following describes the processing flow.

[0768] Step 1:

[0769] Users use a terminal application to input their goals, challenges, and emotional state. This can include not only text but also spoken words expressing their feelings and videos of their facial expressions.

[0770] Step 2:

[0771] The terminal sends the collected data to the server. The server receives the various types of data and distributes them to the appropriate processing module according to their type.

[0772] Step 3:

[0773] The server analyzes text data using natural language processing techniques to extract important keywords and contexts related to the user's set tasks and goals.

[0774] Step 4:

[0775] The server converts the audio data into text using speech recognition technology, and then analyzes the content of the speech based on the results. This makes it possible to utilize data that is directly relevant to the user's problems.

[0776] Step 5:

[0777] The server analyzes image and video data using computer vision technology to extract user visual indicators and behavioral patterns. This allows for an evaluation of the user's actual performance.

[0778] Step 6:

[0779] The emotion engine identifies the user's emotional state from image and audio data. It reads emotions from facial expressions and simultaneously analyzes emotions from voice tone and rhythm, sending the results to the server.

[0780] Step 7:

[0781] The server integrates all the analytical data and updates the user's profile to reflect their personality and emotions. This result is incorporated into the feedback, making it more personalized.

[0782] Step 8:

[0783] The server sends the generated feedback and advice to the device. The device receives this feedback and displays it to the user.

[0784] Step 9:

[0785] Users act on the feedback and use it to improve their actions and situations. Newly generated data (such as new audio recordings or videos) is sent again from the device to the server, and the cycle continues.

[0786] Step 10:

[0787] The server analyzes new data, continuously evaluates progress, and updates feedback as needed. This process provides users with continuous support and opportunities for improvement.

[0788] (Example 2)

[0789] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0790] In business settings, there is a growing need to provide personalized feedback tailored to the user's emotions and circumstances in order to improve user performance. However, conventional systems have struggled to adequately analyze user input and accurately assess emotional states, making it difficult to effectively generate personalized feedback.

[0791] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0792] In this invention, the server is an information processing device for analyzing information received from a user, and includes means for analyzing the user's input using natural language processing technology and speech recognition technology; emotion estimation means for evaluating the user's emotional state based on the analyzed information; and a generation device for generating personalized feedback and advice based on the emotional state evaluated by the emotion estimation means and an updated profile. This makes it possible to provide highly personalized feedback that is tailored to the user's emotions and situation.

[0793] An "information processing device" is a device that has the function of analyzing various data received from a user and processing it based on that analysis.

[0794] "Natural language processing technology" refers to the technology that enables computers to understand, interpret, and generate language that humans use naturally.

[0795] "Speech recognition technology" is a technology that analyzes audio data and converts it into text data.

[0796] An "emotion estimation means" is a means of identifying a user's emotional state based on user input information and evaluating its type and intensity.

[0797] A "generation device" is a device that uses existing data and algorithms to generate information or advice tailored to a specific purpose.

[0798] "Feedback" refers to information provided for the purpose of improvement and advice, based on analysis results regarding user behavior and circumstances.

[0799] A "profile" is a dataset created based on a user's characteristics and past behavior, and it provides the foundational information necessary to enable personalized responses.

[0800] This invention is an advanced coaching system designed to improve users' business performance. This system comprehensively analyzes user input data and provides personalized feedback that takes into account their emotional state. Specifically, it begins with the user using a device with a dedicated application installed to input their daily mood and challenges. This input is in the form of text, audio, images, and video data, which are then transmitted from the device to the server.

[0801] The server, as an information processing device, performs various data analyses. Text and audio data are analyzed using natural language processing and speech recognition technologies. "Google Cloud Natural Language API" and "Amazon Transcribe" are available for this purpose. Furthermore, for image and video data, computer vision software such as "OpenCV" and "Amazon Rekognition" are used to analyze the user's visual information and understand the situation.

[0802] Based on the analysis results, the server evaluates the user's emotional state using emotion estimation tools and creates prompts for a generative AI model. For example, if a user is feeling anxious about a presentation, the server can generate a specific question such as "How can I overcome my anxiety about the presentation?" as a prompt. This prompt is then input into the generative AI model, which returns optimized feedback.

[0803] Feedback is sent to the device and displayed to the user. Users utilize this feedback to develop and execute action plans, thereby accelerating goal achievement. The advantage of this system lies in its ability to provide real-time, appropriate feedback tailored to each user's individual circumstances. This allows users to efficiently pursue self-improvement.

[0804] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0805] Step 1:

[0806] Users input emotional data and information about their challenges using a dedicated terminal application.

[0807] The input information can take the form of text, voice memos, images, videos, and other formats. This data is temporarily stored in the device's storage and then prepared for transmission to the server.

[0808] Step 2:

[0809] The device sends information obtained from the user to the server.

[0810] In this process, the device encrypts the data and transfers it via Wi-Fi or a mobile network to the server's cloud storage for received data. Once the transmission is complete, the device notifies the user that the transmission succeeded.
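One possible way to package the multimodal input of Steps 1 and 2 for upload is sketched below. The JSON field names and the use of base64 for binary media are assumptions, and the sketch assumes that the encryption described above is provided by TLS (HTTPS) on the connection rather than by the payload format itself.

```python
import base64
import json
from datetime import datetime, timezone

def build_payload(user_id: str, items: list) -> str:
    """Serialize (kind, mime_type, raw_bytes) items into a JSON string for upload.

    Binary media is base64-encoded so that it can travel inside JSON.
    Encryption in transit is assumed to come from TLS on the connection.
    """
    entries = []
    for kind, mime_type, raw in items:
        entries.append({
            "kind": kind,  # e.g. "text", "voice_memo", "image", "video"
            "mime_type": mime_type,
            "data": base64.b64encode(raw).decode("ascii"),
        })
    return json.dumps({
        "user_id": user_id,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "items": entries,
    })

def decode_item(entry: dict) -> bytes:
    """Server-side inverse: recover the original bytes of one item."""
    return base64.b64decode(entry["data"])
```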

[0811] Step 3:

[0812] The server processes the received data in order to analyze it.

[0813] Audio data is converted to text using speech recognition technology, and natural language processing techniques are then used to analyze the text, extracting keywords and emotions. Computer vision technology analyzes image and video data to evaluate the user's behavior and situation. The analysis results are stored in a database.
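The keyword extraction in Step 3 can be illustrated with a deliberately simple frequency-based stand-in; a production system would use a natural language processing service instead. The stop-word list and the `top_n` parameter are hypothetical.

```python
import re
from collections import Counter

# A deliberately small, hypothetical stop-word list.
STOP_WORDS = {"i", "a", "the", "to", "and", "my", "is", "am", "about",
              "of", "in", "for", "on", "it", "with"}

def extract_keywords(text: str, top_n: int = 3) -> list:
    """Return the top_n most frequent non-stop-words.

    Ties keep the order in which the words first appear in the text.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]
```

For example, `extract_keywords("I am anxious about the presentation. The presentation is tomorrow and I am anxious.")` returns `["anxious", "presentation", "tomorrow"]`.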

[0814] Step 4:

[0815] The server evaluates the user's emotional state based on the analysis results.

[0816] Using emotion estimation tools, the type and intensity of emotions are identified based on information extracted from user input data. This evaluation is added to the user's profile and serves as the basis for generating feedback.
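Step 4 identifies an emotion type and intensity and adds the evaluation to the profile. A minimal sketch, assuming an upstream estimator has already produced a non-negative score per emotion, takes the highest-scoring emotion as the type and its share of the total score as the intensity. Both choices, and the `Profile` structure, are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Profile:
    user_id: str
    # History of (emotion_type, intensity) evaluations, newest last.
    emotion_history: List[Tuple[str, float]] = field(default_factory=list)

def evaluate_emotion(scores: Dict[str, float]) -> Tuple[str, float]:
    """Pick the dominant emotion and use its share of the total as intensity."""
    total = sum(scores.values())
    if total <= 0:
        return ("neutral", 0.0)
    emotion = max(scores, key=scores.get)
    return (emotion, round(scores[emotion] / total, 2))

def record_evaluation(profile: Profile, scores: Dict[str, float]) -> Tuple[str, float]:
    """Evaluate the scores and append the result to the user's profile."""
    result = evaluate_emotion(scores)
    profile.emotion_history.append(result)
    return result
```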

[0817] Step 5:

[0818] The server uses a generative AI model to create prompt messages and generate personalized feedback.

[0819] Based on the user's emotional state and goals, the server inputs a prompt such as "Please tell me how to overcome presentation anxiety" into the generative AI model and obtains optimal advice. The results are stored as feedback data.
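The prompt creation in Step 5 can be sketched as a template that combines the user's goal with the evaluated emotion. The wording and the intensity thresholds (0.4 and 0.7) are hypothetical and would need tuning against real feedback quality.

```python
def build_prompt(goal: str, emotion: str, intensity: float) -> str:
    """Compose a prompt for the generative AI model from the evaluated state."""
    # Hypothetical thresholds that turn a 0-1 intensity into a word.
    if intensity >= 0.7:
        level = "strong"
    elif intensity >= 0.4:
        level = "moderate"
    else:
        level = "mild"
    return (
        f"The user is working toward this goal: {goal}. "
        f"They currently show {level} {emotion}. "
        "Please give concise, concrete advice that addresses this emotion "
        "and suggests one next action toward the goal."
    )
```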

[0820] Step 6:

[0821] The feedback generated by the server is sent to the terminal and presented to the user.

[0822] The device displays feedback received from the server via notifications or within the app, allowing the user to review it and develop an action plan. Feedback viewing activity on the device is recorded as a log.

[0823] Step 7:

[0824] Users plan and execute actions based on the feedback provided.

[0825] When the user records the results and progress of these actions on the device, the server receives new data, and the analysis and feedback process is repeated. In this way, the user's skills and emotional state are continuously improved.
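The repeated cycle of Steps 3 through 7 can be expressed as a loop over successive user inputs. In this sketch the `analyze` and `generate_feedback` callables are hypothetical stand-ins for the analysis and generation steps described above.

```python
from typing import Callable, Iterable, List

def coaching_cycle(
    inputs: Iterable[str],
    analyze: Callable[[str], dict],
    generate_feedback: Callable[[dict], str],
    history: List[dict],
) -> List[str]:
    """Run one analysis/feedback pass per new input.

    Each new input (for example, a progress report recorded after acting on
    earlier feedback) is analyzed, tagged with its round number, appended to
    the history, and used to generate the next piece of feedback.
    """
    feedback_log = []
    for user_input in inputs:
        analysis = analyze(user_input)
        analysis["round"] = len(history) + 1
        history.append(analysis)
        feedback_log.append(generate_feedback(analysis))
    return feedback_log
```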

[0826] (Application Example 2)

[0827] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0828] In today's business environment, there is a problem in that approaches to individual users' goals and challenges are uniform, and appropriate support is not provided according to individual circumstances and emotional states. Furthermore, there is a problem in that effective self-improvement cannot be achieved because there are insufficient mechanisms to understand users' emotional states and customize feedback based on them.

[0829] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0830] In this invention, the server includes analysis means for analyzing information about goals and challenges received from the user, profile update means for updating the user's characteristics and preference profile based on the analyzed information, and emotion evaluation means for evaluating the user's emotional state in real time from voice and visual data and generating emotion-appropriate feedback. This makes it possible to provide personalized feedback that is tailored to the user's characteristics and emotions.

[0831] "Analysis means" refers to a means of performing a process to analyze information received from a user and understand its content.

[0832] A "profile update means" is a means of revising data related to user characteristics and preferences based on analysis results and keeping it up to date.

[0833] "Generation means" refers to the means of generating personalized feedback and advice for users by utilizing updated profiles.

[0834] A "feedback provision means" is a means of transmitting the generated feedback to the user's device so that the user can review it.

[0835] "Monitoring means" refers to a means of observing user performance based on the provided feedback and re-analyzing newly obtained information.

[0836] An "emotion evaluation means" is a means of using the user's voice and visual information to determine the user's emotional state in real time and reflect it in the feedback.

[0837] This invention realizes an advanced coaching system aimed at improving user performance in business settings. This system consists of a server, a user terminal, and an environment for receiving data input from the user.

[0838] Server Functions

[0839] The server uses natural language processing and speech recognition technologies as analytical tools to analyze information about goals and challenges received from users. Specifically, it uses Google Cloud Speech-to-Text to convert speech data into text and analyzes it with Google Cloud Natural Language. This provides data for updating the user's characteristics and preference profile. The updated information is managed by a profile update tool, and personalized feedback and advice are generated using a generation tool.

[0840] The server also includes emotion assessment tools to evaluate the user's emotional state. Using the Affectiva SDK and computer vision technologies with OpenCV and TensorFlow, it analyzes emotions in real time from audio and visual data and incorporates them into the feedback.

Terminal Role

[0842] The device functions as a means of providing feedback, receiving feedback from the server and presenting it to the user. The user can review the feedback received through the device and incorporate it into their daily actions.

User Interaction

[0844] Users provide the system with their mood and goals using a tablet or voice input device. This allows them to receive coaching optimized for their current state. For example, a user who needs to relax after a busy day at work might receive appropriate advice such as, "Take a deep breath and relax. Also, make a quick plan for tomorrow and get some rest early."

[0845] Utilization of Generative AI Models

[0846] The system can use a generative AI model to dynamically generate feedback that matches the user's emotions and characteristics. A concrete example of a prompt would be, "Please suggest ways to relax when the user is tired."

[0847] In this way, the system provides optimal coaching tailored to the user's characteristics and emotions, supporting the improvement of the user's work performance.

[0848] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0849] Step 1:

[0850] Users input voice, text, and image information using their devices. Users use voice input or tablets to record goals, challenges, and current emotional states, and this data is sent from the device to the server.

[0851] Step 2:

[0852] The server converts the received audio data into text using the Google Cloud Speech-to-Text service. Converting the audio into text puts it in a form suitable for the subsequent natural language processing.
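A sketch of this step using the Google Cloud Speech-to-Text Python client (`google-cloud-speech`) follows. It assumes short LINEAR16 audio sampled at 16 kHz and Japanese speech (`ja-JP`); both are assumptions about the recorded data. The client library is imported inside `transcribe` so that the pure helper `join_transcripts` can be used and tested without it.

```python
def join_transcripts(results) -> str:
    """Concatenate the top alternative of each recognition result."""
    return " ".join(
        r.alternatives[0].transcript.strip() for r in results if r.alternatives
    )

def transcribe(audio_bytes: bytes, language_code: str = "ja-JP") -> str:
    """Transcribe short LINEAR16 audio with Google Cloud Speech-to-Text.

    Requires the google-cloud-speech package and configured credentials.
    The 16 kHz LINEAR16 encoding is an assumption about the input audio.
    """
    from google.cloud import speech  # lazy import; only needed for real calls

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code=language_code,
    )
    audio = speech.RecognitionAudio(content=audio_bytes)
    response = client.recognize(config=config, audio=audio)
    return join_transcripts(response.results)
```

Synchronous `recognize` is intended for short clips; longer voice memos would need the long-running recognition variant.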

[0853] Step 3:

[0854] The server analyzes the text data using Google Cloud Natural Language to extract specific information related to the user's goals and challenges. The output here becomes the data needed to update the user's profile.
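This step can likewise be sketched with the Google Cloud Natural Language Python client (`google-cloud-language`), using sentiment and entity analysis. The reduction of the results to `topics` and `tone`, and the ±0.25 sentiment thresholds, are hypothetical choices for feeding the profile update.

```python
def summarize_analysis(sentiment_score: float, entity_names: list) -> dict:
    """Reduce raw language-analysis results to fields for the profile update."""
    if sentiment_score < -0.25:
        tone = "negative"
    elif sentiment_score > 0.25:
        tone = "positive"
    else:
        tone = "neutral"
    return {"topics": sorted({name.lower() for name in entity_names}), "tone": tone}

def analyze_goal_text(text: str) -> dict:
    """Run sentiment and entity analysis with Google Cloud Natural Language.

    Requires the google-cloud-language package and configured credentials.
    """
    from google.cloud import language_v1  # lazy import; only needed for real calls

    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    sentiment = client.analyze_sentiment(request={"document": document}).document_sentiment
    entities = client.analyze_entities(request={"document": document}).entities
    return summarize_analysis(sentiment.score, [e.name for e in entities])
```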

[0855] Step 4:

[0856] The server uses OpenCV and TensorFlow to analyze image and video data. Image recognition detects the user's facial expressions and situation, and the Affectiva SDK evaluates the user's emotions.
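The face-detection part of this step can be sketched with OpenCV's bundled Haar cascade. Treating the largest detected face as the user, and the cascade parameters, are assumptions. The emotion evaluation by the Affectiva SDK is not shown, since this sketch does not rely on that SDK's API.

```python
def largest_face(boxes):
    """Return the (x, y, w, h) box with the largest area, or None if empty."""
    boxes = list(boxes)
    if not boxes:
        return None
    return max(boxes, key=lambda b: b[2] * b[3])

def detect_main_face(image_path: str):
    """Detect faces with OpenCV's frontal-face Haar cascade; return the largest.

    The returned box would be cropped and passed to an emotion evaluator,
    which is outside this sketch.
    """
    import cv2  # requires opencv-python; imported lazily

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return largest_face(tuple(int(v) for v in f) for f in faces)
```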

[0857] Step 5:

[0858] Based on the evaluated emotion data, the server uses a generative AI model to generate feedback optimized for the user. The information extracted during the profile update is combined with the emotion evaluation to guide the wording of the feedback.
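This step combines the profile with the emotion evaluation. A minimal rule-based sketch of choosing the focus of the feedback before it is passed to the generative AI model is shown below. The emotion set, the 0.5 threshold, and the profile keys are hypothetical; the "recovery" case corresponds to advice like the relaxation example in [0844].

```python
def choose_feedback_style(profile: dict, emotion: str, intensity: float) -> dict:
    """Pick a feedback focus from the evaluated emotion and stored preferences."""
    calming = {"tired", "anxiety", "stress"}  # hypothetical set
    if emotion in calming and intensity >= 0.5:
        focus = "recovery"  # e.g. "Take a deep breath and relax..."
    elif emotion in calming:
        focus = "reassurance"
    else:
        focus = "next_step"
    return {
        "focus": focus,
        "length": profile.get("preferred_length", "short"),
        "goal": profile.get("current_goal", "unspecified"),
    }
```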

[0859] Step 6:

[0860] The generated feedback is sent to the device through the feedback provision mechanism. The user receives the feedback on the device and reflects it in their actions.

[0861] Step 7:

[0862] The user acts based on the feedback received on their device, and their progress and any new inputs are recorded as they occur. This allows the system to obtain new data for the next cycle.

[0863] Step 8:

[0864] Acting as the monitoring means, the server analyzes the newly acquired data and prepares to make the next coaching session with the user more effective.
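The monitoring in Step 8 could, for example, track a per-session score (such as a self-rated confidence level recorded in Step 7) and fit a least-squares trend across sessions. The score itself and the use of the slope to decide whether to adjust the coaching are assumptions for illustration.

```python
def progress_slope(scores):
    """Least-squares slope of per-session scores; positive means improving."""
    n = len(scores)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(scores) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(scores))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den

def needs_adjustment(scores, min_slope=0.0):
    """Flag the coaching plan for review when progress has stalled or declined."""
    return len(scores) >= 2 and progress_slope(scores) <= min_slope
```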

[0865] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0866] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions in the prompt and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0867] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0868] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0869] Figure 9 shows an emotion map 400 on which multiple emotions are mapped. In the emotion map 400, emotions are arranged on concentric circles radiating from the center. The closer an emotion is to the center, the more primitive it is. Farther out, emotions representing states and actions arising from mental states are located. Here, "emotion" is a concept that includes feelings and mental states. On the left side of the concentric circles are emotions that are generally generated by reactions occurring in the brain. On the right side are emotions that are generally induced by situational judgment. Above and below are emotions that are generally both generated by reactions occurring in the brain and induced by situational judgment. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" on the lower side. Thus, in the emotion map 400, emotions are mapped according to the structure by which they arise, and emotions that are likely to occur simultaneously are mapped close together.

[0870] These emotions are distributed at the 3 o'clock position on the emotion map 400 and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0871] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the farther outward an emotion lies on the emotion map 400, the more visible (that is, expressed in actions) it becomes.
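The layout described in [0869] to [0871] can be captured by a small helper that describes a position on the emotion map 400 by its clock direction and ring. Treating the map as discrete clock hours and rings is a simplifying assumption; only the left/right/top/bottom and inner/outer rules come from the text.

```python
def describe_position(clock_hour: int, ring: int, n_rings: int = 3) -> dict:
    """Describe a point on the emotion map 400.

    clock_hour: direction from the center (12 = top, 3 = right, 6 = bottom, 9 = left).
    ring: 0 for the innermost circle up to n_rings - 1 for the outermost.
    """
    if not 1 <= clock_hour <= 12 or not 0 <= ring < n_rings:
        raise ValueError("position is outside the map")
    if clock_hour in (12, 6):
        origin = {"reaction", "situation"}  # top and bottom: both origins
    elif clock_hour < 6:
        origin = {"situation"}  # right half: induced by situational judgment
    else:
        origin = {"reaction"}  # left half: generated by reactions in the brain
    valence = {12: "pleasure", 6: "displeasure"}.get(clock_hour)
    if ring == 0:
        depth = "primitive, inner thought"
    elif ring == n_rings - 1:
        depth = "expressed in actions"
    else:
        depth = "intermediate"
    return {"origin": origin, "valence": valence, "depth": depth}
```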

[0872] Here, human emotions are based on various balances, such as posture and blood sugar level. When these balances deviate from the ideal, the result is displeasure; when they approach the ideal, the result is pleasure. Similarly, robots, cars, motorcycles, and the like can be given emotions based on various balances, such as posture and remaining battery level: deviation from the ideal produces displeasure, and approach toward the ideal produces pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half contains emotions belonging to a region called "situation," where situational awareness is dominant.
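The balance principle in [0872] can be illustrated numerically: pleasure is highest when every balance is at its ideal and decreases as the state moves away from it. The exponential of the negative normalized distance, and the example balances (battery level and tilt), are assumptions, not the method of the disclosure.

```python
import math

def pleasure_from_balance(state: dict, ideal: dict, scale: dict) -> float:
    """Map deviations from ideal balances to a pleasure value in (0, 1].

    1.0 means every balance is at its ideal; the value falls toward 0
    (displeasure) as the scale-normalized distance from the ideal grows.
    """
    distance = math.sqrt(
        sum(((state[k] - ideal[k]) / scale[k]) ** 2 for k in ideal)
    )
    return math.exp(-distance)
```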

[0873] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative emotions "repentance" and "reflection" on the situation side, that is, when the robot experiences negative feelings such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side, that is, when the robot has positive feelings such as "I want more" or "I want to know more."

[0874] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
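The emotion determination in [0874] can be sketched as a softmax over network outputs followed by an argmax, together with one possible way of encouraging neighboring emotions to take similar values: a smoothness penalty over adjacent emotions that could be added to the training loss. The disclosure does not specify how this property is trained; the penalty, the ring ordering, and the emotions other than "reassured," "calm," and "confident" are hypothetical.

```python
import math

# Hypothetical ordering of emotions around one ring of the map.
RING = ["reassured", "calm", "confident", "excited", "anxious"]

def softmax(logits):
    """Numerically stable softmax producing emotion values that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def neighbor_smoothness_penalty(values):
    """Sum of squared differences between adjacent emotions on the ring.

    Adding this term to a training loss penalizes large value differences
    between emotions placed next to each other on the map.
    """
    n = len(values)
    return sum((values[i] - values[(i + 1) % n]) ** 2 for i in range(n))

def identify_emotion(logits):
    """Return the emotion with the highest value and all emotion values."""
    if len(logits) != len(RING):
        raise ValueError("one logit per emotion is required")
    values = softmax(logits)
    return RING[values.index(max(values))], values
```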

[0875] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0876] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0877] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0878] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0879] Furthermore, the entire specific processing program 56 need not be stored in a storage device such as a server connected to the data processing device 12 via the network 54, or in the storage 32; only a portion of the specific processing program 56 may be stored there.

[0880] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0881] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0882] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0883] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0884] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0885] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0886] The following is further disclosed regarding the embodiments described above.

[0887] (Claim 1)

[0888] An analytical tool for analyzing information about goals and challenges received from users,

[0889] A profile adjustment means for updating the user's personality and preference profile based on the analyzed information,

[0890] A generation means for generating personalized feedback and advice based on an updated profile,

[0891] A means for providing the generated feedback to the terminal,

[0892] A monitoring system for monitoring user progress based on the provided feedback and analyzing new data,

[0893] A system comprising the above.

[0894] (Claim 2)

[0895] The system according to claim 1, further comprising speech recognition means for converting user voice data into text and analyzing the content of the speech using the converted data.

[0896] (Claim 3)

[0897] The system according to claim 1, further comprising image recognition means for analyzing image data and video data received from a user, and further characterized in that the analysis results are used to evaluate the user's performance.

[0898] "Example 1"

[0899] (Claim 1)

[0900] Information analysis means for analyzing information about goals and challenges received from users,

[0901] A data adjustment means for updating user characteristics and preference profiles based on analyzed information,

[0902] A generation means for generating personalized suggestions and guidance based on an updated profile,

[0903] Information transmission means for providing the generated proposal to the terminal,

[0904] A monitoring system for monitoring user progress based on the provided proposals and analyzing new information,

[0905] A data analysis means for evaluating user behavior patterns and environments using multimodal data,

[0906] A system comprising the above.

[0907] (Claim 2)

[0908] The system according to claim 1, further comprising a speech conversion means for converting user voice data into text and analyzing the content of the speech using the transcribed information.

[0909] (Claim 3)

[0910] The system according to claim 1, further comprising video analysis means for analyzing image information and video information received from a user, and further characterized in that the analysis results are used to evaluate the user's activity.

[0911] "Application Example 1"

[0912] (Claim 1)

[0913] An analytical means for analyzing information about goals and challenges received from users,

[0914] A profile update means for updating the user's personality and preference profile based on the analyzed information,

[0915] A generation means for generating personalized feedback and advice based on an updated profile,

[0916] A means for providing the generated feedback to the terminal,

[0917] A progress tracking system for monitoring user progress based on the feedback provided and analyzing new data,

[0918] An activity observation means for observing users' daily activities through consumer robots and providing real-time advice,

[0919] A system comprising the above.

[0920] (Claim 2)

[0921] The system according to claim 1, further comprising speech recognition means for converting user voice data into text and analyzing the content of speech using the converted data.

[0922] (Claim 3)

[0923] The system according to claim 1, further comprising image analysis means for analyzing still image data and video data received from a user, and further characterized in that the analysis results are used for evaluating the user's skills.

[0924] "Example 2 when combined with an emotion engine"

[0925] (Claim 1)

[0926] An information processing device for analyzing information received from a user, comprising means for analyzing the user's input using natural language processing technology and speech recognition technology,

[0927] An emotion estimation means for evaluating the user's emotional state based on the analyzed information,

[0928] A generation device for generating personalized feedback and advice based on emotional states evaluated by emotion estimation means and updated profiles,

[0929] Output means for providing the generated feedback to the terminal device,

[0930] A data collection and analysis device for monitoring user progress and analyzing newly received data,

[0931] A system comprising the above.

[0932] (Claim 2)

[0933] The system according to claim 1, characterized in that it analyzes the user's voice data and image data and evaluates the user's emotions in real time using emotion estimation means.

[0934] (Claim 3)

[0935] The system according to claim 1, characterized in that it analyzes image and video data received from a user and provides feedback that corresponds to the user's emotional state based on the analysis results.

[0936] "Application Example 2 when combined with an emotion engine"

[0937] (Claim 1)

[0938] An analytical means for analyzing information about goals and challenges received from users,

[0939] A profile update means for updating user characteristics and preference profiles based on analyzed information,

[0940] A generation means for generating personalized feedback and advice based on an updated profile,

[0941] A feedback provision means for providing the generated feedback to the terminal,

[0942] A monitoring system for monitoring user performance based on the provided feedback and analyzing new information,

[0943] An emotion evaluation means for evaluating a user's emotional state in real time from audio and visual data and generating emotion-appropriate feedback,

[0944] A system comprising the above.

[0945] (Claim 2)

[0946] The system according to claim 1, further comprising a voice recognition means for converting user voice data into text and analyzing the content of the speech using the converted data.

[0947] (Claim 3)

[0948] The system according to claim 1, further comprising image analysis means for analyzing image data and video data received from a user, and further characterized in that the analysis results are used to evaluate the user's behavior.

[Explanation of Symbols]

[0949]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. An analytical tool for analyzing information about goals and challenges received from users, a profile adjustment means for updating the user's personality and preference profile based on the analyzed information, a generation means for generating personalized feedback and advice based on an updated profile, a means for providing the generated feedback to the terminal, a monitoring system for monitoring user progress based on the provided feedback and analyzing new data, a system comprising the above.

2. The system according to claim 1, further comprising speech recognition means for converting user voice data into text and analyzing the content of the speech using the converted data.

3. The system according to claim 1, further comprising image recognition means for analyzing image data and video data received from a user, and further characterized in that the analysis results are used to evaluate the user's performance.