system
A generative AI model-based system addresses the limitations of conventional coaching by offering personalized, multimodal feedback and continuous support for career improvement, enhancing user progress tracking and emotional awareness.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- SOFTBANK GROUP CORP
- Filing Date
- 2024-12-10
- Publication Date
- 2026-06-22
AI Technical Summary
Conventional human coaching services are expensive, time-constrained, and lack personalized feedback, making it difficult for business professionals to effectively improve their careers and address diverse challenges without compromising personal privacy.
A system utilizing a generative AI model that processes multimodal data such as text, audio, and video to provide personalized feedback and advice tailored to the user's personality, continuously tracking progress and supporting self-improvement.
Enables efficient and effective career improvement by providing consistent, personalized feedback and support that adapts to the user's unique needs and emotional state, exceeding the capabilities of conventional coaching methods.
Abstract
Description
Technical Field
[0001] The technology of the present disclosure relates to a system.
Background Art
[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, and includes steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] Modern businesspersons need efficient and effective career improvement means to cope with rapidly changing workplace environments and diverse tasks. However, conventional human coaching services are expensive and there are also time constraints and concerns about personal privacy. In addition, there is a problem that the process of self-improvement cannot be sufficiently advanced because the opportunity to obtain feedback from an objective perspective on one's own goals and issues is limited.
Means for Solving the Problems
[0005] To address these challenges, this invention provides a system that uses a generative AI model to analyze user-inputted goals and challenges, and provides personalized feedback and advice based on that data. This system processes multimodal data such as text, audio, images, and videos, and provides feedback in a communication style tailored to the user's personality. Furthermore, it effectively supports the self-improvement process by tracking the user's progress and providing continuous support.
[0006] "User" refers to an individual or group that uses the system to set their own goals or tasks and aims to achieve them.
[0007] A "goal" refers to a specific outcome or target that a user sets with the aim of achieving it.
[0008] "Challenges" refer to problems or issues that users wish to resolve or improve.
[0009] A "generative artificial intelligence model" is a part of a system that uses AI technology to automatically analyze collected data and generate appropriate output.
[0010] "Data" refers to various forms of information received from users, such as text, audio, images, and videos.
[0011] "Multimodal processing" refers to a technology that integrates and utilizes multiple different types of data (such as text, audio, images, and video).
[0012] "Feedback" refers to evaluations and advice based on analysis results regarding the user's goals and challenges.
[0013] "Advice" refers to specific, actionable suggestions that users can implement to achieve their goals or solve problems.
[0014] "Tracking" refers to the act of continuously monitoring and recording the actions and progress of users.
[0015] "Support" refers to all activities that assist in the process of achieving the goals and tasks set by users.
Brief Explanation of Drawings
[0016]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map to which multiple emotions are mapped.
[Figure 10] An emotion map to which multiple emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2 when an emotion engine is combined.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2 when an emotion engine is combined.
Mode for Carrying Out the Invention
[0017] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.
[0018] First, the terms used in the following description will be explained.
[0019] In the following embodiments, a processor with a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of a plurality of arithmetic units. Also, the processor may be a single type of arithmetic unit or a combination of a plurality of types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.
[0020] In the following embodiments, a RAM (Random Access Memory) with a reference numeral is a memory in which information is temporarily stored and is used as a work memory by the processor.
[0021] In the following embodiments, a storage with a reference numeral is one or more non-volatile storage devices that store various programs and various parameters, etc. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), or magnetic tapes, etc.
[0022] In the following embodiments, a communication interface (I/F) with a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).
[0023] In the following embodiments, "A and / or B" is synonymous with "at least one of A and B." That is, "A and / or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and / or B" applies when expressing three or more things linked by "and / or."
[0024] [First Embodiment]
[0025] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.
[0026] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.
[0027] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0028] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.
[0029] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.
[0030] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.
[0031] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.
[0032] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.
[0033] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.
[0034] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0035] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0036] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0037] This invention is a highly personalized AI coaching system aimed at improving the careers and performance of business professionals. The system receives and analyzes user-defined goals or challenges in various data formats to generate appropriate feedback and advice.
[0038] The terminal provides users with an intuitive interface where they can input their goals and challenges as text and upload audio, images, and videos as needed. The terminal then sends this information to the server.
[0039] The server uses a generative artificial intelligence model to analyze the received information. First, the server analyzes text data using natural language processing techniques. This analysis allows for a deep understanding of the user's intent and needs, and the extraction of key points related to the topic. Furthermore, the server processes uploaded audio, image, and video data using speech recognition and image analysis technologies. This enables a multifaceted analysis of the entire material submitted by the user, providing comprehensive feedback.
[0040] Based on the analysis results, the server generates customized feedback and advice that takes into account the user's personality and past behavioral data. This feedback includes specific steps and areas for improvement to help the user achieve their goals. The server also tracks the user's behavior and provides ongoing support for improvement.
[0041] The terminal displays the feedback sent from the server to the user. This allows the user to understand their progress and plan their next steps. Through this process, the system is available 24 hours a day, 365 days a year, supporting the user's career growth regardless of time or location.
[0042] Thus, the present invention makes it possible to provide users with appropriate career advice with consistent quality. By introducing an AI model that supports diverse data formats, it is expected to yield results exceeding those of conventional human coaching.
[0043] The following describes the processing flow.
[0044] Step 1:
[0045] Users enter their goals and challenges in text format through the terminal interface and upload audio, image, or video files as needed.
[0046] Step 2:
[0047] The terminal sends the information entered by the user to the server. The server receives this information and prepares it for analysis.
[0048] Step 3:
[0049] The server analyzes text data using a natural language processing engine to understand the user's requests and intentions. This process extracts important keywords and context.
[0050] Step 4:
[0051] The server processes the audio data using speech recognition software, converting the user's speech into text data. This allows the information contained in the audio to be used for analysis.
[0052] Step 5:
[0053] The server uses image analysis algorithms to analyze image and video data, identifying information derived from visual elements. This data is then integrated with text information for further use.
[0054] Step 6:
[0055] The server analyzes this multimodal data in an integrated manner and compares it with the user's past tracking data to generate personalized feedback and advice.
[0056] Step 7:
[0057] The server sends the generated feedback and advice to the terminal, which then displays it to the user. The user can then review this and plan their next course of action.
[0058] Step 8:
[0059] The server tracks the user's ongoing behavior and periodically updates this data to help with future feedback.
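Steps 6 and 8 above (comparing new input with past tracking data, then updating that data) can be illustrated with a minimal sketch. The specification does not define a storage format or scoring scheme, so the `ProgressRecord` and `ProgressTracker` classes, the 0.0 to 1.0 `score`, and the wording of the messages below are all hypothetical assumptions for illustration only.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class ProgressRecord:
    """One tracked observation for a user (hypothetical structure)."""
    day: date
    goal: str
    score: float  # assumed 0.0-1.0 rating; not defined in the specification


@dataclass
class ProgressTracker:
    """Keeps per-user history so later feedback can refer to earlier data."""
    history: Dict[str, List[ProgressRecord]] = field(default_factory=dict)

    def record(self, user_id: str, rec: ProgressRecord) -> None:
        # Step 8: append the latest observation to the user's history.
        self.history.setdefault(user_id, []).append(rec)

    def trend(self, user_id: str, goal: str) -> Optional[float]:
        """Score change between the first and the latest record for a goal."""
        recs = [r for r in self.history.get(user_id, []) if r.goal == goal]
        if len(recs) < 2:
            return None
        return recs[-1].score - recs[0].score


def feedback_with_history(tracker: ProgressTracker, user_id: str, goal: str) -> str:
    """Step 6: phrase a progress note from the tracked history."""
    delta = tracker.trend(user_id, goal)
    if delta is None:
        return f"No earlier data for '{goal}' yet; this session sets the baseline."
    if delta > 0:
        return f"Your rating for '{goal}' has improved by {delta:.2f} since tracking began."
    if delta < 0:
        return f"Your rating for '{goal}' has dropped by {-delta:.2f} since tracking began."
    return f"Your rating for '{goal}' is unchanged since tracking began."
```

In a real deployment the progress note would presumably be passed to the generative AI model as context rather than shown verbatim; the sketch only shows how stored history could feed into new feedback.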
[0060] (Example 1)
[0061] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0062] Traditional coaching systems struggle to provide effective feedback tailored to each user's unique goals and challenges, and they lack the flexibility to analyze diverse data formats other than text (such as audio, images, and videos). Furthermore, they lack the functionality to continuously follow up on users' progress, resulting in insufficient ongoing career support.
[0063] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0064] In this invention, the server includes a device for receiving goals or tasks defined by the user, a device utilizing a generative artificial intelligence model for analyzing information on the goals or tasks, and a device for generating feedback and advice tailored to the user's characteristics based on the analysis. This enables the provision of customized feedback to individual users, flexibly processes diverse data formats, and provides effective career support by continuously tracking the user's progress.
[0065] A "user" is an individual who is supported in achieving goals or solving problems through this system.
[0066] A "device for receiving goals or tasks" is a device that provides an interface for users to input specific goals or tasks they have defined into the system.
[0067] A "generative artificial intelligence model" is an artificial intelligence technology equipped with an algorithm that analyzes input information and generates appropriate feedback and advice.
[0068] An "analysis device" is a device equipped with the function of processing input information such as text, audio, images, and videos, and extracting important elements.
[0069] A "feedback and advice generating device" is a device that creates personalized countermeasures and improvement suggestions in order to provide users with information tailored to their needs based on analysis results.
[0070] "Multi-format data processing" refers to the processing capability to comprehensively analyze data in different formats, such as text, audio, images, and videos, and extract information based on them.
[0071] The "progress tracking and continuous update function" refers to a system function that records user behavior and monitors progress and changes at each stage.
[0072] Modes for carrying out the invention
[0073] The following program and system configurations are possible as embodiments for carrying out this invention.
[0074] System configuration and technologies used:
[0075] The terminal functions as an interface with the user, providing an interface where the user can input their goals and challenges. This input is not limited to text format, but also includes audio, images, and videos. The terminal is equipped with communication devices for sending this data to the server.
[0076] The server plays a central role in processing the received data. The server is equipped with a generative AI model for analysis, applying natural language processing technology to text data, speech recognition technology to audio data, and image analysis technology to image and video data. This allows the server to comprehensively analyze diverse data formats and accurately grasp the user's intent.
[0077] Generating feedback and advice:
[0078] Based on the analysis results, the server generates feedback and advice tailored to the user's characteristics and past behavioral history. This generated information includes specific action steps and improvement suggestions to help the user achieve their goals.
[0079] The terminal helps users track their progress and plan their next steps by presenting the feedback sent from the server.
[0080] Specific examples and prompt statements:
[0081] As a specific scenario, let's assume a user sets the goal of "I want to improve my speaking skills for my next presentation." In this case, the user uploads videos of past presentations, and the server analyzes that data to identify areas for improvement and provide advice on specific practice methods.
[0082] Example of a prompt:
[0083] "I'd like some advice on steps to improve my project management skills."
[0084] "Based on past meeting recordings, I want to understand how we can improve team communication."
[0085] In this way, this invention provides highly personalized career coaching tailored to the individual needs of each user, enabling more effective support than conventional methods.
[0086] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0087] Step 1:
[0088] The user uses the terminal's interface to input goals and challenges, and uploads audio, images, and videos as needed. For example, the user inputs the text "I want to learn how to speak effectively in my next presentation" together with a video of a past presentation. The terminal then sends the entered data to the server.
[0089] Step 2:
[0090] The server applies natural language processing techniques to analyze the received text data. It utilizes a generative AI model to extract the user's intent from the input text and identify relevant keywords and concepts. In this step, the input is text data, and the output is analyzed semantic information. Specifically, the server extracts keywords such as "speaking style," "presentation," and "effective" from the text and identifies related topics.
[0091] Step 3:
[0092] If audio data is available, the server uses speech recognition technology to convert it into text and then performs further analysis. For image and video data, image analysis technology is used to identify important frames and content. Through these technologies, the output obtained from the input (audio, images, video) is a list or summary of the analyzed content. Specifically, the server captures important scenes from a presentation video and extracts information about gaze and posture.
[0093] Step 4:
[0094] The server integrates all the analyzed information and generates optimal feedback and advice based on the user's characteristics and past behavior history. It utilizes a generative AI model to present customized, specific steps. The input in this step is the integrated analytical information, and the output is the content of the feedback and advice. For example, the server might generate advice such as, "In your next presentation, increase eye contact and maintain a consistent speaking speed."
[0095] Step 5:
[0096] The terminal presents the feedback received from the server and notifies the user so that it can be reviewed. The outputted feedback serves as concrete guidance for the user's actions. Specifically, the terminal displays the feedback content on the screen and sends a notification to the user, helping them plan their next steps.
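The Example 1 flow (keyword extraction in Step 2, video observations in Step 3, integration in Step 4) can be sketched as follows. This is a deliberately simple rule-based stand-in, not the generative AI model the specification relies on: the stopword list, the `eye_contact_ratio` and `pace_variation` measurements, and the thresholds are hypothetical values chosen only to make the data flow concrete.

```python
import re
from typing import Dict, List

# Hypothetical stopword list; a real system would rely on the language model.
STOPWORDS = {"i", "want", "to", "how", "in", "my", "the", "a", "an", "for",
             "learn", "next"}


def extract_keywords(text: str) -> List[str]:
    """Step 2 stand-in: keep distinct non-stopword tokens in order."""
    seen, keywords = set(), []
    for word in re.findall(r"[a-z']+", text.lower()):
        if word not in STOPWORDS and word not in seen:
            seen.add(word)
            keywords.append(word)
    return keywords


def advise(keywords: List[str], video_obs: Dict[str, float]) -> List[str]:
    """Step 4 stand-in: combine text keywords with video observations.

    video_obs holds assumed measurements from Step 3, for example the share
    of time spent looking at the audience and the variation in speaking pace.
    """
    advice = []
    if "presentation" in keywords:
        if video_obs.get("eye_contact_ratio", 1.0) < 0.5:
            advice.append("Increase eye contact with the audience.")
        if video_obs.get("pace_variation", 0.0) > 0.3:
            advice.append("Maintain a consistent speaking speed.")
    return advice


goal = "I want to learn how to speak effectively in my next presentation"
keywords = extract_keywords(goal)
tips = advise(keywords, {"eye_contact_ratio": 0.3, "pace_variation": 0.4})
```

With the sample goal text from Step 1 and the assumed observations, `tips` contains both pieces of advice quoted in Step 4.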
[0097] (Application Example 1)
[0098] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0099] Business professionals often struggle to obtain helpful feedback for improving their careers and performance, without being constrained by time or location. Furthermore, traditional human coaching has limitations and struggles to fully address individual needs. Therefore, there is a need for continuous and personalized support provided in a home setting.
[0100] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0101] In this invention, the server includes means for receiving goals or tasks defined by the user, means for utilizing a generative algorithm for analyzing the data of the goals or tasks, and means for generating feedback and advice tailored to the user's characteristics based on the analysis. This enables the user to receive customized career support 24 hours a day, 365 days a year through a home device.
[0102] A "user" is an individual or group that aims to achieve their goals or objectives using this system.
[0103] "Goals or challenges" refer to specific objectives or problems that users wish to achieve or resolve, and are the subjects for which support is provided by this system.
[0104] "Home devices" refer to robots and devices that are used on a daily basis within the home and that play a role in communication and information gathering.
[0105] A "generative algorithm" is a computational method or program that analyzes received data and provides the user with the most appropriate feedback or advice.
[0106] "Integrated processing" is an information processing method that simultaneously analyzes multiple data formats and makes decisions by relating them to each other.
[0107] "Feedback and advice" refers to specific directions and instructions provided to help users achieve their goals and solve problems.
[0108] "Progress" refers to the state or degree to which a user is making progress toward achieving their goals or tasks.
[0109] This invention constructs a system in which a home device collects information and transmits it to a server in order to achieve goals and tasks set by the user. The home device used as a terminal is equipped with a voice recognition microphone and a camera, and can receive voice input and visual information from the user. The home device allows the user to input information about their goals and tasks with simple operations and transmits this information to the server.
[0110] The server uses the Google® Speech-to-Text API to convert speech data into text data and performs natural language processing using OpenAI® generative AI models to analyze the received information. For image analysis, it uses TensorFlow® and OpenCV, so that the generative artificial intelligence model can comprehensively analyze the received data. The server generates feedback and advice based on the user's characteristics and provides this to the user via a home device.
[0111] As a concrete example, consider a scenario where a home robot is asked by a user, "What skills should I learn for my next career step?" The user provides information to the robot by explaining their current situation and talking about areas of interest. The robot sends this information to a server, which then provides advice based on its analysis.
[0112] Examples of prompts to input into a generative AI model:
[0113] "Based on the user's stated career interests and current skill set, generate a list of technologies and skills they should acquire next."
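As a minimal sketch of how such a prompt might be assembled from the information the user gave the robot, the helper below appends the stated interests and skills to the instruction sentence quoted above. The function name `build_skill_prompt` and the layout of the appended lines are assumptions; the specification gives only the instruction sentence itself.

```python
from typing import List


def build_skill_prompt(interests: List[str], skills: List[str]) -> str:
    """Compose the prompt text sent to the generative AI model (hypothetical helper)."""
    interest_text = ", ".join(interests) if interests else "not stated"
    skill_text = ", ".join(skills) if skills else "not stated"
    return (
        "Based on the user's stated career interests and current skill set, "
        "generate a list of technologies and skills they should acquire next.\n"
        f"Career interests: {interest_text}\n"
        f"Current skills: {skill_text}"
    )


prompt = build_skill_prompt(["data engineering"], ["SQL", "Python"])
```

The resulting string would then be passed to whichever generative model the server uses; that call is omitted here because the specification does not fix a particular client interface.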
[0114] This system will allow users to easily receive career support in the comfort of their own homes.
[0115] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0116] Step 1:
[0117] The terminal, acting as a home device, receives information about the user's goals and challenges as audio or visual information. The input information is captured by the terminal as audio or image data. The home device organizes this data and prepares it for transmission to the server.
[0118] Step 2:
[0119] The server converts the audio data received from the terminal into text data using the Google Speech-to-Text API. It analyzes the input audio data and outputs natural language text as a result. This process transforms audio information into text information.
[0120] Step 3:
[0121] The server inputs text data into a generative AI model, which then analyzes the text content using natural language processing. The generative AI model extracts the user's intent and needs from the text information and highlights relevant key points. Based on this, a basic understanding of how to achieve the user's goals is obtained.
[0122] Step 4:
[0123] The server analyzes the received image data using TensorFlow and OpenCV. By extracting visual elements from the image data and identifying related information, it analyzes the user's provided information from multiple perspectives. This process allows the server to understand the user's state and required resources from the images.
[0124] Step 5:
[0125] The server generates feedback and advice based on the analyzed text and image data. Using a generative AI model, customized feedback is provided, taking into account the user's characteristics and past performance. This output is presented as specific advice.
[0126] Step 6:
[0127] The terminal displays feedback and advice received from the server to the user. This information is displayed via voice and on the screen to help the user plan their next course of action.
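Steps 2 to 4 above route each input to a different analyzer depending on its modality. The sketch below shows one way to express that routing. The handlers are stand-ins passed in by the caller, so no real speech recognition or image analysis service is invoked; `fake_transcribe` and `fake_describe_image` are hypothetical placeholders for calls to the services named in the text.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[bytes], str]


def analyze_inputs(
    items: List[Tuple[str, bytes]],
    handlers: Dict[str, Handler],
) -> Dict[str, List[str]]:
    """Send each (modality, payload) pair to its handler and collect results."""
    results: Dict[str, List[str]] = {}
    for modality, payload in items:
        handler = handlers.get(modality)
        if handler is None:
            # Record modalities that have no analyzer instead of failing.
            results.setdefault("unsupported", []).append(modality)
            continue
        results.setdefault(modality, []).append(handler(payload))
    return results


def fake_transcribe(audio: bytes) -> str:
    # Placeholder for a speech-to-text call (Step 2).
    return f"<transcript of {len(audio)} bytes>"


def fake_describe_image(image: bytes) -> str:
    # Placeholder for image analysis (Step 4).
    return f"<image description, {len(image)} bytes>"


analysis = analyze_inputs(
    [("audio", b"\x00" * 4), ("image", b"\x01" * 2), ("hologram", b"")],
    {"audio": fake_transcribe, "image": fake_describe_image},
)
```

Passing the handlers in as a dictionary keeps the routing logic independent of any particular vendor API, which makes it straightforward to swap analyzers or to test the flow offline.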
[0128] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0129] This invention is an AI-based coaching system that supports the career and performance improvement of business professionals, and is characterized in that it incorporates an emotion recognition engine to provide feedback that is tailored to the user's emotional state. The system receives input from the user and generates personalized advice based on that input.
[0130] The terminal provides an interface for users to input their goals and tasks as text. Users can also upload audio recordings of their own voice and videos such as video conference recordings. The terminal then sends this data to the server.
[0131] The server first analyzes the received data. A generative artificial intelligence model analyzes the text data using natural language processing techniques to clarify the user's goals. Simultaneously, the server activates an emotion engine to determine the user's emotional state from audio and video data. This emotion data is used to enhance the sensitivity of the feedback provided and to generate more appropriate content for the user.
[0132] As a concrete example, consider a scenario where a user sets a goal of overcoming presentation anxiety and uploads past presentation videos as relevant materials. The server uses an emotion engine to detect the user's level of tension from these videos and generates feedback that includes specific breathing techniques and mental exercises to alleviate the tension.
[0133] The generated feedback and advice are presented to the user visually and audibly via the terminal. This allows the user to receive continuously optimized support while adjusting their own actions in real time.
[0134] By implementing this invention, the system provides more effective and personalized business coaching that also takes into account the user's emotions. This goes beyond simply achieving goals and contributes to improving the user's ability to cope with daily tasks and pressures.
[0135] The following describes the processing flow.
[0136] Step 1:
[0137] Users input their goals and challenges in text format through the terminal interface. They can also upload audio data and video files as needed.
[0138] Step 2:
[0139] The terminal sends the data entered and uploaded by the user to the server. The server receives this information and prepares it for analysis.
[0140] Step 3:
[0141] The server uses a generative artificial intelligence model to analyze text data, understand user requests, and extract key information related to the goals.
[0142] Step 4:
[0143] The server processes uploaded audio and video data using an emotion engine, analyzing the user's emotional state from these media. For example, it recognizes the user's emotions based on changes in voice tone and facial expressions.
[0144] Step 5:
[0145] The server integrates the text analysis results and the emotion recognition results to generate personalized feedback and advice based on the user's personality traits and emotional state.
[0146] Step 6:
[0147] The server sends the generated feedback and advice to the terminal, which then presents them to the user visually and audibly. Through the presented content, the user can follow practical advice that takes their emotions into account.
[0148] Step 7:
[0149] The server frequently monitors the user's progress and emotional changes, accumulating feedback data to use in future analyses. This provides continuous learning and support.
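The processing flow of Steps 1 to 7 above can be sketched as follows. This is only a minimal illustration under stated assumptions: the names `Submission` and `CoachingSession`, the two callables standing in for the generative artificial intelligence model and the emotion engine, and the "tense" label are all hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Submission:
    """Data the terminal sends to the server (Steps 1 and 2)."""
    goal_text: str
    audio: Optional[bytes] = None
    video: Optional[bytes] = None


@dataclass
class CoachingSession:
    # Hypothetical stand-in for the generative AI model (Step 3).
    analyze_text: Callable[[str], str]
    # Hypothetical stand-in for the emotion engine (Step 4).
    estimate_emotion: Callable[[Submission], str]
    # Accumulated feedback data reused in later analyses (Step 7).
    history: List[dict] = field(default_factory=list)

    def handle(self, submission: Submission) -> str:
        goal = self.analyze_text(submission.goal_text)
        emotion = self.estimate_emotion(submission)
        # Step 5: integrate the text analysis and the emotion result.
        feedback = f"Goal: {goal}. Detected state: {emotion}."
        if emotion == "tense":
            feedback += " Try a breathing exercise before you start."
        # Step 7: keep a record so future feedback can refer to it.
        self.history.append({"goal": goal, "emotion": emotion, "feedback": feedback})
        return feedback  # Step 6: returned to the terminal for presentation
```

In a real deployment the two callables would wrap an actual model and emotion engine; here they are injected so the flow itself can be exercised with simple stubs.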
[0150] (Example 2)
[0151] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0152] Many business professionals want to receive various forms of feedback and guidance for self-improvement and to improve work efficiency. However, conventional systems lack the flexibility to appropriately respond to users' emotional states and individual goals, making it difficult to provide personalized and effective support. This invention aims to solve these problems and enable users to achieve their goals in the most optimal way while taking their emotional state into consideration.
[0153] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0154] In this invention, the server includes a component that receives an objective or problem defined by the user, a processing device that uses an artificial intelligence model to analyze information about the objective or problem, and a device that generates instructions and advice tailored to the user's characteristics based on the analysis. This allows the user to receive personalized feedback immediately and, through emotion recognition, to apply that guidance flexibly and effectively.
[0155] A "user" is an individual or group that uses the system to achieve their goals or accomplish their objectives.
[0156] "Objective or problem" refers to a specific outcome or solution that the user hopes to achieve with the system.
[0157] A "component" is a part of the system that receives the objectives or tasks set by the user.
[0158] "Information" is a collection of data that includes text, audio, visual information, and video.
[0159] An "artificial intelligence model" is a set of algorithms used to analyze collected information and generate instructions and advice to achieve a specific goal.
[0160] A "processing device" is a part of a system that analyzes information and generates instructions and advice tailored to the user's characteristics.
[0161] "Instructions and advice" refers to specific suggestions and guidance provided by the artificial intelligence model to help users achieve their goals.
[0162] "Device" refers to a part of a system that provides the aforementioned instructions and advice to the user and enables the user to utilize that information.
[0163] This system aims to provide advanced coaching using artificial intelligence technology to promote user growth and improve efficiency. Its main components include terminals, servers, and generative AI models.
[0164] The terminal provides an interface for users to input their goals and challenges. Users can directly input text information or upload audio and video data. For example, a user might set the goal of "improving their leadership skills" and upload recordings of past leadership-related meetings.
[0165] Information received by the terminal is sent to the server. To analyze this information, the server uses a generative AI model to analyze text data and an emotion recognition engine to detect emotional states from audio and video data. Specifically, natural language processing technology is used to analyze goals and tasks in the text, and speech analysis technology is used to identify emotional states. From these results, the server generates instructions and advice related to the user's goals and provides them to the user as feedback.
[0166] The generated feedback is presented to the user visually and audibly via the terminal. Users can receive this feedback in real time and efficiently adjust their actions.
[0167] As an example of a prompt, when the text "I want to know how to lead my team more efficiently on the next project" is input into the AI model, the system provides advice and a plan for doing so. This system allows users to receive support tailored to their individual needs and to respond flexibly while taking their emotional state into consideration.
[0168] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0169] Step 1:
[0170] Users input their goals and challenges through the terminal, and upload audio and video data as needed. For example, a user might input "I want to improve my negotiation skills" and provide recordings of past negotiations. The input data is in text and multimedia formats. This constitutes the initial input to the system.
[0171] Step 2:
[0172] The terminal sends input data received from the user to the server. The data is transferred using a secure communication protocol. The server receives text data related to the user's goals and challenges, as well as audio and video data. This prepares the raw data necessary for analysis.
[0173] Step 3:
[0174] The server inputs the received text data into a generative AI model, which then analyzes it through natural language processing. This data processing includes grammatical analysis and keyword extraction. The output provides a concrete understanding and analysis of the user's goals and challenges.
[0175] Step 4:
[0176] The server inputs audio and video data into an emotion recognition engine for analysis. For audio data, it analyzes voice tone, pitch, tempo, etc., to determine the user's emotional state. For video data, it uses facial expression recognition technology. The output is the analyzed emotion data, indicating the user's current emotional state.
[0177] Step 5:
[0178] The server integrates the text data analysis results from the generative AI model with the emotion data from the emotion recognition engine to generate user-appropriate feedback and advice. Here, an inference engine is used to generate individual instructions and suggestions. The output is feedback that includes specific action guidelines.
[0179] Step 6:
[0180] The terminal visually and audibly presents instructions and advice received from the server to the user. Specifically, it displays visualized data on a dashboard and provides audio guidance. This allows the user to immediately receive support from the system and take action.
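As a rough illustration of Step 4 in this flow, the following sketch combines voice features (pitch, pitch variability, tempo) with facial expression scores to pick an emotion label. The disclosure does not specify how the emotion recognition engine works internally; the thresholds, the weights, the `VoiceFeatures` fields, and the `face_scores` dictionary (imagined as the output of a facial expression recognizer) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class VoiceFeatures:
    mean_pitch_hz: float      # average fundamental frequency of the voice
    pitch_variability: float  # spread of the pitch, in Hz
    words_per_minute: float   # speaking tempo


def voice_tension_score(v: VoiceFeatures) -> float:
    """Map voice features to a 0..1 tension score (illustrative thresholds only)."""
    score = 0.0
    if v.mean_pitch_hz > 200:
        score += 0.4
    if v.pitch_variability > 40:
        score += 0.3
    if v.words_per_minute > 170:
        score += 0.3
    return min(score, 1.0)


def fuse_emotion(voice: VoiceFeatures, face_scores: Dict[str, float],
                 voice_weight: float = 0.5) -> str:
    """Blend the voice tension score into the facial 'anxious' score and pick the top label."""
    tension = voice_tension_score(voice)
    combined = dict(face_scores)
    combined["anxious"] = (voice_weight * tension
                           + (1 - voice_weight) * face_scores.get("anxious", 0.0))
    return max(combined, key=combined.get)
```

A production system would more plausibly use trained models for both modalities; the rule-based scoring here only shows where the two signals meet before feedback generation in Step 5.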
[0181] (Application Example 2)
[0182] Next, we will describe Application Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0183] In modern society, business professionals and ordinary citizens frequently face mental stress in their daily lives and work. In this environment, there is a need to provide personalized feedback that takes individual emotional states into account. However, existing systems struggle to provide timely and optimal feedback tailored to individual emotional states, and there is a lack of mechanisms for users to receive immediate emotional support. Furthermore, proposing solutions that utilize local resources is also difficult.
[0184] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0185] In this invention, the server includes means for receiving the user's goals and challenges, means for analyzing the data using a generative information processing model, and means for recognizing the user's emotional state. This makes it possible to adaptively optimize feedback and advice tailored to the user's characteristics and emotions, taking into account local events and activities.
[0186] A "user" is an individual or group that uses this system, has goals and challenges, and requires feedback and advice regarding them.
[0187] A "goal or challenge" is a specific matter that the user aims to achieve or resolve, and it forms the basis for feedback and advice provided by this system.
[0188] A "generative information processing model" is an artificial intelligence technology that analyzes diverse data formats and generates feedback tailored to the user's goals and challenges.
[0189] "Emotion recognition means" refers to technology that determines a user's emotional state from data such as audio and images, and contributes to optimizing feedback.
[0190] "Feedback and advice" refers to the evaluations and suggestions that a generative information processing model provides to a user's goals and challenges, serving as a guide for the user to adjust their actions toward achieving those goals.
[0191] An "information terminal" is a device that provides users with feedback and advice visually or audibly, and includes smartphones and smart glasses.
[0192] "Geographic information" refers to location-related data and is used when suggesting local events and activities to users.
[0193] "Local events and activities" refer to nearby activities and gatherings that users can participate in, and are proposed to support users in reducing stress and achieving their goals.
[0194] As an embodiment of the present invention, a personalized support system for citizens in a smart city environment will be described. This system is implemented using a smartphone or smart glasses owned by the user. When the user experiences stress or challenges in their daily life, the terminal captures the user's voice and facial expressions.
[0195] The terminal sends this input data to the server, which processes it via an emotion recognition API. Specifically, it uses the Microsoft® Azure® emotion recognition API to determine the user's emotional state. The server then uses the OpenAI GPT generative artificial intelligence model to generate advice tailored to the user's current emotional state. This advice may include suggestions for relaxation methods or local events to lift one's spirits.
[0196] Furthermore, the server uses a geographic information API, specifically the Google Maps API, to search for and suggest local events and resources based on the user's location, which makes the support more practical and responsive.
[0197] For example, if a user complains of stress at work, the smart glasses analyze the user's facial expression using an emotion recognition API and identify the emotion of anxiety. Based on this, the server uses a generative AI model to recommend "participating in a yoga event at a nearby park" and provides voice guidance for "deep breathing exercises."
[0198] As an example, the prompt "If the user's emotional state is anxious, suggest three relaxation methods" is input into the artificial intelligence model, which generates the corresponding feedback. This allows users to instantly receive effective feedback tailored to their own emotional state.
[0199] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0200] Step 1:
[0201] When a user experiences stress, the terminal captures their voice and facial expressions. The input consists of the user's voice data and image data from the camera, which are temporarily stored on the terminal. The terminal performs noise filtering to improve the accuracy of the data.
[0202] Step 2:
[0203] The terminal sends the captured audio and image data to a server in the cloud. As output, a data packet is generated. The terminal encrypts the data using security protocols to protect user privacy.
[0204] Step 3:
[0205] The server calls an emotion recognition API to analyze the data it receives. The input is the user's voice and image data, and the API extracts emotion information from these. The output is the identified emotional state (e.g., anxiety, joy). Based on this, the server logs the user's emotional state.
[0206] Step 4:
[0207] The server uses the emotion recognition results to input a prompt message into the generative AI model. An example prompt message is, "If the user's emotional state is anxious, suggest three relaxation methods." The input is the identified emotional state, and the model generates feedback based on this.
[0208] Step 5:
[0209] The server integrates the generated feedback with activity and event information based on the user's geographic location, utilizing local information services. The input is the user's location data and the generated feedback, and the output is realistic event suggestions. The server uses the Google Maps API to retrieve event information.
[0210] Step 6:
[0211] The server sends integrated feedback and event information to the terminal. The output is feedback data presented to the user. The terminal presents this information visually and audibly, allowing the user to adjust their actions based on it.
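Steps 4 and 5 of this flow can be illustrated with a small sketch that fills the identified emotional state into the prompt quoted above and selects events close to the user. No external service is called: the in-memory `Event` list is a hypothetical stand-in for the event information the server would retrieve, and `build_prompt`, `nearby_events`, and the 2 km default radius are assumptions for this example. Distances use the standard haversine formula.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Event:
    name: str
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def build_prompt(emotion: str) -> str:
    """Step 4: insert the identified emotional state into the example prompt."""
    return f"If the user's emotional state is {emotion}, suggest three relaxation methods."


def nearby_events(lat: float, lon: float, events: Sequence[Event],
                  radius_km: float = 2.0) -> List[Tuple[Event, float]]:
    """Step 5: keep events within radius_km of the user, nearest first."""
    hits = [(e, haversine_km(lat, lon, e.lat, e.lon)) for e in events]
    return sorted([h for h in hits if h[1] <= radius_km], key=lambda h: h[1])
```

The server would then merge the model's answer to the prompt with the selected events before sending the result to the terminal in Step 6.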
[0212] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0213] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0214] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.
[0215] [Second Embodiment]
[0216] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.
[0217] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.
[0218] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0219] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.
[0220] The microphone 238 receives instructions from the user 20 by picking up the user's voice. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0221] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0222] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0223] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0224] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0225] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0226] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0227] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0228] This invention is a highly personalized AI coaching system aimed at improving the careers and performance of business professionals. The system receives and analyzes user-defined goals or challenges in various data formats to generate appropriate feedback and advice.
[0229] The terminal provides users with an intuitive interface where they can input their goals and challenges as text and upload audio, images, and videos as needed. The terminal then sends this information to the server.
[0230] The server uses a generative artificial intelligence model to analyze the received information. First, the server analyzes text data using natural language processing techniques. This analysis allows for a deep understanding of the user's intent and needs, and the extraction of key points related to the topic. Furthermore, the server processes uploaded audio, image, and video data using speech recognition and image analysis technologies. This enables a multifaceted analysis of the entire material submitted by the user, providing comprehensive feedback.
[0231] Based on the analysis results, the server generates customized feedback and advice that takes into account the user's personality and past behavioral data. This feedback includes specific steps and areas for improvement to help the user achieve their goals. The server also tracks the user's behavior and provides ongoing support for improvement.
[0232] The terminal displays the feedback sent from the server to the user. This allows the user to understand their progress and plan their next steps. Through this process, the system is available 24 hours a day, 365 days a year, supporting the user's career growth regardless of time or location.
[0233] Thus, the present invention makes it possible to provide users with appropriate career advice with consistent quality. By introducing an AI model that supports diverse data formats, it is expected to yield results exceeding those of conventional human coaching.
[0234] The following describes the processing flow.
[0235] Step 1:
[0236] Users enter their goals and challenges in text format through the terminal interface and upload audio, image, or video files as needed.
[0237] Step 2:
[0238] The terminal sends the information entered by the user to the server. The server receives this information and prepares it for analysis.
[0239] Step 3:
[0240] The server analyzes text data using a natural language processing engine to understand the user's requests and intentions. This process extracts important keywords and context.
[0241] Step 4:
[0242] The server processes the audio data using speech recognition software, converting the user's speech into text data. This allows the information contained in the audio to be used for analysis.
[0243] Step 5:
[0244] The server uses image analysis algorithms to analyze image and video data, identifying information derived from visual elements. This data is then integrated with text information for further use.
[0245] Step 6:
[0246] The server analyzes this multimodal data in an integrated manner and compares it with the user's past tracking data to generate personalized feedback and advice.
[0247] Step 7:
[0248] The server sends the generated feedback and advice to the terminal, which then displays it to the user. The user can then review this and plan their next course of action.
[0249] Step 8:
[0250] The server tracks the user's ongoing behavior and periodically updates this data to help with future feedback.
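Steps 6 and 8 above compare new analysis results with the user's past tracking data. One way this comparison might look is sketched below. The disclosure does not define the tracked quantities, so the metric names used in the usage note, the `ProgressTracker` class, and the `summarize_progress` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProgressTracker:
    """Keeps metric snapshots so each new analysis can be compared with the previous one."""
    snapshots: List[Dict[str, float]] = field(default_factory=list)

    def record(self, metrics: Dict[str, float]) -> Dict[str, float]:
        """Store a snapshot and return the change versus the previous snapshot."""
        previous = self.snapshots[-1] if self.snapshots else {}
        delta = {name: value - previous[name]
                 for name, value in metrics.items() if name in previous}
        self.snapshots.append(dict(metrics))
        return delta


def summarize_progress(delta: Dict[str, float]) -> List[str]:
    """Turn metric changes into short lines that can be folded into the feedback."""
    lines = []
    for name, change in sorted(delta.items()):
        if change > 0:
            lines.append(f"{name}: improved by {change:g}")
        elif change < 0:
            lines.append(f"{name}: declined by {-change:g}")
        else:
            lines.append(f"{name}: unchanged")
    return lines
```

For instance, recording `{"eye_contact": 3, "pace": 5}` and later `{"eye_contact": 4, "pace": 5}` yields a change of 1 for `eye_contact` and 0 for `pace`, which the server could mention when generating the next round of advice.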
[0251] (Example 1)
[0252] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0253] Traditional coaching systems struggle to provide effective feedback tailored to each user's unique goals and challenges, and they lack the flexibility to analyze diverse data formats other than text (such as audio, images, and videos). Furthermore, they lack the functionality to continuously follow up on users' progress, resulting in insufficient ongoing career support.
[0254] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0255] In this invention, the server includes a device for receiving goals or tasks defined by the user, a device utilizing a generative artificial intelligence model for analyzing information on the goals or tasks, and a device for generating feedback and advice tailored to the user's characteristics based on the analysis. This enables the system to provide customized feedback to individual users, to process diverse data formats flexibly, and to deliver effective career support by continuously tracking the user's progress.
[0256] A "user" is an individual who is supported in achieving goals or solving problems through this system.
[0257] A "device for receiving goals or tasks" is a device that provides an interface for users to input specific goals or tasks they have defined into the system.
[0258] A "generative artificial intelligence model" is an artificial intelligence technology equipped with an algorithm that analyzes input information and generates appropriate feedback and advice.
[0259] An "analysis device" is a device equipped with the function of processing input information such as text, audio, images, and videos, and extracting important elements.
[0260] A "feedback and advice generating device" is a device that creates personalized countermeasures and improvement suggestions in order to provide users with information tailored to their needs based on analysis results.
[0261] "Multi-format data processing" refers to the processing capability to comprehensively analyze data in different formats, such as text, audio, images, and videos, and extract information based on them.
[0262] The "progress tracking and continuous update function" refers to a system function that records user behavior and monitors progress and changes at each stage.
[0263] Modes for carrying out the invention
[0264] The following program and system configurations are possible as embodiments for carrying out this invention.
[0265] System configuration and technologies used:
[0266] The terminal serves as the user interface through which the user inputs goals and challenges. This input is not limited to text format, but also includes audio, images, and videos. The terminal is equipped with communication devices for sending this data to the server.
[0267] The server plays a central role in processing the received data. The server is equipped with a generative AI model for analysis, applying natural language processing technology to text data, speech recognition technology to audio data, and image analysis technology to image and video data. This allows the server to comprehensively analyze diverse data formats and accurately grasp the user's intent.
[0268] Generating feedback and advice:
[0269] Based on the analysis results, the server generates feedback and advice tailored to the user's characteristics and past behavioral history. This generated information includes specific action steps and improvement suggestions to help the user achieve their goals.
[0270] The terminal helps users track their progress and plan their next steps by presenting feedback sent from the server.
[0271] Specific examples and prompt statements:
[0272] As a specific scenario, let's assume a user sets the goal of "I want to improve my speaking skills for my next presentation." In this case, the user uploads videos of past presentations, and the server analyzes that data to identify areas for improvement and provide advice on specific practice methods.
[0273] Example of a prompt:
[0274] "I'd like some advice on steps to improve my project management skills."
[0275] "Based on past meeting recordings, I want to understand how we can improve team communication."
[0276] In this way, this invention provides highly personalized career coaching tailored to the individual needs of each user, enabling more effective support than conventional methods.
[0277] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0278] Step 1:
[0279] The user uses the interface of the terminal to input goals and tasks, and uploads voice, images, and videos as needed. For example, the user inputs the text "want to acquire effective speaking skills in the next presentation" together with a video of a past presentation. The terminal sends the input data to the server.
[0280] Step 2:
[0281] The server applies natural language processing technology to analyze the received text data. Utilizing a generative AI model, it extracts the user's intention from the input text and identifies relevant keywords and concepts. The input in this step is text data, and the output is the analyzed semantic information. As a specific operation, the server extracts keywords such as "speaking skills", "presentation", and "effective" from the text and identifies relevant topics.
[0282] Step 3:
[0283] If there is voice data, the server converts the voice data into text by voice recognition technology and performs further analysis. For image and video data, image analysis technology is used to identify important frames and content. Through these technologies, the output obtained from the input (voice, image, video) is a list of analyzed content or summary information. Specifically, the server captures important scenes of the presentation video and extracts information related to the line of sight and posture.
[0284] Step 4:
[0285] The server integrates all the analyzed information and generates optimal feedback and advice based on the user's characteristics and past behavior history. It utilizes a generative AI model to present customized specific steps. The input in this step is the integrated analyzed information, and the output is the content of the feedback and advice. As a specific operation, the server generates advice such as "increase eye contact and maintain a constant speaking speed in the next presentation".
[0286] Step 5:
[0287] The terminal presents the feedback received from the server to the user and notifies the user so that the user can confirm it. The output feedback serves as a specific action guideline for the user. As a specific operation, the terminal displays the feedback content on the screen and sends a notification to the user to assist the user in planning the next step.
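The keyword extraction described in Step 2 of this flow can be approximated with simple phrase matching, as in the sketch below. In the disclosure a generative AI model performs this step; the fixed vocabulary and the `extract_keywords` function here are assumptions used only to make the input and output of the step concrete.

```python
from typing import Iterable, List


def extract_keywords(text: str, vocabulary: Iterable[str]) -> List[str]:
    """Return the vocabulary phrases found in text, ordered by first appearance."""
    lowered = text.lower()
    found = []
    for phrase in vocabulary:
        position = lowered.find(phrase.lower())
        if position != -1:
            found.append((position, phrase))
    # Sorting the (position, phrase) pairs puts earlier matches first.
    return [phrase for _, phrase in sorted(found)]


# Hypothetical vocabulary; a real system would not rely on a fixed list.
VOCABULARY = ["speaking skills", "presentation", "effective", "negotiation"]
keywords = extract_keywords(
    "want to acquire effective speaking skills in the next presentation",
    VOCABULARY,
)
```

For the example sentence from Step 1, `keywords` contains "effective", "speaking skills", and "presentation". Substring matching is crude (it would also match "presentations"), which is one reason a language model is better suited to the task.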
[0288] (Application Example 1)
[0289] Next, Application Example 1 will be described. In the following description, the data processing device 12 is referred to as the "server", and the smart glasses 214 are referred to as the "terminal".
[0290] It is difficult for businesspersons to obtain feedback that is useful for improving their careers and performance without being restricted by time and location. Also, traditional human coaching has limitations and it is difficult to fully meet individual needs. Therefore, there is a demand for providing continuous and personalized support in a home environment.
[0291] The specific processing by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following respective means.
[0292] In this invention, the server includes means for receiving goals or tasks defined by the user, means for analyzing the data of the goals or tasks using a generative algorithm, and means for generating feedback and advice tailored to the user's characteristics based on the analysis. This enables the user to receive customized career support 24 hours a day, 365 days a year through a household machine.
[0293] A "user" is an individual or group that aims to achieve their goals or objectives using this system.
[0294] "Goals or challenges" refer to specific objectives or problems that users wish to achieve or resolve, and are the subjects for which support is provided by this system.
[0295] "Household machines" refer to robots and devices that are used on a daily basis within the home and that play a role in communication and information gathering.
[0296] A "generative algorithm" is a computational method or program that analyzes received data and provides the user with the most appropriate feedback or advice.
[0297] "Integrated processing" is an information processing method that simultaneously analyzes multiple data formats and makes decisions by relating them to each other.
[0298] "Feedback and advice" refers to specific directions and instructions provided to help users achieve their goals and solve problems.
[0299] "Progress" refers to the state or degree to which a user is making progress toward achieving their goals or tasks.
[0300] In the present invention, a system is constructed in which a household machine collects information and transmits it to a server in order to achieve the goals and tasks set by the user. The household machine used as a terminal is equipped with a voice recognition microphone and a camera, and can receive voice input and visual information from the user. The household machine enables the input of information on the user's goals and tasks through simple operations, and transmits this information to the server.
[0301] In order to analyze the received information, the server uses the Google Speech-to-Text API to convert voice data into character data and performs natural language processing using the generative AI model of OpenAI. Furthermore, TensorFlow and OpenCV are used for image analysis, and a generative artificial intelligence model that integratively analyzes the received data is used. The server generates feedback and advice based on the user's characteristics and provides this to the user via the household machine.
[0302] As a specific example, consider the case where the user asks a household robot, "What skills should I learn for the next career step?" The user provides information by explaining the current situation to the robot and talking about areas of interest. The robot transmits that information to the server, and the server provides advice based on the analysis results.
[0303] Example of a prompt sentence to input into the generative AI model:
[0304] "Based on the user's stated career interests and current skill set, generate a list of technologies and skills to acquire next."
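The prompt in paragraph [0304] could be assembled from the information the user gives to the robot. The following is a minimal sketch, assuming the user's interests and skills have already been extracted as lists of strings. The function name `build_skill_prompt` and its parameters are hypothetical and do not appear in the disclosure.

```python
def build_skill_prompt(interests, current_skills):
    """Assemble a prompt for the generative AI model (hypothetical helper)."""
    # Join list items into comma-separated phrases; fall back when empty.
    interests_text = ", ".join(interests) if interests else "not stated"
    skills_text = ", ".join(current_skills) if current_skills else "not stated"
    return (
        "Based on the user's stated career interests and current skill set, "
        "generate a list of technologies and skills to acquire next.\n"
        f"Career interests: {interests_text}\n"
        f"Current skill set: {skills_text}"
    )


# Sample values are illustrative only.
prompt = build_skill_prompt(["data engineering"], ["SQL", "Excel"])
print(prompt)
```

Keeping the instruction sentence fixed and appending the user-specific details as separate lines makes it easy to inspect what was actually sent to the model.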
[0305] With this system, users can easily receive career support in a home environment.
[0306] The flow of the specific processing in Application Example 1 will be described using Figure 12.
[0307] Step 1:
[0308] The terminal, acting as a household machine, receives information about the user's goals and tasks as audio or visual information. The input information is captured by the terminal as audio or image data. The household machine organizes this data and prepares it for transmission to the server.
[0309] Step 2:
[0310] The server converts the audio data received from the terminal into text data using the Google Speech-to-Text API. It analyzes the input audio data and outputs natural language text as a result. This process transforms audio information into text information.
[0311] Step 3:
[0312] The server inputs text data into a generative AI model, which then analyzes the text content using natural language processing. The generative AI model extracts the user's intent and needs from the text information and highlights relevant key points. Based on this, a basic understanding of how to achieve the user's goals is obtained.
[0313] Step 4:
[0314] The server analyzes the received image data using TensorFlow and OpenCV. By extracting visual elements from the image data and identifying related information, it analyzes the user's provided information from multiple perspectives. This process allows the server to understand the user's state and required resources from the images.
[0315] Step 5:
[0316] The server generates feedback and advice based on the analyzed text and image data. Using a generative AI model, customized feedback is provided, taking into account the user's characteristics and past performance. This output is presented as specific advice.
[0317] Step 6:
[0318] The terminal displays feedback and advice received from the server to the user. This information is displayed via voice and on the screen to help the user plan their next course of action.
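Steps 1 to 6 above can be sketched as a small pipeline. This is a minimal sketch, not the disclosed implementation: the speech-to-text, text analysis, image analysis, and generation stages are passed in as plain callables so that the example runs without any external service. The names `TerminalInput` and `run_support_flow` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TerminalInput:
    """Data captured by the household machine in Step 1 (hypothetical type)."""
    audio: Optional[bytes] = None
    image: Optional[bytes] = None


def run_support_flow(
    data: TerminalInput,
    transcribe: Callable[[bytes], str],          # Step 2: speech to text
    analyze_text: Callable[[str], str],          # Step 3: intent and key points
    analyze_image: Callable[[bytes], str],       # Step 4: visual findings
    generate_advice: Callable[[str, str], str],  # Step 5: feedback and advice
) -> str:
    """Run Steps 2 to 5 on the server and return the advice shown in Step 6."""
    text_summary = ""
    if data.audio is not None:
        text_summary = analyze_text(transcribe(data.audio))
    image_summary = ""
    if data.image is not None:
        image_summary = analyze_image(data.image)
    return generate_advice(text_summary, image_summary)


# Stand-in callables with made-up outputs, used only to exercise the flow.
advice = run_support_flow(
    TerminalInput(audio=b"raw-audio", image=b"raw-image"),
    transcribe=lambda audio: "I want to move into project management",
    analyze_text=lambda text: f"goal: {text}",
    analyze_image=lambda image: "user is at a home desk",
    generate_advice=lambda t, i: f"Advice based on [{t}] and [{i}]",
)
print(advice)
```

Injecting each stage as a callable keeps the flow independent of any particular speech, vision, or language service, so a real client could be substituted for each stand-in.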
[0319] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0320] This invention is an AI-based coaching system that supports the career and performance improvement of business professionals, and is characterized in that it incorporates an emotion recognition engine to provide feedback that is tailored to the user's emotional state. The system receives input from the user and generates personalized advice based on that input.
[0321] The device provides an interface for users to input their set goals and tasks as text. Users can also upload audio recordings of their own voice and videos such as video conference recordings. The device then sends this data to the server.
[0322] The server first analyzes the received data. A generative artificial intelligence model analyzes the text data using natural language processing techniques to clarify the user's goals. Simultaneously, the server activates an emotion engine to determine the user's emotional state from audio and video data. This emotion data is used to enhance the sensitivity of the feedback provided and to generate more appropriate content for the user.
[0323] As a concrete example, consider a scenario where a user sets a goal of overcoming presentation anxiety and uploads past presentation videos as relevant materials. The server uses an emotion engine to detect the user's level of tension from these videos and generates feedback that includes specific breathing techniques and mental exercises to alleviate the tension.
[0324] The generated feedback and advice are presented to the user visually and audibly via the device. This allows the user to receive continuously optimized support while adjusting their own actions in real time.
[0325] By implementing this invention, the system provides more effective and personalized business coaching that also takes into account the user's emotions. This goes beyond simply achieving goals and contributes to improving the user's ability to cope with daily tasks and pressures.
[0326] The following describes the processing flow.
[0327] Step 1:
[0328] Users input their goals and challenges in text format through the terminal interface. They can also upload audio data and video files as needed.
[0329] Step 2:
[0330] The terminal sends the data entered and uploaded by the user to the server. The server receives this information and prepares it for analysis.
[0331] Step 3:
[0332] The server uses a generative artificial intelligence model to analyze text data, understand user requests, and extract key information related to the goals.
[0333] Step 4:
[0334] The server processes uploaded audio and video data using an emotion engine, analyzing the user's emotional state from these media. For example, it recognizes the user's emotions based on changes in voice tone and facial expressions.
[0335] Step 5:
[0336] The server integrates the text analysis results and the emotion recognition results to generate personalized feedback and advice based on the user's personality traits and emotional state.
[0337] Step 6:
[0338] The server sends the generated feedback and advice to the device, which then presents them to the user visually and audibly. Through the presented content, the user can follow emotionally conscious and practical advice.
[0339] Step 7:
[0340] The server frequently monitors the user's progress and emotional changes, accumulating feedback data to use in future analyses. This provides continuous learning and support.
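Steps 5 and 7 of this flow, integrating the emotion result into the generation request and accumulating feedback data for later analysis, could look roughly like the following. This is a minimal sketch under stated assumptions: the class names `SessionRecord` and `UserHistory`, the function `build_feedback_prompt`, and the wording of the prompt are all hypothetical, and the emotion label is assumed to arrive as a plain string from the emotion engine.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SessionRecord:
    goal: str
    emotion: str
    feedback: str


@dataclass
class UserHistory:
    """Accumulates feedback data for use in future analyses (Step 7)."""
    records: List[SessionRecord] = field(default_factory=list)

    def add(self, record: SessionRecord) -> None:
        self.records.append(record)

    def emotion_trend(self) -> List[str]:
        # Emotions in the order they were recorded, for monitoring changes.
        return [r.emotion for r in self.records]


def build_feedback_prompt(goal: str, emotion: str) -> str:
    """Combine the text analysis result and the emotion result (Step 5)."""
    return (
        f"The user's goal is: {goal}. "
        f"The user's estimated emotional state is: {emotion}. "
        "Give practical advice in a tone suited to that emotional state."
    )


history = UserHistory()
prompt = build_feedback_prompt("overcome presentation anxiety", "tense")
# The feedback strings below are placeholders, not real model output.
history.add(SessionRecord("overcome presentation anxiety", "tense", "placeholder 1"))
history.add(SessionRecord("overcome presentation anxiety", "calm", "placeholder 2"))
print(prompt)
print(history.emotion_trend())
```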
[0341] (Example 2)
[0342] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0343] Many business professionals want to receive various forms of feedback and guidance for self-improvement and to improve work efficiency. However, conventional systems lack the flexibility to appropriately respond to users' emotional states and individual goals, making it difficult to provide personalized and effective support. This invention aims to solve these problems and enable users to achieve their goals in the most optimal way while taking their emotional state into consideration.
[0344] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0345] In this invention, the server includes a component that receives an objective or task defined by the user, a processing device that utilizes an artificial intelligence model to analyze information about the objective or task, and a device that generates instructions and advice tailored to the user's characteristics based on the analysis. This allows the user to instantly receive personalized feedback and, through emotion recognition, to use that guidance flexibly and effectively.
[0346] A "user" is an individual or group that uses the system to achieve their goals or accomplish their objectives.
[0347] "Objective or task" refers to a specific outcome or solution that the user hopes to achieve with the system.
[0348] A "component" is a part of the system that receives the objectives or tasks set by the user.
[0349] "Information" is a collection of data that includes text, audio, visual information, and video.
[0350] An "artificial intelligence model" is a set of algorithms used to analyze collected information and generate instructions and advice to achieve a specific goal.
[0351] A "processing device" is a part of a system that analyzes information and generates instructions and advice tailored to the user's characteristics.
[0352] "Instructions and advice" refers to specific suggestions and guidance provided by the artificial intelligence model to help users achieve their goals.
[0353] "Device" refers to a part of a system that provides the aforementioned instructions and advice to the user and enables the user to utilize that information.
[0354] This system aims to provide advanced coaching using artificial intelligence technology to promote user growth and improve efficiency. Its main components include terminals, servers, and generative AI models.
[0355] The device provides an interface for users to input their goals and challenges. Users can directly input text information or upload audio and video data. For example, a user might set the goal of "improving their leadership skills" and upload recordings of past leadership-related meetings.
[0356] Information received from the device is sent to the server. To analyze this information, the server uses a generative AI model to analyze text data and an emotion recognition engine to detect emotional states from audio and video data. Specifically, natural language processing technology is used to analyze goals and tasks in the text, and speech analysis technology is used to identify emotional states. This generates instructions and advice related to the user's goals, and provides feedback to the user.
[0357] The generated feedback is presented to the user visually and audibly via the device. Users can receive this feedback in real time and efficiently adjust their actions.
[0358] As an example of a prompt, when the text "I want to know how to lead my team more efficiently on the next project" is input into the AI model, the system can provide effective advice and plans. This system allows users to receive support tailored to their individual needs and to respond flexibly while taking their emotional state into consideration.
[0359] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0360] Step 1:
[0361] Users input their goals and challenges through their device, and upload audio and video data as needed. For example, a user might input "I want to improve my negotiation skills" and provide recordings of past negotiations. The input data is in text and multimedia formats. This constitutes the initial input to the system.
[0362] Step 2:
[0363] The terminal sends input data received from the user to the server. The data is transferred using a secure communication protocol. The server receives text data related to the user's goals and challenges, as well as audio and video data. This prepares the raw data necessary for analysis.
[0364] Step 3:
[0365] The server inputs the received text data into a generative AI model, which then analyzes it through natural language processing. This data processing includes grammatical analysis and keyword extraction. The output provides a concrete understanding and analysis of the user's goals and challenges.
[0366] Step 4:
[0367] The server inputs audio and video data into an emotion recognition engine for analysis. For audio data, it analyzes voice tone, pitch, tempo, etc., to determine the user's emotional state. For video data, it uses facial expression recognition technology. The output is the analyzed emotion data, indicating the user's current emotional state.
[0368] Step 5:
[0369] The server integrates the text data analysis results from the generative AI model with the emotion data from the emotion recognition engine to generate user-appropriate feedback and advice. Here, an inference engine is utilized to generate individual instructions and suggestions. The output is feedback that includes specific action guidelines.
[0370] Step 6:
[0371] The terminal visually and audibly presents instructions and advice received from the server to the user. Specifically, it displays visualized data on a dashboard and provides audio guidance. This allows the user to immediately receive support from the system and take action.
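The keyword extraction in Step 3 and the integration in Step 5 can be illustrated with a deliberately simple stand-in. This is a minimal sketch, assuming a frequency count over non-stop-words in place of the generative AI model, which the disclosure does not specify at this level. The stop-word list, the function names `extract_keywords` and `integrate`, and the emotion label are assumptions made for illustration.

```python
import re
from collections import Counter

# A small illustrative stop-word list (an assumption, not from the source).
STOP_WORDS = {
    "i", "to", "my", "the", "a", "an", "and", "of",
    "want", "how", "on", "in", "me",
}


def extract_keywords(text: str, top_n: int = 3) -> list:
    """Return up to top_n of the most frequent non-stop-words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


def integrate(keywords: list, emotion: str) -> dict:
    """Step 5: merge the text analysis result and the emotion data."""
    return {"keywords": keywords, "emotion": emotion}


# "nervous" stands in for the output of the emotion recognition engine.
result = integrate(
    extract_keywords("I want to improve my negotiation skills"), "nervous"
)
print(result)
```

A real system would pass the integrated record to the generative AI model; the dictionary here only shows which pieces of information meet at Step 5.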
[0372] (Application Example 2)
[0373] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0374] In modern society, business professionals and ordinary citizens frequently face mental stress in their daily lives and work. In this environment, there is a need to provide personalized feedback that takes individual emotional states into account. However, existing systems struggle to provide timely and optimal feedback tailored to individual emotional states, and there is a lack of mechanisms for users to receive immediate emotional support. Furthermore, proposing solutions that utilize local resources is also difficult.
[0375] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0376] In this invention, the server includes means for receiving the user's goals and challenges, means for analyzing the data using a generative information processing model, and means for recognizing the user's emotional state. This makes it possible to adaptively optimize feedback and advice tailored to the user's characteristics and emotions, taking into account local events and activities.
[0377] A "user" is an individual or group that uses this system, has goals and challenges, and requires feedback and advice regarding them.
[0378] A "goal or challenge" is a specific matter that the user aims to achieve or resolve, and it forms the basis for feedback and advice provided by this system.
[0379] A "generative information processing model" is an artificial intelligence technology that analyzes diverse data formats and generates feedback tailored to the user's goals and challenges.
[0380] "Emotion recognition means" refers to technology that determines a user's emotional state from data such as audio and images, and contributes to optimizing feedback.
[0381] "Feedback and advice" refers to the evaluations and suggestions that a generative information processing model provides to a user's goals and challenges, serving as a guide for the user to adjust their actions toward achieving those goals.
[0382] An "information terminal" is a device that provides users with feedback and advice visually or audibly, and includes smartphones and smart glasses.
[0383] "Geographic information" refers to location-related data and is used when suggesting local events and activities to users.
[0384] "Local events and activities" refer to nearby activities and gatherings that users can participate in, and are proposed to support users in reducing stress and achieving their goals.
[0385] As an embodiment of the present invention, a personalized support system for citizens in a smart city environment will be described. This system is implemented using a smartphone or smart glasses owned by the user. When the user experiences stress or challenges in their daily life, the device captures the user's voice and facial expressions.
[0386] The device sends this input data to the server, which processes it via an emotion recognition API. Specifically, it uses Microsoft Azure's emotion recognition API to determine the user's emotional state. The server then uses OpenAI GPT, a generative artificial intelligence model, to generate advice tailored to the user's current emotional state. This advice may include suggestions for relaxation methods or local events to lift their spirits.
[0387] Furthermore, the server uses a geographic information API, such as the Google Maps API, to search for and suggest local events and resources based on the user's location, which makes the support more realistic and responsive.
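Selecting events near the user's location can be sketched without any external service by computing great-circle distances directly. This is a minimal sketch, assuming event coordinates are already available as latitude and longitude pairs; it uses the standard haversine formula as a stand-in and is not the Google Maps API. The function names, the 2 km radius, and the sample coordinates are illustrative assumptions.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * r * math.asin(math.sqrt(a))


def nearby_events(user_pos, events, max_km=2.0):
    """Return names of events within max_km of the user, nearest first."""
    scored = [
        (haversine_km(user_pos[0], user_pos[1], e["lat"], e["lon"]), e["name"])
        for e in events
    ]
    return [name for dist, name in sorted(scored) if dist <= max_km]


# Made-up sample events; the coordinates are for illustration only.
events = [
    {"name": "Evening run", "lat": 35.6586, "lon": 139.7454},
    {"name": "Yoga in the park", "lat": 35.6852, "lon": 139.7528},
]
print(nearby_events((35.6812, 139.7671), events))
```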
[0388] For example, if a user complains of stress at work, the smart glasses analyze the user's facial expression using an emotion recognition API and identify the emotion of anxiety. Based on this, the server uses a generative AI model to recommend "participating in a yoga event at a nearby park" and provides voice guidance for "deep breathing exercises."
[0389] An example prompt for this system is "If the user's emotional state is anxious, suggest three relaxation methods," and the generative AI model generates the corresponding feedback. This allows users to instantly receive effective feedback tailored to their own emotional state.
[0390] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0391] Step 1:
[0392] When a user experiences stress, the device captures their voice and facial expressions. The input consists of the user's voice data and image data from the camera, which are temporarily stored on the device. The device performs noise filtering to improve the accuracy of the data.
[0393] Step 2:
[0394] The device sends the captured audio and image data to a server in the cloud. As output, a data packet is generated. The device encrypts the data using security protocols to protect user privacy.
[0395] Step 3:
[0396] The server calls an emotion recognition API to analyze the data it receives. The input is the user's voice and image data, and the API extracts emotion information from these. The output is the identified emotional state (e.g., anxiety, joy). Based on this, the server logs the user's emotional state.
[0397] Step 4:
[0398] The server uses the emotion recognition results to input a prompt message into the generative AI model. An example prompt message used is, "If the user's emotional state is anxious, suggest three relaxation methods." The input is the identified emotional state, and the model generates feedback based on this.
[0399] Step 5:
[0400] The server integrates the generated feedback with activity and event information based on the user's geographic location, utilizing local information services. The input is the user's location data and the generated feedback, and the output is a set of realistic event suggestions. The server uses the Google Maps API to retrieve event information.
[0401] Step 6:
[0402] The server sends integrated feedback and event information to the terminal. The output is feedback data presented to the user. The terminal presents this information visually and audibly, allowing the user to adjust their actions based on it.
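Steps 4 to 6 above can be sketched as three small functions. This is a minimal sketch, assuming the emotion recognition result is a single adjective and the event names have already been retrieved; the function names are hypothetical, and the feedback string stands in for real model output.

```python
def build_emotion_prompt(emotion: str) -> str:
    """Step 4: turn the recognized emotional state into a prompt message."""
    return (
        f"If the user's emotional state is {emotion}, "
        "suggest three relaxation methods."
    )


def merge_feedback(feedback: str, event_names: list) -> dict:
    """Step 5: combine generated feedback with local event suggestions."""
    return {"feedback": feedback, "events": list(event_names)}


def render_for_terminal(payload: dict) -> str:
    """Step 6: format the payload as text for display or speech output."""
    lines = [payload["feedback"]]
    lines.extend(f"Nearby event: {name}" for name in payload["events"])
    return "\n".join(lines)


prompt = build_emotion_prompt("anxious")
# The feedback below is a placeholder, not a real model response.
payload = merge_feedback(
    "Try deep breathing exercises.", ["Yoga event at a nearby park"]
)
print(prompt)
print(render_for_terminal(payload))
```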
[0403] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0404] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
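The input described here, a prompt containing instructions plus inference data in one or more formats, can be modeled as a small data structure. This is a minimal sketch for illustration only: the type names `InferenceData` and `ModelRequest` and the helper `describe` are hypothetical and do not correspond to the interface of any actual model.

```python
from dataclasses import dataclass
from typing import List, Literal, Union

Modality = Literal["audio", "text", "image"]


@dataclass
class InferenceData:
    modality: Modality
    payload: Union[str, bytes]  # text as str, audio or image as raw bytes


@dataclass
class ModelRequest:
    prompt: str                 # instructions for the model
    data: List[InferenceData]   # the inference data to reason over


def describe(request: ModelRequest) -> str:
    """Summarize which instruction and which modalities a request carries."""
    kinds = sorted({item.modality for item in request.data})
    return f"prompt={request.prompt!r}; modalities={', '.join(kinds)}"


request = ModelRequest(
    prompt="Summarize the recording",
    data=[
        InferenceData("audio", b"raw-audio-bytes"),
        InferenceData("text", "meeting notes"),
    ],
)
print(describe(request))
```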
[0405] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.
[0406] [Third Embodiment]
[0407] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.
[0408] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.
[0409] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0410] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.
[0411] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0412] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0413] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0414] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0415] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0416] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0417] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0418] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".
[0419] This invention is a highly personalized AI coaching system aimed at improving the careers and performance of business professionals. The system receives and analyzes user-defined goals or challenges in various data formats to generate appropriate feedback and advice.
[0420] The device provides users with an intuitive interface where they can input their goals and challenges as text and upload audio, images, and videos as needed. The device then sends this information to the server.
[0421] The server uses a generative artificial intelligence model to analyze the received information. First, the server analyzes text data using natural language processing techniques. This analysis allows for a deep understanding of the user's intent and needs, and the extraction of key points related to the topic. Furthermore, the server processes uploaded audio, image, and video data using speech recognition and image analysis technologies. This enables a multifaceted analysis of the entire material submitted by the user, providing comprehensive feedback.
[0422] Based on the analysis results, the server generates customized feedback and advice that takes into account the user's personality and past behavioral data. This feedback includes specific steps and areas for improvement to help the user achieve their goals. The server also tracks the user's behavior and provides ongoing support for improvement.
[0423] The terminal displays feedback sent from the server to the user. This allows the user to understand their progress and plan their next steps. Through this process, the system is available 24 hours a day, 365 days a year, supporting the user's career growth regardless of time or location.
[0424] Thus, the present invention makes it possible to provide users with appropriate career advice with consistent quality. By introducing an AI model that supports diverse data formats, it is expected to yield results exceeding those of conventional human coaching.
[0425] The following describes the processing flow.
[0426] Step 1:
[0427] Users enter their goals and challenges in text format through the terminal interface and upload audio, image, or video files as needed.
[0428] Step 2:
[0429] The terminal sends the information entered by the user to the server. The server receives this information and prepares it for analysis.
[0430] Step 3:
[0431] The server analyzes text data using a natural language processing engine to understand the user's requests and intentions. This process extracts important keywords and context.
[0432] Step 4:
[0433] The server processes the audio data using speech recognition software, converting the user's speech into text data. This allows the information contained in the audio to be used for analysis.
[0434] Step 5:
[0435] The server uses image analysis algorithms to analyze image and video data, identifying information derived from visual elements. This data is then integrated with text information for further use.
[0436] Step 6:
[0437] The server analyzes this multimodal data in an integrated manner and compares it with the user's past tracking data to generate personalized feedback and advice.
[0438] Step 7:
[0439] The server sends the generated feedback and advice to the terminal, which then displays it to the user. The user can then review this and plan their next course of action.
[0440] Step 8:
[0441] The server tracks the user's ongoing behavior and periodically updates this data to help with future feedback.
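Step 6, integrating findings from the different data formats and comparing them with the user's past tracking data, can be sketched as follows. This is a minimal sketch, assuming each analysis stage has already produced a list of short finding strings; the function names and the sample findings are hypothetical.

```python
def integrate_findings(text_points, audio_points, visual_points):
    """Merge findings from all modalities, dropping duplicates in order."""
    merged = []
    for point in [*text_points, *audio_points, *visual_points]:
        if point not in merged:
            merged.append(point)
    return merged


def compare_with_history(current, past):
    """Split current findings into recurring issues and new ones."""
    past_set = set(past)
    return {
        "recurring": [p for p in current if p in past_set],
        "new": [p for p in current if p not in past_set],
    }


# Sample findings are made up for illustration.
current = integrate_findings(
    ["speaks too fast"],
    ["speaks too fast", "few pauses"],
    ["little eye contact"],
)
report = compare_with_history(current, past=["few pauses"])
print(report)
```

Separating recurring findings from new ones gives the feedback generation step a simple way to emphasize issues that persist across sessions.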
[0442] (Example 1)
[0443] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0444] Traditional coaching systems struggle to provide effective feedback tailored to each user's unique goals and challenges, and they lack the flexibility to analyze diverse data formats other than text (such as audio, images, and videos). Furthermore, they lack the functionality to continuously follow up on users' progress, resulting in insufficient ongoing career support.
[0445] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0446] In this invention, the server includes a device for receiving goals or tasks defined by the user, a device utilizing a generative artificial intelligence model for analyzing information on the goals or tasks, and a device for generating feedback and advice tailored to the user's characteristics based on the analysis. This enables the provision of customized feedback to individual users, flexibly processes diverse data formats, and provides effective career support by continuously tracking the user's progress.
[0447] A "user" is an individual who is supported in achieving goals or solving problems through this system.
[0448] A "device for receiving goals or tasks" is a device that provides an interface for users to input specific goals or tasks they have defined into the system.
[0449] A "generative artificial intelligence model" is an artificial intelligence technology equipped with an algorithm that analyzes input information and generates appropriate feedback and advice.
[0450] An "analysis device" is a device equipped with the function of processing input information such as text, audio, images, and videos, and extracting important elements.
[0451] A "feedback and advice generating device" is a device that creates personalized countermeasures and improvement suggestions in order to provide users with information tailored to their needs based on analysis results.
[0452] "Multi-format data processing" refers to the processing capability to comprehensively analyze data in different formats, such as text, audio, images, and videos, and extract information based on them.
[0453] The "progress tracking and continuous update function" refers to a system function that records user behavior and monitors progress and changes at each stage.
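The progress tracking and continuous update function defined above could be represented by a small record-keeping class. This is a minimal sketch, assuming progress is expressed as a completion ratio between 0.0 and 1.0, which the disclosure does not specify; the class `ProgressTracker` and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple


@dataclass
class ProgressTracker:
    goal: str
    entries: List[Tuple[date, float]] = field(default_factory=list)

    def record(self, day: date, completion: float) -> None:
        """Store a completion ratio between 0.0 and 1.0 for the given day."""
        if not 0.0 <= completion <= 1.0:
            raise ValueError("completion must be between 0.0 and 1.0")
        self.entries.append((day, completion))
        self.entries.sort(key=lambda e: e[0])  # keep chronological order

    def latest(self) -> float:
        """Most recent completion ratio, or 0.0 if nothing is recorded."""
        return self.entries[-1][1] if self.entries else 0.0

    def change_since_first(self) -> float:
        """Difference between the latest and the earliest recorded ratio."""
        if len(self.entries) < 2:
            return 0.0
        return self.entries[-1][1] - self.entries[0][1]


tracker = ProgressTracker(goal="improve presentation skills")
tracker.record(date(2024, 12, 17), 0.5)
tracker.record(date(2024, 12, 10), 0.2)  # out-of-order entry is re-sorted
print(tracker.latest(), tracker.change_since_first())
```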
[0454] Modes for carrying out the invention
[0455] The following program and system configurations are possible as embodiments for carrying out this invention.
[0456] System configuration and technologies used:
[0457] The terminal functions as an interface with the user, providing an interface where the user can input their goals and challenges. This input is not limited to text format, but also includes audio, images, and videos. The terminal is equipped with communication devices for sending this data to the server.
[0458] The server plays a central role in processing the received data. The server is equipped with a generative AI model for analysis, applying natural language processing technology to text data, speech recognition technology to audio data, and image analysis technology to image and video data. This allows the server to comprehensively analyze diverse data formats and accurately grasp the user's intent.
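Routing each uploaded file to the matching analysis technique can be sketched with Python's standard `mimetypes` module. This is a minimal sketch, assuming the file type can be inferred from the file name; the function name `choose_analysis` and the returned labels are hypothetical, and a real server would call the corresponding analysis component instead of returning a string.

```python
import mimetypes


def choose_analysis(filename: str) -> str:
    """Pick the analysis technique for an uploaded file from its MIME type."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        return "unsupported"
    if mime.startswith("text/"):
        return "natural language processing"
    if mime.startswith("audio/"):
        return "speech recognition"
    if mime.startswith(("image/", "video/")):
        return "image analysis"
    return "unsupported"


for name in ["goal.txt", "meeting.wav", "slide.png", "talk.mp4"]:
    print(name, "->", choose_analysis(name))
```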
[0459] Generating feedback and advice:
[0460] Based on the analysis results, the server generates feedback and advice tailored to the user's characteristics and past behavioral history. This generated information includes specific action steps and improvement suggestions to help the user achieve their goals.
[0461] The terminal presents the feedback sent from the server, helping users track their progress and plan their next steps.
[0462] Specific examples and prompt statements:
[0463] As a specific scenario, assume that a user sets the goal "I want to improve my speaking skills for my next presentation." In this case, the user uploads videos of past presentations, and the server analyzes that data to identify areas for improvement and provides advice on specific practice methods.
[0464] Examples of prompts:
[0465] "I'd like some advice on steps to improve my project management skills."
[0466] "Based on past meeting recordings, I want to understand how we can improve team communication."
[0467] In this way, this invention provides highly personalized career coaching tailored to the individual needs of each user, enabling more effective support than conventional methods.
[0468] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0469] Step 1:
[0470] The user inputs goals and challenges through the terminal's interface and uploads audio, images, and videos as needed. For example, the user inputs the text "I want to learn how to speak effectively in my next presentation" together with a video of a past presentation. The terminal then sends the entered data to the server.
[0471] Step 2:
[0472] The server applies natural language processing techniques to analyze the received text data. It utilizes a generative AI model to extract the user's intent from the input text and identify relevant keywords and concepts. In this step, the input is text data, and the output is analyzed semantic information. Specifically, the server extracts keywords such as "speaking style," "presentation," and "effective" from the text and identifies related topics.
[0473] Step 3:
[0474] If audio data is available, the server uses speech recognition technology to convert it into text and then performs further analysis. For image and video data, image analysis technology is used to identify important frames and content. Through these technologies, the output obtained from the input (audio, images, video) is a list or summary of the analyzed content. Specifically, the server captures important scenes from a presentation video and extracts information about gaze and posture.
[0475] Step 4:
[0476] The server integrates all the analyzed information and generates optimal feedback and advice based on the user's characteristics and past behavior history. It utilizes a generative AI model to present customized, specific steps. The input in this step is the integrated analytical information, and the output is the content of the feedback and advice. For example, the server might generate advice such as, "In your next presentation, increase eye contact and maintain a consistent speaking speed."
[0477] Step 5:
[0478] The terminal presents the feedback received from the server and notifies the user so that it can be reviewed. The output feedback serves as concrete guidance for the user's actions. Specifically, the terminal displays the feedback content on the screen and sends a notification to the user, helping them plan their next steps.
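Steps 2 to 4 above can be sketched as follows. This is a simplified illustration, not the method of the disclosure: the stopword list, the function names, and the prompt wording are assumptions, and a plain word filter stands in for the natural language processing that the generative AI model would perform.

```python
import re
from typing import List

# Small illustrative stopword list (an assumption, not from the disclosure).
STOPWORDS = {"i", "want", "to", "learn", "how", "in", "my", "next", "a", "the", "of"}


def extract_keywords(text: str) -> List[str]:
    """Return lowercase content words in order of first appearance."""
    seen: List[str] = []
    for word in re.findall(r"[a-z']+", text.lower()):
        if word not in STOPWORDS and word not in seen:
            seen.append(word)
    return seen


def build_prompt(goal: str, keywords: List[str], video_findings: List[str]) -> str:
    """Combine the Step 2 and Step 3 outputs into one prompt for Step 4."""
    lines = [
        f"User goal: {goal}",
        "Keywords: " + ", ".join(keywords),
        "Observations from uploaded video:",
    ]
    lines += [f"- {finding}" for finding in video_findings]
    lines.append("Give specific, personalized steps for improvement.")
    return "\n".join(lines)


goal = "I want to learn how to speak effectively in my next presentation"
keywords = extract_keywords(goal)
prompt = build_prompt(goal, keywords, ["limited eye contact", "uneven speaking speed"])
```

The resulting `prompt` string is what would be passed to the generative AI model; the video findings are supplied by hand here because the image analysis step is outside the scope of this sketch.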
[0479] (Application Example 1)
[0480] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0481] Business professionals often struggle to obtain helpful feedback for improving their careers and performance that is free from constraints of time and location. Furthermore, traditional human coaching has limitations and cannot fully address individual needs. Therefore, there is a need for continuous and personalized support provided in a home setting.
[0482] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0483] In this invention, the server includes means for receiving goals or tasks defined by the user, means for utilizing a generative algorithm for analyzing the data of the goals or tasks, and means for generating feedback and advice tailored to the user's characteristics based on the analysis. This enables the user to receive customized career support 24 hours a day, 365 days a year through a home device.
[0484] A "user" is an individual or group that aims to achieve their goals or objectives using this system.
[0485] "Goals or challenges" refer to specific objectives or problems that users wish to achieve or resolve, and are the subjects for which support is provided by this system.
[0486] A "home device" refers to a robot or other device that is used on a daily basis within the home and that plays a role in communication and information gathering.
[0487] A "generative algorithm" is a computational method or program that analyzes received data and provides the user with the most appropriate feedback or advice.
[0488] "Integrated processing" is an information processing method that simultaneously analyzes multiple data formats and makes decisions by relating them to each other.
[0489] "Feedback and advice" refers to specific directions and instructions provided to help users achieve their goals and solve problems.
[0490] "Progress" refers to the state or degree to which a user is making progress toward achieving their goals or tasks.
[0491] This invention constructs a system in which a home device collects information and transmits it to a server in order to achieve goals and tasks set by the user. The home device used as a terminal is equipped with a microphone for voice recognition and a camera, and can receive voice input and visual information from the user. The home device allows the user to input information about their goals and tasks with simple operations and transmits this information to the server.
[0492] The server uses the Google Speech-to-Text API to convert speech data into text data and performs natural language processing using OpenAI's generative AI model to analyze the received information. Furthermore, TensorFlow and OpenCV are used for image analysis, and a generative artificial intelligence model is employed to comprehensively analyze the received data. The server generates feedback and advice based on the user's characteristics and provides it to the user via a home device.
[0493] As a concrete example, consider a scenario where a home robot is asked by a user, "What skills should I learn for my next career step?" The user provides information to the robot by explaining their current situation and talking about areas of interest. The robot sends this information to a server, which then provides advice based on its analysis.
[0494] Examples of prompts to input into a generative AI model:
[0495] "Based on the user's stated career interests and current skill set, generate a list of technologies and skills they should acquire next."
[0496] This system will allow users to easily receive career support in the comfort of their own homes.
[0497] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0498] Step 1:
[0499] The terminal, acting as a home device, receives information about the user's goals and challenges as audio or visual information. The input information is captured by the terminal as audio or image data. The home device organizes this data and prepares it for transmission to the server.
[0500] Step 2:
[0501] The server converts the audio data received from the terminal into text data using the Google Speech-to-Text API. It analyzes the input audio data and outputs natural language text as a result. This process transforms audio information into text information.
[0502] Step 3:
[0503] The server inputs text data into a generative AI model, which then analyzes the text content using natural language processing. The generative AI model extracts the user's intent and needs from the text information and highlights relevant key points. Based on this, a basic understanding of how to achieve the user's goals is obtained.
[0504] Step 4:
[0505] The server analyzes the received image data using TensorFlow and OpenCV. By extracting visual elements from the image data and identifying related information, it analyzes the user's provided information from multiple perspectives. This process allows the server to understand the user's state and required resources from the images.
[0506] Step 5:
[0507] The server generates feedback and advice based on the analyzed text and image data. Using a generative AI model, customized feedback is provided, taking into account the user's characteristics and past performance. This output is presented as specific advice.
[0508] Step 6:
[0509] The terminal displays feedback and advice received from the server to the user. This information is displayed via voice and on the screen to help the user plan their next course of action.
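The server-side part of this flow (Steps 2 to 5) can be sketched as a pipeline whose external services are injected as callables. This is an assumption-laden illustration: `CoachingPipeline` and its field names are hypothetical, and the stub lambdas merely stand in for a speech recognition service, an image analysis routine, and a generative AI model client, none of whose real APIs are called here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CoachingPipeline:
    """Steps 2 to 5 of the flow, with each service injected as a callable."""
    speech_to_text: Callable[[bytes], str]   # stand-in for speech recognition
    describe_image: Callable[[bytes], str]   # stand-in for image analysis
    generate: Callable[[str], str]           # stand-in for the generative AI model

    def run(self, audio: bytes, image: Optional[bytes] = None) -> str:
        transcript = self.speech_to_text(audio)                # Step 2
        parts = [f"User said: {transcript}"]                   # input to Step 3
        if image is not None:
            description = self.describe_image(image)           # Step 4
            parts.append(f"Image shows: {description}")
        parts.append("Suggest skills the user should acquire next.")
        return self.generate("\n".join(parts))                 # Step 5


# Stub services so the sketch runs without network access.
pipeline = CoachingPipeline(
    speech_to_text=lambda audio: audio.decode("utf-8"),
    describe_image=lambda image: f"{len(image)}-byte photo",
    generate=lambda prompt: f"[advice based on {len(prompt.splitlines())} lines]",
)
```

Injecting the services keeps the flow testable offline and lets a different recognition or generation backend be swapped in without touching the pipeline itself.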
[0510] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0511] This invention is an AI-based coaching system that supports the career and performance improvement of business professionals, and is characterized in that it incorporates an emotion recognition engine to provide feedback that is tailored to the user's emotional state. The system receives input from the user and generates personalized advice based on that input.
[0512] The device provides an interface for users to input their set goals and tasks as text. Users can also upload audio recordings of their own voice and videos such as video conference recordings. The device then sends this data to the server.
[0513] The server first analyzes the received data. A generative artificial intelligence model analyzes the text data using natural language processing techniques to clarify the user's goals. Simultaneously, the server activates an emotion engine to determine the user's emotional state from audio and video data. This emotion data is used to enhance the sensitivity of the feedback provided and to generate more appropriate content for the user.
[0514] As a concrete example, consider a scenario where a user sets a goal of overcoming presentation anxiety and uploads past presentation videos as relevant materials. The server uses an emotion engine to detect the user's level of tension from these videos and generates feedback that includes specific breathing techniques and mental exercises to alleviate the tension.
[0515] The generated feedback and advice are presented to the user visually and audibly via the device. This allows the user to receive continuously optimized support while adjusting their own actions in real time.
[0516] By implementing this invention, the system provides more effective and personalized business coaching that also takes into account the user's emotions. This goes beyond simply achieving goals and contributes to improving the user's ability to cope with daily tasks and pressures.
[0517] The following describes the processing flow.
[0518] Step 1:
[0519] Users input their goals and challenges in text format through the terminal interface. They can also upload audio data and video files as needed.
[0520] Step 2:
[0521] The terminal sends the data entered and uploaded by the user to the server. The server receives this information and prepares it for analysis.
[0522] Step 3:
[0523] The server uses a generative artificial intelligence model to analyze text data, understand user requests, and extract key information related to the goals.
[0524] Step 4:
[0525] The server processes uploaded audio and video data using an emotion engine, analyzing the user's emotional state from these media. For example, it recognizes the user's emotions based on changes in voice tone and facial expressions.
[0526] Step 5:
[0527] The server integrates the text analysis results and the emotion recognition results to generate personalized feedback and advice based on the user's personality traits and emotional state.
[0528] Step 6:
[0529] The server sends the generated feedback and advice to the device, which then presents them to the user visually and audibly. Through the presented content, the user can follow emotionally conscious and practical advice.
[0530] Step 7:
[0531] The server frequently monitors the user's progress and emotional changes, accumulating feedback data to use in future analyses. This provides continuous learning and support.
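The accumulation described in Step 7 can be sketched as a simple session log. The following is a minimal illustration under stated assumptions: the `Session` fields, their 0.0 to 1.0 scales, and the trend rule are all hypothetical, since the disclosure does not specify how progress or emotional change is quantified.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Session:
    goal_score: float   # 0.0-1.0, hypothetical measure of progress toward the goal
    tension: float      # 0.0-1.0, hypothetical output of the emotion engine


@dataclass
class ProgressLog:
    sessions: List[Session] = field(default_factory=list)

    def record(self, session: Session) -> None:
        """Accumulate one more observation for use in later analyses."""
        self.sessions.append(session)

    def trend(self) -> str:
        """Compare the latest session with the first one."""
        if len(self.sessions) < 2:
            return "not enough data"
        first, last = self.sessions[0], self.sessions[-1]
        if last.goal_score > first.goal_score and last.tension <= first.tension:
            return "improving"
        if last.goal_score < first.goal_score:
            return "regressing"
        return "steady"
```

A trend label like this could be included in the next prompt so that later feedback reflects both progress and emotional change.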
[0532] (Example 2)
[0533] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0534] Many business professionals want to receive various forms of feedback and guidance for self-improvement and to improve work efficiency. However, conventional systems lack the flexibility to appropriately respond to users' emotional states and individual goals, making it difficult to provide personalized and effective support. This invention aims to solve these problems and enable users to achieve their goals in the most optimal way while taking their emotional state into consideration.
[0535] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0536] In this invention, the server includes a component that receives an objective or task defined by the user, a processing unit that utilizes an artificial intelligence model to analyze information about the objective or task, and a device that generates instructions and advice tailored to the user's characteristics based on the analysis. This allows the user to instantly receive personalized feedback and, through emotion recognition, to use that guidance flexibly and effectively.
[0537] A "user" is an individual or group that uses the system to achieve their goals or accomplish their objectives.
[0538] An "objective or task" refers to a specific outcome or solution that the user hopes to achieve with the system.
[0539] A "component" is a part of the system that receives the objectives or tasks set by the user.
[0540] "Information" is a collection of data that includes text, audio, visual information, and video.
[0541] An "artificial intelligence model" is a set of algorithms used to analyze collected information and generate instructions and advice to achieve a specific goal.
[0542] A "processing device" is a part of a system that analyzes information and generates instructions and advice tailored to the user's characteristics.
[0543] "Instructions and advice" refers to specific suggestions and guidance provided by the artificial intelligence model to help users achieve their goals.
[0544] "Device" refers to a part of a system that provides the aforementioned instructions and advice to the user and enables the user to utilize that information.
[0545] This system aims to provide advanced coaching using artificial intelligence technology to promote user growth and improve efficiency. Its main components include terminals, servers, and generative AI models.
[0546] The device provides an interface for users to input their goals and challenges. Users can directly input text information or upload audio and video data. For example, a user might set the goal of "improving their leadership skills" and upload recordings of past leadership-related meetings.
[0547] Information received by the terminal is sent to the server. To analyze this information, the server uses a generative AI model to analyze text data and an emotion recognition engine to detect emotional states from audio and video data. Specifically, natural language processing technology is used to analyze goals and tasks in the text, and speech analysis technology is used to identify emotional states. Based on these results, the server generates instructions and advice related to the user's goals and provides them to the user as feedback.
[0548] The generated feedback is presented to the user visually and audibly via the device. Users can receive this feedback in real time and efficiently adjust their actions.
[0549] As an example of a prompt, if you input the text "I want to know how to lead my team more efficiently on the next project" into the AI model, the system can provide effective advice and plans. This system allows users to receive support tailored to their individual needs and to respond flexibly while taking their emotional state into consideration.
[0550] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0551] Step 1:
[0552] Users input their goals and challenges through their device, and upload audio and video data as needed. For example, a user might input "I want to improve my negotiation skills" and provide recordings of past negotiations. The input data is in text and multimedia formats. This constitutes the initial input to the system.
[0553] Step 2:
[0554] The terminal sends input data received from the user to the server. The data is transferred using a secure communication protocol. The server receives text data related to the user's goals and challenges, as well as audio and video data. This prepares the raw data necessary for analysis.
[0555] Step 3:
[0556] The server inputs the received text data into a generative AI model, which then analyzes it through natural language processing. This data processing includes grammatical analysis and keyword extraction. The output provides a concrete understanding and analysis of the user's goals and challenges.
[0557] Step 4:
[0558] The server inputs audio and video data into an emotion recognition engine for analysis. For audio data, it analyzes voice tone, pitch, tempo, etc., to determine the user's emotional state. For video data, it uses facial expression recognition technology. The output is the analyzed emotion data, indicating the user's current emotional state.
[0559] Step 5:
[0560] The server integrates the text data analysis results from the generative AI model with the emotion data from the emotion recognition engine to generate user-appropriate feedback and advice. Here, an inference engine is utilized to generate individual instructions and suggestions. The output is feedback that includes specific action guidelines.
[0561] Step 6:
[0562] The terminal visually and audibly presents instructions and advice received from the server to the user. Specifically, it displays visualized data on a dashboard and provides audio guidance. This allows the user to immediately receive support from the system and take action.
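Steps 4 and 5 combine emotion estimates from audio and video with the text analysis. One common way to merge two estimates is late fusion by weighted averaging, sketched below. This technique is swapped in for illustration only: the disclosure does not state how the emotion recognition engine combines modalities, and the label set, score dictionaries, and function names are assumptions.

```python
from typing import Dict

# Illustrative label set; the disclosure does not fix one.
EMOTIONS = ("calm", "anxious", "frustrated")


def fuse_emotions(voice: Dict[str, float], face: Dict[str, float],
                  voice_weight: float = 0.5) -> str:
    """Late fusion: weighted average of per-label scores, then pick the top label."""
    if not 0.0 <= voice_weight <= 1.0:
        raise ValueError("voice_weight must be between 0 and 1")
    combined = {
        label: voice_weight * voice.get(label, 0.0)
        + (1.0 - voice_weight) * face.get(label, 0.0)
        for label in EMOTIONS
    }
    return max(combined, key=combined.get)


def build_request(goal_analysis: str, emotion: str) -> str:
    """Step 5: combine the text analysis with the fused emotion label."""
    return (
        f"Goal analysis: {goal_analysis}\n"
        f"Current emotional state: {emotion}\n"
        "Give specific action guidelines suited to this state."
    )


voice_scores = {"calm": 0.2, "anxious": 0.7, "frustrated": 0.1}
face_scores = {"calm": 0.5, "anxious": 0.4, "frustrated": 0.1}
state = fuse_emotions(voice_scores, face_scores)
request = build_request("improve negotiation skills", state)
```

The `voice_weight` parameter lets one modality be trusted more than the other, for example when the video is too dark for reliable facial expression recognition.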
[0563] (Application Example 2)
[0564] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0565] In modern society, business professionals and ordinary citizens frequently face mental stress in their daily lives and work. In this environment, there is a need to provide personalized feedback that takes individual emotional states into account. However, existing systems struggle to provide timely and optimal feedback tailored to individual emotional states, and there is a lack of mechanisms for users to receive immediate emotional support. Furthermore, proposing solutions that utilize local resources is also difficult.
[0566] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0567] In this invention, the server includes means for receiving the user's goals and challenges, means for analyzing the data using a generative information processing model, and means for recognizing the user's emotional state. This makes it possible to adaptively optimize feedback and advice tailored to the user's characteristics and emotions, taking into account local events and activities.
[0568] A "user" is an individual or group that uses this system, has goals and challenges, and requires feedback and advice regarding them.
[0569] A "goal or challenge" is a specific matter that the user aims to achieve or resolve, and it forms the basis for feedback and advice provided by this system.
[0570] A "generative information processing model" is an artificial intelligence technology that analyzes diverse data formats and generates feedback tailored to the user's goals and challenges.
[0571] "Emotion recognition means" refers to technology that determines a user's emotional state from data such as audio and images, and contributes to optimizing feedback.
[0572] "Feedback and advice" refers to the evaluations and suggestions that a generative information processing model provides to a user's goals and challenges, serving as a guide for the user to adjust their actions toward achieving those goals.
[0573] An "information terminal" is a device that provides users with feedback and advice visually or audibly, and includes smartphones and smart glasses.
[0574] "Geographic information" refers to location-related data and is used when suggesting local events and activities to users.
[0575] "Local events and activities" refer to nearby activities and gatherings that users can participate in, and are proposed to support users in reducing stress and achieving their goals.
[0576] As an embodiment of the present invention, a personalized support system for citizens in a smart city environment will be described. This system is implemented using a smartphone or smart glasses owned by the user. When the user experiences stress or challenges in their daily life, the device captures the user's voice and facial expressions.
[0577] The device sends this input data to the server, which processes it via an emotion recognition API. Specifically, it uses Microsoft Azure's emotion recognition API to determine the user's emotional state. The server then uses OpenAI GPT, a generative artificial intelligence model, to generate advice tailored to the user's current emotional state. This advice may include suggestions for relaxation methods or local events to lift their spirits.
[0578] Furthermore, the server uses a geographic information API, specifically the Google Maps API, to search for and suggest local events and resources based on the user's location, which makes the support more realistic and responsive.
[0579] For example, if a user complains of stress at work, the smart glasses analyze the user's facial expression using an emotion recognition API and identify the emotion of anxiety. Based on this, the server uses a generative AI model to recommend "participating in a yoga event at a nearby park" and provides voice guidance for "deep breathing exercises."
[0580] For example, the system inputs a prompt such as "If the user's emotional state is anxious, suggest three relaxation methods" into the artificial intelligence model, which generates the corresponding feedback. This allows users to instantly receive effective feedback tailored to their own emotional state.
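A prompt of this kind can be produced from the recognized emotion with a fixed template. The sketch below is illustrative only: the set of accepted labels and the function name are assumptions, and restricting the label to a known set is a design choice added here so that unexpected recognizer output is not passed into the prompt.

```python
# Illustrative set of labels an emotion recognition step might return.
KNOWN_EMOTIONS = {"anxious", "joyful", "sad", "angry", "calm"}

PROMPT_TEMPLATE = (
    "If the user's emotional state is {emotion}, "
    "suggest three relaxation methods."
)


def build_emotion_prompt(emotion: str) -> str:
    """Fill the template, rejecting labels outside the known set."""
    label = emotion.strip().lower()
    if label not in KNOWN_EMOTIONS:
        raise ValueError(f"unknown emotion label: {emotion!r}")
    return PROMPT_TEMPLATE.format(emotion=label)
```

For the label "anxious", this reproduces the prompt quoted in the text.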
[0581] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0582] Step 1:
[0583] When a user experiences stress, the device captures their voice and facial expressions. The input consists of the user's voice data and image data from the camera, which are temporarily stored on the device. The device performs noise filtering to improve the accuracy of the data.
[0584] Step 2:
[0585] The device sends the captured audio and image data to a server in the cloud. As output, a data packet is generated. The device encrypts the data using security protocols to protect user privacy.
[0586] Step 3:
[0587] The server calls an emotion recognition API to analyze the data it receives. The input is the user's voice and image data, and the API extracts emotion information from these. The output is the identified emotional state (e.g., anxiety, joy). Based on this, the server logs the user's emotional state.
[0588] Step 4:
[0589] The server uses the emotion recognition results to input a prompt into the generative AI model. An example prompt is, "If the user's emotional state is anxious, suggest three relaxation methods." The input is the identified emotional state, and the model generates feedback based on this.
[0590] Step 5:
[0591] The server integrates the generated feedback with activity and event information based on the user's geographic location, utilizing local information services. The input is the user's location data and the generated feedback, and the output is a set of realistic event suggestions. The server uses the Google Maps API to retrieve event information.
[0592] Step 6:
[0593] The server sends integrated feedback and event information to the terminal. The output is feedback data presented to the user. The terminal presents this information visually and audibly, allowing the user to adjust their actions based on it.
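The location-based selection in Step 5 can be sketched by filtering candidate events by great-circle distance from the user. This is an illustration under stated assumptions: the `Event` type, the sample coordinates, and the 2 km radius are hypothetical, and the event list is supplied directly here, whereas in the described system it would be retrieved through a geographic information API.

```python
import math
from typing import List, NamedTuple


class Event(NamedTuple):
    name: str
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers (haversine formula, mean Earth radius)."""
    r = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearby_events(user_lat: float, user_lon: float, events: List[Event],
                  radius_km: float = 2.0) -> List[Event]:
    """Return events within radius_km of the user, nearest first."""
    scored = [(haversine_km(user_lat, user_lon, e.lat, e.lon), e) for e in events]
    return [e for d, e in sorted(scored, key=lambda pair: pair[0]) if d <= radius_km]


# Hypothetical sample data for illustration.
candidates = [
    Event("Concert in another city", 34.6937, 135.5023),
    Event("Yoga in a nearby park", 35.6852, 139.7528),
]
suggestions = nearby_events(35.6812, 139.7671, candidates)
```

The filtered list can then be appended to the generated feedback before it is sent to the terminal in Step 6.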
[0594] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0595] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0596] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.
[0597] [Fourth Embodiment]
[0598] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.
[0599] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.
[0600] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0601] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.
[0602] The microphone 238 receives instructions from the user 20 by capturing the voice of the user 20. The microphone 238 converts the captured voice into audio data and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0603] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0604] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0605] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
[0606] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0607] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0608] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0609] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0610] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0611] This invention is a highly personalized AI coaching system aimed at improving the careers and performance of business professionals. The system receives and analyzes user-defined goals or challenges in various data formats to generate appropriate feedback and advice.
[0612] The terminal provides users with an intuitive interface where they can input their goals and challenges as text and upload audio, images, and videos as needed. The terminal then sends this information to the server.
[0613] The server uses a generative artificial intelligence model to analyze the received information. First, the server analyzes text data using natural language processing techniques. This analysis allows for a deep understanding of the user's intent and needs, and the extraction of key points related to the topic. Furthermore, the server processes uploaded audio, image, and video data using speech recognition and image analysis technologies. This enables a multifaceted analysis of the entire material submitted by the user, providing comprehensive feedback.
[0614] Based on the analysis results, the server generates customized feedback and advice that takes into account the user's personality and past behavioral data. This feedback includes specific steps and areas for improvement to help the user achieve their goals. The server also tracks the user's behavior and provides ongoing support for improvement.
[0615] The terminal displays the feedback sent from the server to the user. This allows the user to understand their progress and plan their next steps. Through this process, the system is available 24 hours a day, 365 days a year, supporting the user's career growth regardless of time or location.
[0616] Thus, the present invention makes it possible to provide users with appropriate career advice with consistent quality. By introducing an AI model that supports diverse data formats, it is expected to yield results exceeding those of conventional human coaching.
[0617] The following describes the processing flow.
[0618] Step 1:
[0619] Users enter their goals and challenges in text format through the terminal interface and upload audio, image, or video files as needed.
[0620] Step 2:
[0621] The terminal sends the information entered by the user to the server. The server receives this information and prepares it for analysis.
[0622] Step 3:
[0623] The server analyzes text data using a natural language processing engine to understand the user's requests and intentions. This process extracts important keywords and context.
[0624] Step 4:
[0625] The server processes the audio data using speech recognition software, converting the user's speech into text data. This allows the information contained in the audio to be used for analysis.
[0626] Step 5:
[0627] The server uses image analysis algorithms to analyze image and video data, identifying information derived from visual elements. This data is then integrated with text information for further use.
[0628] Step 6:
[0629] The server analyzes this multimodal data in an integrated manner and compares it with the user's past tracking data to generate personalized feedback and advice.
[0630] Step 7:
[0631] The server sends the generated feedback and advice to the terminal, which then displays it to the user. The user can then review this and plan their next course of action.
[0632] Step 8:
[0633] The server tracks the user's ongoing behavior and periodically updates this data to help with future feedback.
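The eight steps above can be sketched as a small server-side pipeline. This is a minimal illustration only: the names `Submission`, `Server`, `transcribe`, `describe_video`, and `generate` are hypothetical and do not appear in the disclosure, and the analyzers are injected as plain callables so that any speech recognition engine, image analysis engine, or generative AI model could stand behind them.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Submission:
    """What the terminal sends to the server (Steps 1 and 2)."""
    user_id: str
    text: str
    audio: Optional[bytes] = None
    video: Optional[bytes] = None


@dataclass
class Server:
    # Injected analyzers: hypothetical stand-ins for the real engines.
    transcribe: Callable[[bytes], str]        # Step 4: speech -> text
    describe_video: Callable[[bytes], str]    # Step 5: video -> findings
    generate: Callable[[str], str]            # Step 6: generative AI model
    history: dict = field(default_factory=dict)  # Step 8: per-user tracking data

    def handle(self, sub: Submission) -> str:
        parts = [f"Goal: {sub.text}"]                          # Step 3
        if sub.audio is not None:
            parts.append(f"Transcript: {self.transcribe(sub.audio)}")
        if sub.video is not None:
            parts.append(f"Video findings: {self.describe_video(sub.video)}")
        past = self.history.setdefault(sub.user_id, [])
        if past:
            # Compare against the user's past tracking data (Step 6).
            parts.append("Past feedback: " + " | ".join(past))
        feedback = self.generate("\n".join(parts))             # Step 6
        past.append(feedback)                                  # Step 8
        return feedback                                        # Step 7


# Usage with stub analyzers (no real engines are called).
server = Server(
    transcribe=lambda audio: "um, so, our results...",
    describe_video=lambda video: "little eye contact",
    generate=lambda prompt: f"[advice based on {len(prompt.splitlines())} inputs]",
)
first = server.handle(Submission("u1", "Improve my presentations", video=b"..."))
```

Keeping the analyzers behind callables means the flow of Steps 1 to 8 can be exercised without committing to a particular engine.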
[0634] (Example 1)
[0635] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0636] Traditional coaching systems struggle to provide effective feedback tailored to each user's unique goals and challenges, and they lack the flexibility to analyze diverse data formats other than text (such as audio, images, and videos). Furthermore, they lack the functionality to continuously follow up on users' progress, resulting in insufficient ongoing career support.
[0637] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0638] In this invention, the server includes a device for receiving goals or tasks defined by the user, a device utilizing a generative artificial intelligence model for analyzing information on the goals or tasks, and a device for generating feedback and advice tailored to the user's characteristics based on the analysis. This enables the provision of customized feedback to individual users, flexibly processes diverse data formats, and provides effective career support by continuously tracking the user's progress.
[0639] A "user" is an individual who is supported in achieving goals or solving problems through this system.
[0640] A "device for receiving goals or tasks" is a device that provides an interface for users to input specific goals or tasks they have defined into the system.
[0641] A "generative artificial intelligence model" is an artificial intelligence technology equipped with an algorithm that analyzes input information and generates appropriate feedback and advice.
[0642] An "analysis device" is a device equipped with the function of processing input information such as text, audio, images, and videos, and extracting important elements.
[0643] A "feedback and advice generating device" is a device that creates personalized countermeasures and improvement suggestions in order to provide users with information tailored to their needs based on analysis results.
[0644] "Multi-format data processing" refers to the processing capability to comprehensively analyze data in different formats, such as text, audio, images, and videos, and extract information based on them.
[0645] The "progress tracking and continuous update function" refers to a system function that records user behavior and monitors progress and changes at each stage.
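One way to picture the progress tracking and continuous update function is as a log of dated scores per stage. The sketch below is an assumption for illustration: the class name `ProgressTracker`, the stage label, and the numeric score scale are hypothetical and are not specified in the disclosure.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ProgressTracker:
    """Records a score per stage so that changes over time can be monitored."""
    entries: list = field(default_factory=list)  # (day, stage, score) tuples

    def record(self, day: date, stage: str, score: float) -> None:
        self.entries.append((day, stage, score))

    def change(self, stage: str) -> float:
        """Latest score minus earliest score for one stage (0.0 if fewer than two)."""
        scores = [s for _, st, s in sorted(self.entries) if st == stage]
        if len(scores) < 2:
            return 0.0
        return scores[-1] - scores[0]


tracker = ProgressTracker()
tracker.record(date(2025, 1, 10), "eye_contact", 40.0)
tracker.record(date(2025, 2, 10), "eye_contact", 70.0)
improvement = tracker.change("eye_contact")
```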
[0646] Modes for carrying out the invention
[0647] The following program and system configurations are possible as embodiments for carrying out this invention.
[0648] System configuration and technologies used:
[0649] The terminal functions as an interface with the user, providing an interface where the user can input their goals and challenges. This input is not limited to text format, but also includes audio, images, and videos. The terminal is equipped with communication devices for sending this data to the server.
[0650] The server plays a central role in processing the received data. The server is equipped with a generative AI model for analysis, applying natural language processing technology to text data, speech recognition technology to audio data, and image analysis technology to image and video data. This allows the server to comprehensively analyze diverse data formats and accurately grasp the user's intent.
[0651] Generating feedback and advice:
[0652] Based on the analysis results, the server generates feedback and advice tailored to the user's characteristics and past behavioral history. This generated information includes specific action steps and improvement suggestions to help the user achieve their goals.
[0653] The terminal helps users track their progress and plan their next steps by presenting feedback sent from the server.
[0654] Specific examples and prompt statements:
[0655] As a specific scenario, let's assume a user sets the goal of "I want to improve my speaking skills for my next presentation." In this case, the user uploads videos of past presentations, and the server analyzes that data to identify areas for improvement and provide advice on specific practice methods.
[0656] Example of a prompt:
[0657] "I'd like some advice on steps to improve my project management skills."
[0658] "Based on past meeting recordings, I want to understand how we can improve team communication."
[0659] In this way, this invention provides highly personalized career coaching tailored to the individual needs of each user, enabling more effective support than conventional methods.
[0660] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0661] Step 1:
[0662] The user uses the terminal's interface to input goals and challenges, and uploads audio, images, and videos as needed. For example, the user inputs the text "I want to learn how to speak effectively in my next presentation" together with a video of a past presentation. The terminal then sends the entered data to the server.
[0663] Step 2:
[0664] The server applies natural language processing techniques to analyze the received text data. It utilizes a generative AI model to extract the user's intent from the input text and identify relevant keywords and concepts. In this step, the input is text data, and the output is analyzed semantic information. Specifically, the server extracts keywords such as "speaking style," "presentation," and "effective" from the text and identifies related topics.
[0665] Step 3:
[0666] If audio data is available, the server uses speech recognition technology to convert it into text and then performs further analysis. For image and video data, image analysis technology is used to identify important frames and content. Through these technologies, the output obtained from the input (audio, images, video) is a list or summary of the analyzed content. Specifically, the server captures important scenes from a presentation video and extracts information about gaze and posture.
[0667] Step 4:
[0668] The server integrates all the analyzed information and generates optimal feedback and advice based on the user's characteristics and past behavior history. It utilizes a generative AI model to present customized, specific steps. The input in this step is the integrated analytical information, and the output is the content of the feedback and advice. For example, the server might generate advice such as, "In your next presentation, increase eye contact and maintain a consistent speaking speed."
[0669] Step 5:
[0670] The terminal presents the feedback received from the server and notifies the user so that it can be reviewed. The output feedback serves as concrete guidance for the user's actions. Specifically, the terminal displays the feedback content on the screen and sends a notification to the user, helping them plan their next steps.
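Steps 2 to 4 of this flow can be sketched as keyword extraction followed by prompt assembly. The disclosure assigns keyword extraction to the generative AI model; the stop-word filter below is a deliberately naive stand-in used only to make the data flow concrete. The stop-word list, the function names, and the observation keys are all hypothetical.

```python
import re

# Hypothetical stop-word list; a real system would rely on the model itself.
STOPWORDS = {"i", "want", "to", "how", "in", "my", "the", "a", "for"}


def extract_keywords(text: str) -> list[str]:
    """Return the distinct non-stop-words of the text in order of appearance."""
    seen, keywords = set(), []
    for word in re.findall(r"[a-z']+", text.lower()):
        if word not in STOPWORDS and word not in seen:
            seen.add(word)
            keywords.append(word)
    return keywords


def build_prompt(goal: str, observations: dict[str, str]) -> str:
    """Integrate the text analysis and the video findings into one prompt (Step 4)."""
    lines = [f"Goal: {goal}", "Keywords: " + ", ".join(extract_keywords(goal))]
    lines += [f"{name}: {finding}" for name, finding in observations.items()]
    lines.append("Give specific steps for improvement.")
    return "\n".join(lines)


goal = "I want to learn how to speak effectively in my next presentation"
prompt = build_prompt(goal, {"gaze": "mostly on the slides", "posture": "leaning on podium"})
```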
[0671] (Application Example 1)
[0672] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0673] Business professionals often struggle to obtain helpful feedback for improving their careers and performance, without being constrained by time or location. Furthermore, traditional human coaching has limitations and struggles to fully address individual needs. Therefore, there is a need for continuous and personalized support provided in a home setting.
[0674] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0675] In this invention, the server includes means for receiving goals or tasks defined by the user, means for utilizing a generative algorithm for analyzing the data of the goals or tasks, and means for generating feedback and advice tailored to the user's characteristics based on the analysis. This enables the user to receive customized career support 24 hours a day, 365 days a year through a home device.
[0676] A "user" is an individual or group that aims to achieve their goals or objectives using this system.
[0677] "Goals or challenges" refer to specific objectives or problems that users wish to achieve or resolve, and are the subjects for which support is provided by this system.
[0678] A "home device" refers to a robot or other device that is used on a daily basis within the home and that plays a role in communication and information gathering.
[0679] A "generative algorithm" is a computational method or program that analyzes received data and provides the user with the most appropriate feedback or advice.
[0680] "Integrated processing" is an information processing method that simultaneously analyzes multiple data formats and makes decisions by relating them to each other.
[0681] "Feedback and advice" refers to specific directions and instructions provided to help users achieve their goals and solve problems.
[0682] "Progress" refers to the state or degree to which a user is making progress toward achieving their goals or tasks.
[0683] This invention constructs a system in which a home device collects information and transmits it to a server in order to achieve goals and tasks set by the user. The home device used as a terminal is equipped with a microphone for voice recognition and a camera, and can receive voice input and visual information from the user. The home device allows the user to input information about their goals and tasks with simple operations and transmits this information to the server.
[0684] The server uses the Google Speech-to-Text API to convert speech data into text data and performs natural language processing using OpenAI's generative AI model to analyze the received information. Furthermore, TensorFlow and OpenCV are used for image analysis, and a generative artificial intelligence model is employed to comprehensively analyze the received data. The server generates feedback and advice based on the user's characteristics and provides it to the user via a home device.
[0685] As a concrete example, consider a scenario where a home robot is asked by a user, "What skills should I learn for my next career step?" The user provides information to the robot by explaining their current situation and talking about areas of interest. The robot sends this information to a server, which then provides advice based on its analysis.
[0686] Examples of prompts to input into a generative AI model:
[0687] "Based on the user's stated career interests and current skill set, generate a list of technologies and skills they should acquire next."
[0688] This system will allow users to easily receive career support in the comfort of their own homes.
[0689] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0690] Step 1:
[0691] The terminal, acting as a home device, receives information about the user's goals and challenges as audio or visual information. The input information is captured by the terminal as audio or image data. The home device organizes this data and prepares it for transmission to the server.
[0692] Step 2:
[0693] The server converts the audio data received from the terminal into text data using the Google Speech-to-Text API. It analyzes the input audio data and outputs natural language text as a result. This process transforms audio information into text information.
[0694] Step 3:
[0695] The server inputs text data into a generative AI model, which then analyzes the text content using natural language processing. The generative AI model extracts the user's intent and needs from the text information and highlights relevant key points. Based on this, a basic understanding of how to achieve the user's goals is obtained.
[0696] Step 4:
[0697] The server analyzes the received image data using TensorFlow and OpenCV. By extracting visual elements from the image data and identifying related information, it analyzes the user's provided information from multiple perspectives. This process allows the server to understand the user's state and required resources from the images.
[0698] Step 5:
[0699] The server generates feedback and advice based on the analyzed text and image data. Using a generative AI model, customized feedback is provided, taking into account the user's characteristics and past performance. This output is presented as specific advice.
[0700] Step 6:
[0701] The terminal presents the feedback and advice received from the server to the user. This information is output by voice and displayed on the screen to help the user plan their next course of action.
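The server-side part of this flow (Steps 2 to 5) can be sketched as follows. The prompt text is the example prompt given above; everything else is an assumption for illustration. In particular, `speech_to_text`, `analyze_image`, and `generate` are hypothetical callables standing in for the speech-to-text service, the image analysis stack, and the generative AI model, and none of their real client libraries is called here.

```python
from typing import Callable

PROMPT_TEMPLATE = (
    "Based on the user's stated career interests and current skill set, "
    "generate a list of technologies and skills they should acquire next.\n\n"
    "User statement (transcribed from speech):\n{transcript}\n\n"
    "Observations from images:\n{observations}"
)


def build_prompt(transcript: str, observations: list[str]) -> str:
    """Fill the template with the transcript and one bullet per image finding."""
    bullets = "\n".join(f"- {o}" for o in observations) or "- (none)"
    return PROMPT_TEMPLATE.format(transcript=transcript.strip(), observations=bullets)


def advise(
    audio: bytes,
    images: list[bytes],
    speech_to_text: Callable[[bytes], str],
    analyze_image: Callable[[bytes], str],
    generate: Callable[[str], str],
) -> str:
    transcript = speech_to_text(audio)                         # Step 2
    observations = [analyze_image(img) for img in images]      # Step 4
    return generate(build_prompt(transcript, observations))    # Steps 3 and 5


# Usage with stubs: the "model" simply echoes the prompt it was given.
result = advise(
    b"...", [],
    speech_to_text=lambda a: " I know SQL and want to move into data science ",
    analyze_image=lambda i: "unused",
    generate=lambda prompt: prompt,
)
```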
[0702] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0703] This invention is an AI-based coaching system that supports the career and performance improvement of business professionals, and is characterized in that it incorporates an emotion recognition engine to provide feedback that is tailored to the user's emotional state. The system receives input from the user and generates personalized advice based on that input.
[0704] The device provides an interface for users to input their set goals and tasks as text. Users can also upload audio recordings of their own voice and videos such as video conference recordings. The device then sends this data to the server.
[0705] The server first analyzes the received data. A generative artificial intelligence model analyzes the text data using natural language processing techniques to clarify the user's goals. Simultaneously, the server activates an emotion engine to determine the user's emotional state from audio and video data. This emotion data is used to enhance the sensitivity of the feedback provided and to generate more appropriate content for the user.
[0706] As a concrete example, consider a scenario where a user sets a goal of overcoming presentation anxiety and uploads past presentation videos as relevant materials. The server uses an emotion engine to detect the user's level of tension from these videos and generates feedback that includes specific breathing techniques and mental exercises to alleviate the tension.
[0707] The generated feedback and advice are presented to the user visually and audibly via the device. This allows the user to receive continuously optimized support while adjusting their own actions in real time.
[0708] By implementing this invention, the system provides more effective and personalized business coaching that also takes into account the user's emotions. This goes beyond simply achieving goals and contributes to improving the user's ability to cope with daily tasks and pressures.
[0709] The following describes the processing flow.
[0710] Step 1:
[0711] Users input their goals and challenges in text format through the terminal interface. They can also upload audio data and video files as needed.
[0712] Step 2:
[0713] The terminal sends the data entered and uploaded by the user to the server. The server receives this information and prepares it for analysis.
[0714] Step 3:
[0715] The server uses a generative artificial intelligence model to analyze text data, understand user requests, and extract key information related to the goals.
[0716] Step 4:
[0717] The server processes uploaded audio and video data using an emotion engine, analyzing the user's emotional state from these media. For example, it recognizes the user's emotions based on changes in voice tone and facial expressions.
[0718] Step 5:
[0719] The server integrates the text analysis results and the emotion recognition results to generate personalized feedback and advice based on the user's personality traits and emotional state.
[0720] Step 6:
[0721] The server sends the generated feedback and advice to the device, which then presents them to the user visually and audibly. Through the presented content, the user can follow emotionally conscious and practical advice.
[0722] Step 7:
[0723] The server frequently monitors the user's progress and emotional changes, accumulating feedback data to use in future analyses. This provides continuous learning and support.
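Steps 4 and 5 can be illustrated by fusing emotion scores from voice and from facial expressions and letting the dominant emotion shape the instruction given to the generative AI model. The weighted average, the score dictionaries, and the names `fuse`, `dominant`, and `STYLE_HINTS` are assumptions for illustration; the disclosure does not specify how the emotion engine combines modalities.

```python
def fuse(voice: dict[str, float], face: dict[str, float], w_voice: float = 0.5) -> dict[str, float]:
    """Weighted average of per-emotion scores from the two modalities."""
    labels = sorted(set(voice) | set(face))
    return {
        k: w_voice * voice.get(k, 0.0) + (1.0 - w_voice) * face.get(k, 0.0)
        for k in labels
    }


def dominant(scores: dict[str, float]) -> str:
    """Label with the highest fused score."""
    return max(scores, key=scores.get)


# Hypothetical mapping from detected emotion to a style hint for the model.
STYLE_HINTS = {
    "tension": "Include breathing techniques and mental exercises to ease tension.",
}
DEFAULT_HINT = "Give practical, encouraging next steps."


def feedback_instruction(goal: str, voice: dict[str, float], face: dict[str, float]) -> str:
    emotion = dominant(fuse(voice, face))
    hint = STYLE_HINTS.get(emotion, DEFAULT_HINT)
    return f"Goal: {goal}\nDetected emotion: {emotion}\n{hint}"


instruction = feedback_instruction(
    "Overcome presentation anxiety",
    voice={"tension": 0.8, "calm": 0.2},
    face={"tension": 0.6, "calm": 0.4},
)
```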
[0724] (Example 2)
[0725] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0726] Many business professionals want to receive various forms of feedback and guidance for self-improvement and to improve work efficiency. However, conventional systems lack the flexibility to appropriately respond to users' emotional states and individual goals, making it difficult to provide personalized and effective support. This invention aims to solve these problems and enable users to achieve their goals in the most optimal way while taking their emotional state into consideration.
[0727] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0728] In this invention, the server includes a component that receives a purpose or task defined by the user, a processing unit that utilizes an artificial intelligence model to analyze information about the purpose or task, and a device that generates instructions and advice tailored to the user's characteristics based on the analysis. This allows the user to instantly receive personalized feedback and flexibly and effectively utilize that guidance through emotion recognition.
[0729] A "user" is an individual or group that uses the system to achieve their goals or accomplish their objectives.
[0730] "Objective or problem" refers to a specific outcome or solution that the user hopes to achieve with the system.
[0731] A "component" is a part of the system that receives the objectives or tasks set by the user.
[0732] "Information" is a collection of data that includes text, audio, visual information, and video.
[0733] An "artificial intelligence model" is a set of algorithms used to analyze collected information and generate instructions and advice to achieve a specific goal.
[0734] A "processing device" is a part of a system that analyzes information and generates instructions and advice tailored to the user's characteristics.
[0735] "Instructions and advice" refers to specific suggestions and guidance provided by the artificial intelligence model to help users achieve their goals.
[0736] "Device" refers to a part of a system that provides the aforementioned instructions and advice to the user and enables the user to utilize that information.
[0737] This system aims to provide advanced coaching using artificial intelligence technology to promote user growth and improve efficiency. Its main components include terminals, servers, and generative AI models.
[0738] The device provides an interface for users to input their goals and challenges. Users can directly input text information or upload audio and video data. For example, a user might set the goal of "improving their leadership skills" and upload recordings of past leadership-related meetings.
[0739] Information received from the device is sent to the server. To analyze this information, the server uses a generative AI model to analyze text data and an emotion recognition engine to detect emotional states from audio and video data. Specifically, natural language processing technology is used to analyze goals and tasks in the text, and speech analysis technology is used to identify emotional states. This generates instructions and advice related to the user's goals, and provides feedback to the user.
[0740] The generated feedback is presented to the user visually and audibly via the device. Users can receive this feedback in real time and efficiently adjust their actions.
[0741] As an example of a prompt, if you input the text "I want to know how to lead my team more efficiently on the next project" into the AI model, the system can provide effective advice and plans. This system allows users to receive support tailored to their individual needs and to respond flexibly while taking their emotional state into consideration.
[0742] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0743] Step 1:
[0744] Users input their goals and challenges through their device, and upload audio and video data as needed. For example, a user might input "I want to improve my negotiation skills" and provide recordings of past negotiations. The input data is in text and multimedia formats. This constitutes the initial input to the system.
[0745] Step 2:
[0746] The terminal sends input data received from the user to the server. The data is transferred using a secure communication protocol. The server receives text data related to the user's goals and challenges, as well as audio and video data. This prepares the raw data necessary for analysis.
[0747] Step 3:
[0748] The server inputs the received text data into a generative AI model, which then analyzes it through natural language processing. This data processing includes grammatical analysis and keyword extraction. The output provides a concrete understanding and analysis of the user's goals and challenges.
[0749] Step 4:
[0750] The server inputs audio and video data into an emotion recognition engine for analysis. For audio data, it analyzes voice tone, pitch, tempo, etc., to determine the user's emotional state. For video data, it uses facial expression recognition technology. The output is the analyzed emotion data, indicating the user's current emotional state.
[0751] Step 5:
[0752] The server integrates the text data analysis results from the generative AI model with the emotion data from the emotion recognition engine to generate user-appropriate feedback and advice. Here, the inference engine is utilized to generate individual instructions and suggestions. The output is feedback that includes specific action guidelines.
[0753] Step 6:
[0754] The terminal visually and audibly presents instructions and advice received from the server to the user. Specifically, it displays visualized data on a dashboard and provides audio guidance. This allows the user to immediately receive support from the system and take action.
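For the audio branch of Step 4, one simple possibility is to compare voice features of the current recording against the same user's baseline. This is a sketch under stated assumptions, not the emotion recognition engine of the disclosure: the feature set, the labels, and the 15% threshold are illustrative placeholders, and facial expression recognition is omitted.

```python
from dataclasses import dataclass


@dataclass
class VoiceFeatures:
    pitch_hz: float   # mean fundamental frequency of the recording
    tempo_wpm: float  # speaking rate in words per minute


def relative_change(current: float, baseline: float) -> float:
    """Fractional change of a feature with respect to the user's baseline."""
    return (current - baseline) / baseline


def estimate_state(current: VoiceFeatures, baseline: VoiceFeatures, threshold: float = 0.15) -> str:
    """Label the recording by how pitch and tempo deviate from the baseline.

    The threshold and the labels are placeholders chosen for illustration.
    """
    pitch = relative_change(current.pitch_hz, baseline.pitch_hz)
    tempo = relative_change(current.tempo_wpm, baseline.tempo_wpm)
    if pitch > threshold and tempo > threshold:
        return "tense"
    if pitch < -threshold and tempo < -threshold:
        return "low energy"
    return "neutral"


baseline = VoiceFeatures(pitch_hz=120.0, tempo_wpm=140.0)
negotiation = VoiceFeatures(pitch_hz=150.0, tempo_wpm=170.0)
state = estimate_state(negotiation, baseline)
```

Comparing against a per-user baseline rather than fixed absolute values is one way to respect the individual differences the system is meant to account for.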
[0755] (Application Example 2)
[0756] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0757] In modern society, business professionals and ordinary citizens frequently face mental stress in their daily lives and work. In this environment, there is a need to provide personalized feedback that takes individual emotional states into account. However, existing systems struggle to provide timely and optimal feedback tailored to individual emotional states, and there is a lack of mechanisms for users to receive immediate emotional support. Furthermore, proposing solutions that utilize local resources is also difficult.
[0758] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0759] In this invention, the server includes means for receiving the user's goals and challenges, means for analyzing the data using a generative information processing model, and means for recognizing the user's emotional state. This makes it possible to adaptively optimize feedback and advice tailored to the user's characteristics and emotions, taking into account local events and activities.
[0760] A "user" is an individual or group that uses this system, has goals and challenges, and requires feedback and advice regarding them.
[0761] A "goal or challenge" is a specific matter that the user aims to achieve or resolve, and it forms the basis for feedback and advice provided by this system.
[0762] A "generative information processing model" is an artificial intelligence technology that analyzes diverse data formats and generates feedback tailored to the user's goals and challenges.
[0763] "Emotion recognition means" refers to technology that determines a user's emotional state from data such as audio and images, and contributes to optimizing feedback.
[0764] "Feedback and advice" refers to the evaluations and suggestions that a generative information processing model provides to a user's goals and challenges, serving as a guide for the user to adjust their actions toward achieving those goals.
[0765] An "information terminal" is a device that provides users with feedback and advice visually or audibly, and includes smartphones and smart glasses.
[0766] "Geographic information" refers to location-related data and is used when suggesting local events and activities to users.
[0767] "Local events and activities" refer to nearby activities and gatherings that users can participate in, and are proposed to support users in reducing stress and achieving their goals.
[0768] As an embodiment of the present invention, a personalized support system for citizens in a smart city environment will be described. This system is implemented using a smartphone or smart glasses owned by the user. When the user experiences stress or challenges in their daily life, the device captures the user's voice and facial expressions.
[0769] The device sends this input data to the server, which processes it via an emotion recognition API. Specifically, it uses Microsoft Azure's emotion recognition API to determine the user's emotional state. The server then uses OpenAI GPT, a generative artificial intelligence model, to generate advice tailored to the user's current emotional state. This advice may include suggestions for relaxation methods or local events to lift their spirits.
[0770] Furthermore, by utilizing geographic information APIs and using the Google Maps API to search for and suggest local events and resources based on the user's location, we are able to provide more realistic and responsive support.
[0771] For example, if a user complains of stress at work, the smart glasses analyze the user's facial expression using an emotion recognition API and identify the emotion of anxiety. Based on this, the server uses a generative AI model to recommend "participating in a yoga event at a nearby park" and provides voice guidance for "deep breathing exercises."
[0772] As an example prompt, the system inputs "If the user's emotional state is anxious, suggest three relaxation methods" into the artificial intelligence model, which generates the corresponding feedback. This allows users to instantly receive effective feedback tailored to their own emotional state.
[0773] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0774] Step 1:
[0775] When a user experiences stress, the device captures their voice and facial expressions. The input consists of the user's voice data and image data from the camera, which are temporarily stored on the device. The device performs noise filtering to improve the accuracy of the data.
[0776] Step 2:
[0777] The device sends the captured audio and image data to a server in the cloud. As output, a data packet is generated. The device encrypts the data using security protocols to protect user privacy.
[0778] Step 3:
[0779] The server calls an emotion recognition API to analyze the data it receives. The input is the user's voice and image data, and the API extracts emotion information from these. The output is the identified emotional state (e.g., anxiety, joy). Based on this, the server logs the user's emotional state.
[0780] Step 4:
[0781] The server uses the emotion recognition results to input a prompt message into the generative AI model. An example prompt message used is, "If the user's emotional state is anxious, suggest three relaxation methods." The input is the identified emotional state, and the model generates feedback based on this.
[0782] Step 5:
[0783] The server integrates the generated feedback with activity and event information based on the user's geographic location, utilizing local information services. The input is the user's location data and the generated feedback, and the output is a set of realistic event suggestions. The server uses the Google Maps API to retrieve event information.
[0784] Step 6:
[0785] The server sends integrated feedback and event information to the terminal. The output is feedback data presented to the user. The terminal presents this information visually and audibly, allowing the user to adjust their actions based on it.
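Steps 4 and 5 can be sketched as filling the example prompt with the identified emotion and filtering candidate events by distance from the user. The sketch assumes the event list has already been retrieved; the `Event` type, the function names, and the 5 km radius are hypothetical, and no map or emotion recognition service is called. The distance uses the standard haversine formula with a mean Earth radius of 6371 km.

```python
import math
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    lat: float
    lon: float


def build_prompt(emotion: str) -> str:
    """Fill the example prompt with the identified emotional state (Step 4)."""
    return f"If the user's emotional state is {emotion}, suggest three relaxation methods."


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearby_events(lat: float, lon: float, events: list[Event], max_km: float = 5.0) -> list[Event]:
    """Events within max_km of the user, nearest first (Step 5)."""
    with_distance = [(haversine_km(lat, lon, e.lat, e.lon), e) for e in events]
    return [e for d, e in sorted(with_distance, key=lambda pair: pair[0]) if d <= max_km]


events = [
    Event("Yoga in the park", 35.6852, 139.7528),
    Event("Evening concert", 34.6937, 135.5023),
]
suggestions = nearby_events(35.6812, 139.7671, events)
prompt = build_prompt("anxious")
```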
[0786] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0787] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of generative AI that can serve as the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
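The input and output of the data generation model 58 described above can be pictured with a small interface sketch. All names here (`InferenceRequest`, `InferenceResult`, `DataGenerationModel`, `EchoModel`) are hypothetical and are not part of the disclosure or of any real service's API; `EchoModel` only reports what it received and performs no actual inference.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class InferenceRequest:
    """A prompt containing instructions, plus optional inference data."""
    prompt: str
    text: Optional[str] = None
    audio: Optional[bytes] = None
    image: Optional[bytes] = None

    def modalities(self) -> list[str]:
        """List which kinds of inference data accompany the prompt."""
        present = []
        if self.text is not None:
            present.append("text")
        if self.audio is not None:
            present.append("audio")
        if self.image is not None:
            present.append("image")
        return present


@dataclass
class InferenceResult:
    """Inference results in text and/or audio form."""
    text: Optional[str] = None
    audio: Optional[bytes] = None


class DataGenerationModel(Protocol):
    """Anything that can infer from a request according to its prompt."""
    def infer(self, request: InferenceRequest) -> InferenceResult: ...


class EchoModel:
    """Hypothetical stand-in: summarizes its inputs instead of running a model."""
    def infer(self, request: InferenceRequest) -> InferenceResult:
        inputs = ", ".join(request.modalities()) or "none"
        return InferenceResult(text=f"{request.prompt} [inputs: {inputs}]")
```

For instance, `EchoModel().infer(InferenceRequest(prompt="Summarize", text="hi", image=b"x"))` yields a result whose text records that text and image data were supplied alongside the prompt.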
[0788] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.
[0789] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.
[0790] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
[0791] These emotions are distributed at the 3 o'clock position on the emotion map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.
[0792] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the closer an emotion lies to the outside of the emotion map 400, the more visible it becomes (the more it is expressed in actions).
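The layout of the emotion map 400 can be read as a coordinate system, sketched below. The Cartesian coordinates, the `inner_radius` threshold, and the function name `locate` are assumptions made for illustration; the description only states which side of the map holds which kind of emotion and that the outside corresponds to actions.

```python
import math


def locate(x: float, y: float, inner_radius: float = 0.5) -> dict:
    """Describe a point on the map, with the center of the circles at (0, 0).

    x < 0 is the left side (reactions occurring in the brain),
    x > 0 is the right side (situational judgment),
    y > 0 is the upper side (pleasure), y < 0 the lower side (displeasure).
    """
    radius = math.hypot(x, y)

    if x < 0:
        origin = "reaction"
    elif x > 0:
        origin = "situation"
    else:
        origin = "both"  # directly above or below the center

    if y > 0:
        valence = "pleasure"
    elif y < 0:
        valence = "displeasure"
    else:
        valence = "neutral"

    # Inner points stand for inner thoughts; outer points for visible actions.
    layer = "inner" if radius < inner_radius else "outer"
    return {"origin": origin, "valence": valence, "layer": layer, "radius": radius}
```

Under these assumptions, the 3 o'clock position corresponds to a point such as `(1.0, 0.0)`, which falls on the situation side, between pleasure and displeasure.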
[0793] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, the result is discomfort, and when they approach the ideal, the result is pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, the result is discomfort, and when they approach the ideal, the result is pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "reaction," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
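The balance-based view of pleasure and discomfort can be illustrated with a short sketch. It assumes that each balance (for example, posture or battery level) is expressed as a normalized number and that the total deviation is the sum of absolute differences from the ideal; both choices, and the function names, are assumptions for illustration rather than details given in the description.

```python
def total_deviation(state: dict[str, float], ideal: dict[str, float]) -> float:
    """Sum of absolute differences between each balance and its ideal value."""
    return sum(abs(state[name] - ideal[name]) for name in ideal)


def balance_emotion(previous: dict[str, float],
                    current: dict[str, float],
                    ideal: dict[str, float]) -> str:
    """Return 'pleasure' when the balances approach the ideal,
    'discomfort' when they deviate from it, and 'neutral' otherwise."""
    before = total_deviation(previous, ideal)
    after = total_deviation(current, ideal)
    if after < before:
        return "pleasure"
    if after > before:
        return "discomfort"
    return "neutral"


# Hypothetical robot example: battery level and posture tilt, both normalized.
IDEAL = {"battery": 1.0, "posture": 0.0}
charging = balance_emotion({"battery": 0.5, "posture": 0.0},
                           {"battery": 0.8, "posture": 0.0}, IDEAL)
tipping = balance_emotion({"battery": 0.8, "posture": 0.0},
                          {"battery": 0.8, "posture": 0.4}, IDEAL)
```

Here `charging` evaluates to `"pleasure"` because the battery level moves toward its ideal, and `tipping` evaluates to `"discomfort"` because the posture moves away from it.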
[0794] The emotion map defines two emotions that promote learning. One is the negative emotion located around the middle of "repentance" and "reflection" on the situation side. In other words, it arises when the robot experiences negative feelings such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the positive emotion located around "desire" on the reaction side. In other words, it arises when the robot has positive feelings such as "I want more" or "I want to know more."
[0795] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
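One way to picture training data in which nearby emotions have similar values is to derive each emotion's target value from its distance to the labeled emotion on the map. The sketch below is an assumption for illustration only: the 2D positions, the Gaussian weighting, the `width` parameter, and the function names are hypothetical, and the description does not state how the training data or the final decision rule are actually constructed.

```python
import math

# Hypothetical positions on the map; the actual layout is the one shown in Figure 9.
POSITIONS = {
    "reassured": (0.60, 0.10),
    "calm": (0.65, 0.05),
    "confident": (0.70, 0.15),
    "anxious": (0.60, -0.60),
}


def soft_targets(label: str, positions=POSITIONS, width: float = 0.2) -> dict[str, float]:
    """Emotion values for one training example labeled with `label`.

    Each emotion's value decays with its squared distance from the labeled
    emotion, so emotions mapped close together receive similar values.
    """
    lx, ly = positions[label]
    return {
        name: math.exp(-((x - lx) ** 2 + (y - ly) ** 2) / (2 * width ** 2))
        for name, (x, y) in positions.items()
    }


def determine_emotion(values: dict[str, float]) -> str:
    """Pick the emotion with the largest emotion value (an assumed decision rule)."""
    return max(values, key=values.get)


targets = soft_targets("calm")
```

With these hypothetical positions, "reassured" and "confident" receive values close to that of "calm", while "anxious", which lies far away on the map, receives a value near zero.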
[0796] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.
[0797] In the above embodiment, an example was given in which the specific processing is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and the specific processing may be distributed across multiple computers, including the computer 22. For example, the data generation model 58 may be provided in a device external to the data processing device 12, and the external device may generate data according to the input data.
[0798] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing according to the specific processing program 56.
[0799] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.
[0800] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.
[0801] The following types of processors can be used as hardware resources to perform the specific processing. One example is a CPU, which is a general-purpose processor that functions as a hardware resource for performing the specific processing by executing software, i.e., a program. Other examples include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform the specific processing. All of these processors have built-in or connected memory, and all of them perform the specific processing by using the memory.
[0802] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.
[0803] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.
[0804] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.
[0805] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of those aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements added, or elements replaced in the descriptions and illustrations presented above, as long as doing so does not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.
[0806] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.
[0807] The following is further disclosed regarding the embodiments described above.
[0808] (Claim 1)
[0809] A means of receiving goals or tasks set by the user,
[0810] A means utilizing a generative artificial intelligence model to analyze the data of the aforementioned goals or tasks,
[0811] Based on the aforementioned analysis, means for generating feedback and advice tailored to the user's characteristics,
[0812] Means for providing the aforementioned feedback and advice to the user,
[0813] A system that includes this.
[0814] (Claim 2)
[0815] The system according to claim 1, wherein the generative artificial intelligence model performs multimodal processing when the data is text, audio, images, or video.
[0816] (Claim 3)
[0817] The system according to claim 1, further comprising means for tracking and continuously updating the progress of the user.
[0818] "Example 1"
[0819] (Claim 1)
[0820] A device that receives goals or tasks set by the user,
[0821] A device that utilizes a generative artificial intelligence model to analyze information on the aforementioned goals or tasks,
[0822] Based on the aforementioned analysis, a device that generates feedback and advice tailored to the user's characteristics,
[0823] The device that provides the aforementioned feedback and advice,
[0824] A device that generates customized feedback that takes into account the user's behavioral history based on the analyzed information,
[0825] An information processing system that includes this.
[0826] (Claim 2)
[0827] The information processing system according to claim 1, wherein the generative artificial intelligence model performs various data processing when the information is text, audio, images, or video.
[0828] (Claim 3)
[0829] The information processing system according to claim 1, further comprising a function for tracking and continuously updating the progress of the user.
[0830] "Application Example 1"
[0831] (Claim 1)
[0832] A means of receiving goals or tasks set by the user,
[0833] A means utilizing a generation algorithm to analyze the data for the aforementioned goals or tasks,
[0834] Based on the above analysis, means for generating feedback and advice tailored to the user's characteristics,
[0835] Means for providing the aforementioned feedback and advice to the user,
[0836] A means by which a home appliance collects and transmits information from the user,
[0837] A system that includes this.
[0838] (Claim 2)
[0839] The system according to claim 1, wherein the generation algorithm performs integrated processing when the data is text information, audio information, visual information, or video information.
[0840] (Claim 3)
[0841] The system according to claim 1, further comprising means for tracking, continuously updating, and presenting to the user the user's progress using a household device.
[0842] "Example 2 of combining an emotion engine"
[0843] (Claim 1)
[0844] Components that receive the purpose or objective defined by the user,
[0845] A processing device that utilizes an artificial intelligence model for analyzing information on the aforementioned purpose or objective,
[0846] Based on the above analysis, a device that generates instructions and advice tailored to the user's characteristics,
[0847] A device that processes audio or visual information provided by the user and recognizes emotions,
[0848] A device that provides the user with the aforementioned instructions and advice visually and audibly,
[0849] A system that includes this.
[0850] (Claim 2)
[0851] The system according to claim 1, wherein the artificial intelligence model processes various forms when the information is text, sound, visual information, or video.
[0852] (Claim 3)
[0853] The system according to claim 1, further comprising a device for monitoring and continuously updating the progress of the user.
[0854] "Application example 2 when combining with an emotional engine"
[0855] (Claim 1)
[0856] A means of receiving goals or tasks set by the user,
[0857] A means utilizing a generative information processing model to analyze the data of the aforementioned goals or tasks,
[0858] Based on the above analysis, means for generating feedback and advice tailored to the user's characteristics,
[0859] A means of determining the user's emotional state using emotion recognition means,
[0860] A means for adaptively optimizing feedback and advice based on the user's emotional state,
[0861] An information terminal that provides the aforementioned feedback and advice to the user,
[0862] A means of using geographic information to suggest local events and activities to users,
[0863] A system that includes this.
[0864] (Claim 2)
[0865] The system according to claim 1, wherein the generative information processing model performs processing in various forms when the data is text, audio, images, or video.
[0866] (Claim 3)
[0867] The system according to claim 1, further comprising a function for tracking and continuously updating the progress of the user.
[Explanation of Symbols]
[0868]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot
Claims
1. A system comprising: a means for receiving goals or tasks set by a user; a means for analyzing data on the goals or tasks using a generation algorithm; a means for generating, based on the analysis, feedback and advice tailored to the user's characteristics; a means for providing the feedback and advice to the user; and a means by which a home appliance collects and transmits information from the user.
2. The system according to claim 1, wherein the generation algorithm performs integrated processing when the data is text information, audio information, visual information, or video information.
3. The system according to claim 1, further comprising means for tracking, continuously updating, and presenting to the user the user's progress using a household device.