system
The system addresses the challenge of providing personalized learning support by using real-time facial and voice data analysis to generate tailored explanations and problems, enhancing motivation and understanding.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- SOFTBANK GROUP CORP
- Filing Date
- 2024-12-10
- Publication Date
- 2026-06-22
AI Technical Summary
Conventional learning support systems fail to provide personalized guidance tailored to individual learners' understanding levels and emotional states, leading to decreased motivation and stagnation in academic performance.
A system that acquires a learner's facial expression and voice data in real time, analyzes the learner's emotional state, and generates personalized explanations and supplementary problems using machine learning algorithms, providing immediate feedback through audio and visual means.
Enhances learning motivation and understanding by offering personalized support that addresses learners' challenges in real-time, optimizing learning experiences.
Abstract
Description
Technical Field
[0001] The technology of the present disclosure relates to a system.
Background Art
[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, and includes steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a character of the chatbot, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] Conventional learning support systems generally provide uniform guidance content and have difficulty flexibly responding according to the individual understanding levels and emotional states of learners. As a result, when learners face problems, they may not receive appropriate support, which may lead to a decline in learning motivation and stagnation in academic performance. An object of this invention is to improve learning motivation and deepen understanding by grasping the understanding level and emotional state of learners in real time and providing individualized guidance.
Means for Solving the Problems
[0005] This invention includes an acquisition means for acquiring a learner's facial expression data and voice data in real time, which allows the learner's emotional state to be grasped quickly. An analysis means identifies emotional states related to the learner's level of understanding based on the acquired data. A generation means generates personalized explanations and supplementary problems based on the analysis results. A presentation means presents the generated explanations and problems to the learner and provides real-time feedback. This enables learner difficulties to be addressed immediately and personalized support to be provided.
[0006] "Acquisition means" refers to a device or mechanism for acquiring a learner's facial expression data and voice data in real time.
[0007] "Analysis means" refers to a device or algorithm for analyzing and identifying emotional states related to a learner's level of understanding, based on data obtained by the acquisition means.
[0008] "Generation means" refers to a device or mechanism that generates individualized explanations or supplementary problems based on information obtained by the analysis means.
[0009] "Presentation means" refers to a device or mechanism that presents explanations and problems generated by the generation means to learners visually or aurally.
[0010] A "learning plan" is an optimized learning framework or schedule based on a learner's past learning history and level of understanding.
[0011] A "machine learning algorithm" is a data-driven computational method that analyzes large amounts of data to identify learner characteristics and levels of understanding.
Brief Description of the Drawings
[0012]
[Figure 1] This is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] This is a conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] This is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] This is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] This is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] This is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] This is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] This is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] This shows an emotion map where multiple emotions are mapped.
[Figure 10] This shows an emotion map where multiple emotions are mapped.
[Figure 11] This is a sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] This is a sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] This is a sequence diagram showing the processing flow of the data processing system in Example 2, which incorporates an emotion engine.
[Figure 14] This is a sequence diagram showing the processing flow of the data processing system in Application Example 2, which incorporates an emotion engine.
Modes for Carrying Out the Invention
[0013] Hereinafter, an example of an embodiment of the system relating to the technology of this disclosure will be described with reference to the attached drawings.
[0014] First, the terms used in the following description will be explained.
[0015] In the following embodiments, the labeled processor (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.
[0016] In the following embodiments, the labeled RAM (Random Access Memory) is a memory in which information is temporarily stored and is used as a work memory by the processor.
[0017] In the following embodiments, the labeled storage is one or more non-volatile storage devices that store various programs and various parameters, etc. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), or magnetic tapes, etc.
[0018] In the following embodiments, the labeled communication I/F (Interface) is an interface including a communication processor and an antenna, etc. The communication I/F controls communication between multiple computers. Examples of communication standards applied to the communication I/F include wireless communication standards including 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), or Bluetooth (registered trademark), etc.
[0019] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when expressing three or more things linked by "and/or."
[0020] [First Embodiment]
[0021] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.
[0022] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.
[0023] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0024] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.
[0025] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.
[0026] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.
[0027] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.
[0028] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.
[0029] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.
[0030] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0031] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0032] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0033] This invention consists of multiple components to provide flexible learning support tailored to the individual needs of learners. The system can efficiently acquire facial expression and voice data, perform data analysis, generate customized content, and present the content.
[0034] The terminal operates continuously during the learner's learning session, capturing the learner's face with the camera and the learner's voice with the microphone. The captured data is transmitted to the server via wireless or wired communication.
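The capture-and-transmit step described in paragraph [0034] can be illustrated with a minimal sketch. The source does not specify a wire format, so the `CapturePacket` fields, the JSON encoding, and the base64 representation below are all assumptions made purely for illustration, and the frame and audio bytes are treated as opaque.

```python
import base64
import json
import time
from dataclasses import asdict, dataclass


@dataclass
class CapturePacket:
    """One unit of data sent from the terminal to the server (hypothetical format)."""
    learner_id: str
    captured_at: float  # Unix timestamp of the capture
    frame_b64: str      # base64-encoded image frame bytes
    audio_b64: str      # base64-encoded audio clip bytes


def build_packet(learner_id: str, frame: bytes, audio: bytes) -> str:
    """Terminal side: serialize one camera frame and one audio clip as JSON."""
    packet = CapturePacket(
        learner_id=learner_id,
        captured_at=time.time(),
        frame_b64=base64.b64encode(frame).decode("ascii"),
        audio_b64=base64.b64encode(audio).decode("ascii"),
    )
    return json.dumps(asdict(packet))


def parse_packet(payload: str) -> tuple[str, bytes, bytes]:
    """Server side: recover the learner id and the raw frame and audio bytes."""
    data = json.loads(payload)
    return (
        data["learner_id"],
        base64.b64decode(data["frame_b64"]),
        base64.b64decode(data["audio_b64"]),
    )
```

In this sketch the terminal would call `build_packet` for each captured frame and clip and hand the resulting string to whatever transport is in use; the server would call `parse_packet` before analysis.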
[0035] The server uses machine learning algorithms to analyze received facial expression and voice data to evaluate the learner's emotional state. The server retrieves the learner's learning history from a database and combines the current state with past data to determine areas of difficulty and the level of understanding.
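One way to picture how the current emotional state and the learning history might be combined, as paragraph [0035] describes, is sketched below. The source gives no concrete rule, so the `TopicRecord` structure, the confusion score in the range 0 to 1, and both thresholds are hypothetical choices for illustration only.

```python
from dataclasses import dataclass


@dataclass
class TopicRecord:
    """Past results for one topic in the learning history (hypothetical schema)."""
    topic: str
    attempts: int
    correct: int


def accuracy(record: TopicRecord) -> float:
    """Fraction of correct answers; 0.0 when the topic was never attempted."""
    return record.correct / record.attempts if record.attempts else 0.0


def find_difficulties(
    current_topic: str,
    confusion: float,
    history: list,
    confusion_threshold: float = 0.6,  # assumed cut-off, not from the source
    accuracy_threshold: float = 0.7,   # assumed cut-off, not from the source
) -> list:
    """Return topics where the learner appears to be struggling.

    Topics with low past accuracy are always included; the current topic is
    added first when the estimated confusion is high.
    """
    weak = [
        r.topic
        for r in history
        if r.attempts > 0 and accuracy(r) < accuracy_threshold
    ]
    if confusion >= confusion_threshold and current_topic not in weak:
        weak.insert(0, current_topic)
    return weak
```

The returned list could then be passed to the generation means as the set of areas to target.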
[0036] The server uses the generation means to create personalized explanations and supplementary problems for the learner. Generation is performed in real time using AI technology, with the aim of providing the content most effective for that learner.
[0037] The terminal presents the learner with the personalized explanations and supplementary problems received from the server, through audio guidance via the speaker and visual presentation on a tablet or other display. This allows learners to recognize right away where they are having difficulty and to receive clear guidance for tackling the problem.
[0038] As a concrete example, consider an elementary school student solving a math problem. When the student gets stuck on a calculation problem, the terminal detects the change in facial expression using a sensor and immediately sends the data to the server. The server analyzes the student's past learning data and current facial expression data to determine why the student is having difficulty with that problem. Based on the results, the generation means creates appropriate explanations and supplementary problems, which the terminal then presents to the student. Through this process, the student can deepen their understanding of the problem and continue to learn effectively.
[0039] The following describes the processing flow.
[0040] Step 1:
[0041] The terminal powers on, verifies the learner's login, and prepares to begin the learning session. Once the learner has logged in, the terminal's camera and microphone are activated and begin capturing the learner's facial expressions and voice in real time.
[0042] Step 2:
[0043] The terminal sends the acquired data to the server at regular intervals. The data consists of image frames and audio clips that represent the learner's current state.
[0044] Step 3:
[0045] The server inputs the received data into a machine learning algorithm for analysis. The server then uses this to assess the learner's emotional state (e.g., confusion, interest) and determine if there are any stumbling blocks.
[0046] Step 4:
[0047] The server matches the learner's facial expression data with their past learning history and extracts content from the database that is relevant to their current difficulty.
[0048] Step 5:
[0049] The server generates personalized explanations and supplementary problems. This generation is performed using an AI engine to provide learners with optimal learning support.
[0050] Step 6:
[0051] The terminal presents the learner with explanations and supplementary problems received from the server. The terminal supports the learner by displaying the problems on its screen and providing explanations through audio guidance.
[0052] Step 7:
[0053] The learner answers the presented problems, and the answers are sent to the server via the terminal. Once the server receives the answers, it incorporates them into the learner's future learning plan.
[0054] Step 8:
[0055] The server re-evaluates the learner's progress based on their responses and answer data, and prepares to adjust the content of the next learning session based on the results.
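Steps 1 to 8 above form a loop that can be sketched as a single function. This is only an outline under stated assumptions: the four callables (`capture`, `analyze`, `generate`, `present`) are hypothetical stand-ins for the acquisition, analysis, generation, and presentation means, and the dictionary shapes are invented for the example.

```python
from typing import Callable


def run_session_step(
    capture: Callable[[], dict],
    analyze: Callable[[dict], str],
    generate: Callable[[str], dict],
    present: Callable[[dict], str],
    plan: list,
) -> dict:
    """Run one pass of the loop and append the outcome to the learning plan."""
    observation = capture()         # Steps 1-2: capture and send face/voice data
    emotion = analyze(observation)  # Steps 3-4: estimate state, consult history
    content = generate(emotion)     # Step 5: explanation and supplementary problem
    answer = present(content)       # Steps 6-7: present content, collect the answer
    record = {
        "emotion": emotion,
        "problem": content["problem"],
        "answer": answer,
    }
    plan.append(record)             # Step 8: keep the result for re-evaluation
    return record
```

Passing the stages in as callables keeps the outline independent of any particular model or device, which the source leaves open.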
[0056] (Example 1)
[0057] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0058] Traditional education systems have struggled to grasp each learner's level of understanding and emotional state in real time and provide appropriate learning content immediately based on that information. Therefore, they are unable to provide flexible instruction tailored to each learner's progress, resulting in challenges in providing efficient learning support.
[0059] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0060] In this invention, the server includes means for immediately collecting a learner's expression and acoustic information, means for evaluating the collected information to identify emotional states related to comprehension, and means for generating personalized explanations and supplementary problems. This makes it possible to provide learning content that is optimized on the spot according to the learner's real-time emotional state and learning history.
[0061] A "learner" is someone who acquires new knowledge and skills in an educational activity.
[0062] "Expressive information" refers to visually observable information such as a learner's facial expressions and attitude, and is data used to judge their emotional state.
[0063] "Acoustic information" refers to auditory observable information, such as the voice and speech patterns emitted by learners, and is used as data to judge emotions and comprehension.
[0064] "Immediate" refers to data collection and processing being performed in real time without delay.
[0065] "Assessment" refers to the process of analyzing collected data to identify learners' levels of understanding and emotional states.
[0066] "Emotional state" refers to the learner's emotional state and reactions, which can be used to assess their motivation and level of concentration.
[0067] "Individualized explanations" refer to learning materials and instructional content that are tailored to the learner's characteristics and current level of understanding.
[0068] "Supplementary problems" refer to additional exercises or assignments provided to reinforce learners' understanding.
[0069] "Artificial intelligence technology" refers to technologies that use machine learning algorithms and data analysis methods to autonomously solve problems or perform tasks.
[0070] This invention is a system designed to provide dynamic learning support to learners, and it mainly consists of a server and terminals.
[0071] The terminal collects expression and acoustic information using a camera and microphone during the learner's learning process. For example, a standard webcam and microphone can be used. The collected data is transmitted to the server via Wi-Fi or wired communication.
[0072] The server evaluates the received data using Python machine learning libraries such as TensorFlow (registered trademark) and PyTorch. This allows it to assess the learner's emotional state, compare it with the learning history retrieved from the database, and generate personalized explanations and supplementary problems.
[0073] The generation process utilizes a generative AI model, allowing learners to receive appropriate content based on prompt input. A concrete example of a prompt is, "This user lacks basic fraction calculation skills. Please suggest a step-by-step guide and practice exercises."
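As an illustration of how the evaluation result might be turned into a prompt like the one quoted in paragraph [0073], a minimal sketch follows. The function name `build_prompt` and its parameters are hypothetical; only the wording of the default prompt comes from the source, and the optional sentence about the learner's emotional state is an assumption.

```python
from typing import Optional


def build_prompt(weak_skill: str, emotion: Optional[str] = None) -> str:
    """Assemble a prompt for the generative AI model from the evaluation result."""
    parts = [f"This user lacks {weak_skill} skills."]
    if emotion:
        # Assumed extension: mention the estimated emotional state when known.
        parts.append(f"The user currently appears {emotion}.")
    parts.append("Please suggest a step-by-step guide and practice exercises.")
    return " ".join(parts)


print(build_prompt("basic fraction calculation"))
```

With only the skill supplied, the function reproduces the example prompt from the text; the resulting string would then be passed to whichever generative AI model is in use.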
[0074] The terminal presents the generated personalized content to the learner. A standard speaker is used for audio guidance, while a tablet device is used for visual explanations. This allows learners to study at their own pace and receive support that deepens their understanding.
[0075] As a concrete example, consider an elementary school student learning arithmetic. When the student encounters a problem and finds it difficult to understand, the terminal detects a change in facial expression and sends that data to the server. The server generates explanations and additional practice problems tailored to the situation and provides them to the student through the terminal. In this way, the student can receive immediate and effective instruction.
[0076] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0077] Step 1:
[0078] The terminal activates its camera and microphone during the learner's learning session to collect visual and auditory information, including facial expressions and tone of voice. The input is the learner's visual and auditory data, formatted as a digital signal. The output is data packets ready for transmission to the server. The terminal sends the collected data to the server immediately via Wi-Fi or a wired connection.
[0079] Step 2:
[0080] The server receives expression and acoustic information transmitted from the terminal. The input is data packets, which are converted into a format for analysis. The server uses machine learning libraries to evaluate the data and determine the learner's emotional state and level of concentration. The output is an analysis result showing the learner's current state.
[0081] Step 3:
[0082] The server retrieves the learner's past learning history from the database based on the analysis results. The input consists of the analyzed emotional state and past learning history. By combining this data, the server evaluates the learner's level of understanding and identifies areas where they are struggling. The output is the evaluation result based on their level of understanding.
[0083] Step 4:
[0084] The server uses a generative AI model to generate personalized explanations and supplementary problems. The input here is the evaluation result obtained in step 3, which is input to the generative AI model as a prompt. A prompt such as "This user lacks basic fraction calculation skills. Please suggest a step-by-step guide and practice problems." is used. The output is explanations and problems optimized for the learner.
[0085] Step 5:
[0086] The terminal presents the learner with the personalized explanations and supplementary problems received from the server through audio guidance and the display. The input is the generated content data from the server, which is converted into a user-friendly format. The output is the learning support content presented to the learner visually and aurally. This allows the learner to receive real-time guidance and continue learning.
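Step 5 above converts the generated content into a form suited to the display and the speaker. The sketch below shows one possible conversion; the `GeneratedContent` structure and the exact layout of the two outputs are assumptions, since the source does not describe the format.

```python
from dataclasses import dataclass


@dataclass
class GeneratedContent:
    """Content received from the server (hypothetical structure)."""
    explanation: str
    problems: list


def to_display_text(content: GeneratedContent) -> str:
    """Lay out the explanation and a numbered problem list for the screen."""
    lines = ["Explanation:", content.explanation, "", "Practice problems:"]
    lines += [f"{i}. {p}" for i, p in enumerate(content.problems, start=1)]
    return "\n".join(lines)


def to_speech_text(content: GeneratedContent) -> str:
    """Produce a shorter script intended for audio guidance."""
    count = len(content.problems)
    noun = "problem" if count == 1 else "problems"
    return f"{content.explanation} I have prepared {count} practice {noun} for you."
```

Keeping the screen text and the speech script separate reflects the description of visual and aural presentation as two distinct channels.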
[0087] (Application Example 1)
[0088] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0089] In modern education, it is difficult to provide learning experiences optimized for the different levels of understanding and progress of individual learners. Furthermore, the lack of effective support at home necessitates constant supervision by parents and educators. In this context, there is a need for support technologies that enable learners to autonomously, effectively, and individually engage in learning.
[0090] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0091] In this invention, the server includes means for acquiring visual and auditory information of learners in real time, means for analyzing the acquired data and identifying mental states related to the learner's awareness level, and means for generating individualized instruction and practice problems based on the identified mental states. This enables a machine to act as a learning supporter in the home, providing effective instruction and materials tailored to the individual learning situation.
[0092] "Visual information" refers to video data used to capture the movements and facial expressions of a subject.
[0093] "Auditory information" refers to audio data used to capture the speech or sounds of a subject.
[0094] "Acquiring in real time" means collecting the target information immediately on the spot.
[0095] "Awareness level" is an indicator that shows how well learners understand a particular subject or issue.
[0096] "Mental state" refers to the psychological and emotional state of a learner.
[0097] "Individualized instruction" means providing individualized educational guidance tailored to each learner's needs and level of understanding.
[0098] "Practice problems" are work assignments provided to learners to solve problems using the knowledge they have acquired.
[0099] "Machines acting as learning supporters in the home" refers to automated devices providing educational guidance in the home.
[0100] "Providing materials" means supplying learners with the information and learning materials they need for their studies.
[0101] The system implementing this invention includes a comprehensive educational support device for providing learning support within the home. The terminal is installed in the learner's home and uses a built-in camera and microphone to acquire the learner's visual and auditory information in real time. This makes it possible to continuously observe the learner's emotional state and level of comprehension during learning.
[0102] The server receives the data sent from the terminal and analyzes it using algorithms implemented in a programming language such as Python. It uses visual information analysis libraries such as OpenCV and the face recognition functions of libraries such as dlib to identify the learner's mental state. It also refers to the past learning history and combines it with the current mental state to deepen the analysis.
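The source does not say which facial features are derived from the OpenCV and dlib output. As one hedged illustration, the sketch below computes the eye aspect ratio, a commonly used geometric feature, from six eye-contour points. It assumes the landmarks have already been extracted elsewhere (for example with a 68-point landmark model, where indices 36 to 41 outline one eye) and are supplied as plain `(x, y)` tuples, so no image library is imported here. Whether this feature is relevant to the mental states discussed in the text is an assumption.

```python
import math


def eye_aspect_ratio(eye: list) -> float:
    """Ratio of eye height to eye width from six contour points p1..p6.

    p1 and p4 are the horizontal corners; (p2, p6) and (p3, p5) are the
    upper/lower pairs. Smaller values indicate a more closed eye.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = 2.0 * math.dist(p1, p4)
    return vertical / horizontal


def eye_from_landmarks(landmarks: list, start: int = 36) -> list:
    """Slice six eye points out of a full landmark list (68-point layout assumed)."""
    return landmarks[start:start + 6]
```

A sequence of such values over time could serve as one input among many to an emotion identification model; the choice of features and model is left open by the source.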
[0103] The generative AI model operates on the server side and generates individualized instructional content and practice problems tailored to the learner's state. This process is implemented using a generative AI model API, such as the API provided by OpenAI (registered trademark), so that the learner's changing needs can be addressed in real time. The generative AI model and the prompt text supplied to it are therefore central to this process.
[0104] The terminal presents the generated content to the learner, providing audio guidance through the speaker and visual presentation on the display. For example, if a learner gets stuck on a particular math problem, the system displays several similar problems and provides guidance such as, "Try to remember the answer from last time."
[0105] As a concrete example, here is an example of a prompt: "The learner is struggling with multiplication problems. Based on past learning data, create advice and supplementary problems to re-present problems of a similar difficulty level." This allows learners to progress autonomously and with individualized support at home.
[0106] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0107] Step 1:
[0108] The terminal acquires the learner's visual and auditory information in real time using a camera and microphone. The input consists of the learner's facial image data and audio data, which are transmitted to the server via wireless communication.
[0109] Step 2:
[0110] The server analyzes the received visual and auditory information. The input consists of facial image data and audio data sent from the terminal. Using OpenCV and the dlib library, it analyzes facial expressions and identifies the learner's mental state. This calculation yields the learner's emotional state as output.
[0111] Step 3:
[0112] The server combines the identified mental state with the learner's past learning history data and analyzes them together. The input consists of the mental state and the past learning data; based on these, the server identifies where the learner is stumbling and adjusts the learning plan in real time as needed. The output is the learner's individual educational needs.
[0113] Step 4:
[0114] The server uses a generative AI model to generate individualized instructional content and practice problems optimized for the learner. The input consists of data stored on the server and a prompt supplied to the generative AI model; an example of such a prompt is given in paragraph [0105]. The output is new learning content.
[0115] Step 5:
[0116] The terminal presents the generated instructional content and practice problems to the learner via audio and the display. The input is the generated learning content, which the terminal delivers as audio guidance through the speaker and visual presentation on the display. Based on this presentation, the learner can proceed with learning autonomously.
[0117] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0118] This invention provides a learning support system that integrates an emotion engine that accurately identifies the emotional state of learners, thereby offering a learning experience tailored to individual needs. The system consists of a terminal, a server, and the emotion engine, which work together to provide optimal support to learners.
[0119] The terminal is a device used by learners and is equipped with a camera and microphone. This terminal captures the learner's facial expressions and voice in real time and transmits them to the server as digital data.
[0120] The server receives data sent from the terminal and uses an emotion engine to analyze the user's emotional state. The emotion engine incorporates algorithms to recognize different emotional states, which are then used to evaluate the learner's level of understanding and concentration.
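The emotion engine is described as analyzing both facial expressions and voice, but the source does not say how the two are combined. A minimal sketch of one possibility, a weighted average of per-emotion scores, is shown below. The emotion labels, the score dictionaries, and the `face_weight` value are all assumptions for illustration.

```python
def fuse_emotions(face: dict, voice: dict, face_weight: float = 0.6) -> dict:
    """Combine face-based and voice-based emotion scores into one distribution.

    Each input maps an emotion label to a score; labels missing from one
    input are treated as 0.0. The result is normalized to sum to 1.
    """
    labels = set(face) | set(voice)
    fused = {
        label: face_weight * face.get(label, 0.0)
        + (1.0 - face_weight) * voice.get(label, 0.0)
        for label in labels
    }
    total = sum(fused.values())
    if total > 0:
        fused = {label: score / total for label, score in fused.items()}
    return fused


def dominant_emotion(scores: dict) -> str:
    """Return the label with the highest fused score."""
    return max(scores, key=scores.get)
```

The dominant label, or the full distribution, could then feed the evaluation of understanding and concentration mentioned in the text.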
[0121] Furthermore, the server references the learner's accumulated past learning history to understand their progress and generates customized explanations and supplementary questions that combine this with their analyzed emotional state. This generated content is best suited to the learner's current situation.
[0122] The terminal presents the content received from the server to the learner, using both audio guidance and on-screen display to convey the information needed to resolve the learner's difficulties.
[0123] As a concrete example, consider a middle school student taking an online history lesson. When the student begins to show signs of confusion over difficult dates or events, the terminal detects this change and sends data to the server. The server uses the emotion engine to analyze the degree of confusion and, based on the past learning history, generates visual aids and easy-to-remember related questions to support understanding. As a result, the student gains clues for placing complex dates in chronological order and can re-engage in learning with renewed motivation.
[0124] The following describes the processing flow.
[0125] Step 1:
[0126] The terminal activates its camera and microphone as soon as the learner starts a lesson and captures facial and audio data in real time. This data is then converted into a format that the emotion engine can analyze.
[0127] Step 2:
[0128] The terminal sends the captured data to the server using a communication protocol that enables secure and efficient data transfer. The transfer takes place in the background without interrupting the learning session.
[0129] Step 3:
[0130] The server inputs the received data into the emotion engine. The emotion engine uses a machine learning model to analyze the learner's facial expressions and voice characteristics to determine their emotional state with high accuracy.
[0131] Step 4:
[0132] The server compares the analysis results with the learner's past learning history to evaluate their level of understanding and concentration. This identifies the learning difficulties and areas where the learner is currently struggling.
[0133] Step 5:
[0134] The server uses generation tools to create personalized explanations and supplementary questions for learners. This includes motivational feedback tailored to their emotional state and engaging content.
[0135] Step 6:
[0136] The terminal presents the content received from the server to the learner both audibly and visually. The audio guidance explains the material with easy-to-follow intonation, while relevant information and questions are shown on a tablet or other display.
[0137] Step 7:
[0138] The learner deepens their understanding by working through the presented problems. The answers are sent to the server in real time via the terminal and are reflected in the next learning step.
[0139] Step 8:
[0140] The server stores user response data and reactions to help plan future learning. This data is then used for system-wide re-evaluation and improvement, providing a more optimized experience for learners.
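The comparison described in Step 4 could be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the emotion labels, the `TopicRecord` structure, the equal weighting, and the 0.5 threshold are all hypothetical and introduced here only to show how emotion scores and learning history might be combined.

```python
from dataclasses import dataclass


@dataclass
class TopicRecord:
    """Hypothetical entry in the learner's past learning history."""
    topic: str
    attempts: int
    correct: int

    @property
    def accuracy(self) -> float:
        # A topic with no attempts is treated as 0.0 to avoid dividing by zero.
        return self.correct / self.attempts if self.attempts else 0.0


def evaluate_learner(emotions: dict[str, float], history: list[TopicRecord],
                     current_topic: str) -> dict:
    """Combine emotion-engine scores with history into a coarse evaluation.

    `emotions` maps assumed labels ("confusion", "concentration") to scores in [0, 1].
    """
    record = next((r for r in history if r.topic == current_topic), None)
    past_accuracy = record.accuracy if record else 0.0
    confusion = emotions.get("confusion", 0.0)
    # Assumed rule: understanding falls as confusion rises and past accuracy drops.
    understanding = round(0.5 * past_accuracy + 0.5 * (1.0 - confusion), 3)
    return {
        "topic": current_topic,
        "understanding": understanding,
        "concentration": emotions.get("concentration", 0.0),
        "struggling": understanding < 0.5,  # assumed threshold
    }


history = [TopicRecord("history dates", attempts=10, correct=3)]
result = evaluate_learner({"confusion": 0.8, "concentration": 0.6},
                          history, "history dates")
print(result)
```

In this sketch a learner with low past accuracy and a high confusion score is flagged as struggling, which is the condition that would trigger the content generation of Step 5.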
[0141] (Example 2)
[0142] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0143] Conventional learning support systems lack the ability to provide nuanced support based on learners' emotions and learning history, which hinders their ability to maximize learning efficiency. Furthermore, the lack of real-time content delivery that adapts to individual learners' emotional changes results in insufficient efforts to maintain learners' motivation and improve their comprehension.
[0144] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0145] In this invention, the server includes information acquisition means for acquiring learner facial expression information and voice information in real time, emotion analysis means for analyzing the acquired information and identifying the learner's emotional state, and content generation means for generating personalized explanations and questions based on the identified emotional state and the learner's past learning history. This makes it possible to provide optimal educational content that is tailored to the learner's emotions and learning history.
[0146] "Information acquisition means" refers to a device or system that has the function of collecting learner's facial expression information and voice information in real time.
[0147] "Emotion analysis means" refers to a device or system that processes acquired facial expression information and voice information of a learner and has the function of identifying their emotional state.
[0148] "Content generation means" refers to a device or system that has the function of creating personalized explanations and questions based on the identified learner's emotional state and past learning history.
[0149] "Content presentation means" refers to a device or system that has the function of presenting generated explanations and questions to learners.
[0150] An "educational support system" is a system that includes multiple means designed to improve learners' learning efficiency.
[0151] A "generative AI model" is a model that uses machine learning algorithms to dynamically generate personalized explanations and problems in real time.
[0152] This invention aims to provide an educational experience that meets the diverse needs of learners in a learning support system. The invention is implemented in a system composed mainly of a terminal, a server, and an emotion engine.
[0153] The terminal is a device used directly by learners and is equipped with a camera and microphone. This allows it to capture real-time facial expression information and audio from the learners and process it as digital data. This processing utilizes a dedicated application within the terminal, which adds capabilities for facial recognition and audio recording.
[0154] The server receives the digital data sent from the terminal and uses the emotion engine to analyze the learner's emotional state. The emotion engine uses machine learning algorithms to identify a variety of emotions. The analysis results are then input into a generative AI model, which generates customized content based on prompts. Specifically, by integrating the analysis results with the learner's past learning history, it creates explanations and supplementary problems optimized for their learning situation.
[0155] For example, if a middle school student becomes confused by a difficult date during an online history lesson, the terminal quickly captures their facial expression and sends it to the server. The server analyzes this confusion using the emotion engine and instructs a generative AI model with a prompt such as "Generate a concise chart to help understand the dates chronologically." The resulting content is then presented to the student via the terminal with audio guidance. This allows learners to regain interest in learning and engage with the material more actively.
[0156] This system allows learners to gain a deeper understanding and more active learning at their own pace, and therefore its educational effectiveness is considered to be very high.
[0157] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0158] Step 1:
[0159] The terminal acquires the learner's facial expression and voice information in real time using a camera and microphone. This input is processed by a dedicated application within the terminal and converted into digital data. This processing prepares the data for transmission to the server in a format suitable for analysis.
[0160] Step 2:
[0161] The terminal sends the processed digital data to the server. The input is the digitized facial expression and voice information, which is sent to the server using a secure communication protocol. Once the server has received the data, it can proceed to the next analysis stage.
[0162] Step 3:
[0163] The server receives data transmitted from the terminal and identifies the learner's emotional state using emotion analysis tools. The input for this step is the facial and voice data received in the previous step. The emotion engine analyzes this data and outputs the identified emotion (e.g., joy, confusion, concentration). This information is used to adapt the learner's learning experience.
[0164] Step 4:
[0165] The server integrates the results of the emotion analysis with past learning history and uses a generative AI model to generate customized learning content. The input for this step is the emotional state and the learning history data. The generative AI model uses prompts to generate appropriate explanations and supplementary questions, which are then stored on the server as output.
[0166] Step 5:
[0167] The server sends the generated content to the terminal. The input is the generated learning content, and the output is the completion of the transmission to the terminal. This prepares the learner to receive content optimized for them.
[0168] Step 6:
[0169] The terminal presents the content received from the server to the learner. Specifically, content is presented using screen displays and audio guidance. The input is the content data, and the output is the visual and auditory information into which it is converted to facilitate the learner's understanding. This presentation allows learners to actively engage in learning, improving the educational experience.
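The prompt handling in Step 4 of this flow might look like the sketch below. It is illustrative only: the `INSTRUCTIONS` mapping, the prompt layout, the sample history topics, and the `fake_model` stand-in are assumptions made here, and only the "confusion" instruction is taken from the example prompt in the description.

```python
from typing import Callable

# Assumed mapping from an identified emotion to an instruction. The "confusion"
# entry reuses the example prompt from the description; the rest are hypothetical.
INSTRUCTIONS = {
    "confusion": "Generate a concise chart to help understand the dates chronologically.",
    "joy": "Generate one slightly harder follow-up question.",
}
DEFAULT_INSTRUCTION = "Generate a short review question."


def build_prompt(emotion: str, history: list[str]) -> str:
    """Merge the identified emotion and past learning history into one prompt."""
    instruction = INSTRUCTIONS.get(emotion, DEFAULT_INSTRUCTION)
    # Only the three most recent history items are included (an assumed limit).
    recent = "; ".join(history[-3:]) if history else "none recorded"
    return (
        f"Learner emotional state: {emotion}\n"
        f"Recent learning history: {recent}\n"
        f"Instruction: {instruction}"
    )


def generate_content(emotion: str, history: list[str],
                     model: Callable[[str], str]) -> str:
    """Pass the assembled prompt to any callable that maps a prompt to text."""
    return model(build_prompt(emotion, history))


def fake_model(prompt: str) -> str:
    # Stand-in for a real generative AI model so the sketch runs offline.
    return f"[generated from {len(prompt.splitlines())} prompt lines]"


prompt = build_prompt("confusion", ["Edo period", "Meiji Restoration"])
print(prompt)
print(generate_content("confusion", ["Edo period"], fake_model))
```

Keeping the model behind a plain callable means the same prompt-building code could be exercised with a stub during testing and with an actual generative AI service in deployment.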
[0170] (Application Example 2)
[0171] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0172] In today's educational environment, appropriate instruction and support tailored to the individual needs of learners are required. Especially during home learning, it is crucial not only to maintain concentration but also to provide learning content that is appropriate and tailored to each student's level of understanding in real time. However, conventional systems have struggled to accurately grasp learners' emotional states and levels of understanding and to implement optimized instruction based on that information. Improvements are needed to address this challenge.
[0173] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0174] In this invention, the server includes means for acquiring learner perceptual data in real time, means for analyzing the information and identifying emotional states related to the subject's level of understanding, means for generating individualized instruction and support tasks based on the identified emotional states, and a social machine device for supporting children's learning at home. This enables effective learning support tailored to the learner's level of understanding and concentration.
[0175] "Perceptual data" refers to information used to evaluate a learner's emotional state and level of comprehension, including their facial expressions and voice.
[0176] "Analysis means" refers to a device or system that analyzes perceptual data and performs processing to identify the learner's emotional state and level of understanding.
[0177] A "generation means" is a device or system that has the function of creating content and assignments necessary to provide optimal guidance and support to learners based on the analysis results.
[0178] "Presentation means" refers to a device or system that visually or aurally communicates instructional content or assignments created by the generation means to learners.
[0179] A "social machine device" is a robot or other device installed to support learning within the home, providing appropriate education through direct interaction with learners.
[0180] This embodiment of the invention operates as a system that provides appropriate educational support based on learner perceptual data. The server receives and analyzes real-time learner facial expression data and voice data transmitted from the terminal.
[0181] The system includes hardware for processing perceptual data, such as a Raspberry Pi, a camera module, and a microphone, and software such as an image processing library (OpenCV) and a machine learning library used for speech analysis (TensorFlow).
[0182] The server uses analytical tools to identify the learner's emotional state and uses a generative AI model to generate instructional content tailored to their level of understanding. For example, if a learner faces a difficult task and shows signs of distress, the server identifies that emotion, generates appropriate supplementary tasks or visual support, and sends them to the terminal.
[0183] The terminal presents the generated learning content audibly and visually and provides learners with immediate feedback. This process is facilitated by a social machine device that supports children's learning within the home.
[0184] As a concrete example of its use, if a third-grade elementary school student is practicing kanji in Japanese class at home and the system detects that the child is hesitating over a difficult reading, it might prompt the child with a question like, "How about taking a nap?" and provide visual assistance to keep the child engaged. Furthermore, the system's generative AI model can use prompts such as, "Imagine a robot that optimally supports your emotions. What kind of learning content would further motivate you to learn?"
[0185] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0186] Step 1:
[0187] The terminal uses a camera and microphone to acquire learner facial and voice data in real time. This data serves as input for analyzing factors that influence performance and attention. The terminal then prepares to send the data to the server.
[0188] Step 2:
[0189] The server receives facial expression data and audio data sent from the terminal. Based on the received data, it performs facial expression analysis using OpenCV and audio analysis using TensorFlow. Through this analysis, it identifies the learner's emotional state and outputs information related to their level of comprehension and concentration.
[0190] Step 3:
[0191] Based on the analyzed emotional state and level of understanding, the server uses a generative AI model to generate instructional content and supplementary tasks optimized for the learner. In this process, the prompt "Imagine a robot that optimally supports your emotions. What kind of learning content would further motivate you to learn?" is input to the generative AI model, and its output is used as instructional content.
[0192] Step 4:
[0193] The server sends the generated instructional content and supplementary assignments to the terminal. The terminal receives this and presents it to the learner as audio guidance and visual content. At this stage, specific instruction is given to the user, and the system's feedback loop is formed as their reactions are observed.
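The feedback loop mentioned in Step 4 is not specified further in the description. One possible way to close it, shown purely as an assumption, is to keep a running engagement estimate that is nudged by each observed reaction and used to choose the next supplementary task. The `reaction_score` input, the smoothing rate, and the cut-offs below are hypothetical.

```python
def update_estimate(estimate: float, reaction_score: float, rate: float = 0.3) -> float:
    """Exponential moving average of observed reactions.

    `reaction_score` is assumed to lie in [0, 1], where 1 is a fully positive
    reaction; `rate` is an assumed smoothing factor.
    """
    if not 0.0 <= reaction_score <= 1.0:
        raise ValueError("reaction_score must be between 0 and 1")
    return (1.0 - rate) * estimate + rate * reaction_score


def next_task_level(estimate: float) -> str:
    # Assumed cut-offs for choosing the next supplementary task.
    if estimate < 0.4:
        return "easier"
    if estimate < 0.75:
        return "same"
    return "harder"


estimate = 0.5
for score in [0.0, 0.0, 1.0]:  # two negative reactions, then a positive one
    estimate = update_estimate(estimate, score)
    print(round(estimate, 4), next_task_level(estimate))
```

A moving average is used here only because it lets recent reactions outweigh older ones without storing the full history; any other update rule could take its place.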
[0194] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0195] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0196] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.
[0197] [Second Embodiment]
[0198] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.
[0199] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.
[0200] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0201] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.
[0202] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0203] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0204] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0205] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0206] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0207] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0208] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0209] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0210] This invention consists of multiple components to provide flexible learning support tailored to the individual needs of learners. The system can efficiently acquire facial expression and voice data, perform data analysis, generate customized content, and present the content.
[0211] The terminal operates continuously during the learner's learning session, capturing the learner's face with a camera and their voice with a microphone. The captured data is transmitted to the server via wireless or wired communication.
[0212] The server uses machine learning algorithms to analyze received facial expression and voice data to evaluate the learner's emotional state. The server retrieves the learner's learning history from a database and combines the current state with past data to determine areas of difficulty and the level of understanding.
[0213] The server uses generation methods to create personalized explanations and supplementary problems for learners. This generation is performed in real time using AI technology, with the aim of providing learners with the most effective content.
[0214] The terminal presents learners with the personalized explanations and supplementary problems received from the server. This is done through audio guidance via a speaker and visual presentation via a tablet or other display. This allows learners to immediately understand their difficulties and receive clear guidance for tackling the problems.
[0215] As a concrete example, consider a case where an elementary school student is solving a math problem. When the student gets stuck on a calculation problem, the terminal detects the change in their facial expression using a sensor and immediately sends the data to the server. The server analyzes the student's past learning data and current facial expression data to understand why the student is having difficulty with that problem. Based on the results, the server generates appropriate explanations and supplementary problems, which the terminal then presents to the student. Through this process, the student can deepen their understanding of the problem and continue to learn effectively.
[0216] The following describes the processing flow.
[0217] Step 1:
[0218] The terminal powers on, verifies the learner's login, and prepares to begin learning. Once the learner logs in, the terminal's camera and microphone activate and start capturing the learner's facial expressions and voice in real time.
[0219] Step 2:
[0220] The terminal sends the acquired data to the server at regular intervals. The data includes image frames and audio clips, representing the learner's current state.
[0221] Step 3:
[0222] The server inputs the received data into a machine learning algorithm for analysis. The server then uses this to assess the learner's emotional state (e.g., confusion, interest) and determine if there are any stumbling blocks.
[0223] Step 4:
[0224] The server matches the learner's facial expression data with their past learning history and extracts content from the database that is relevant to their current difficulty.
[0225] Step 5:
[0226] The server generates personalized explanations and supplementary problems. This generation is performed using an AI engine to provide learners with optimal learning support.
[0227] Step 6:
[0228] The terminal presents the learner with explanations and supplementary problems received from the server. The terminal supports the learner by displaying the problems on its screen and providing explanations through audio guidance.
[0229] Step 7:
[0230] The learner answers the presented questions, and the answers are sent to the server via the terminal. The server incorporates the received answers into the learner's future learning plan.
[0231] Step 8:
[0232] The server re-evaluates the learner's progress based on their responses and answer data, and prepares to adjust the content of the next learning session based on the results.
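The periodic transfer in Step 2 of this flow, in which image frames and audio clips are sent together, could be packaged as in the sketch below. The JSON layout, the field names, and the use of base64 are assumptions chosen for illustration; the description does not specify a wire format, and a real deployment would additionally send these packets over a secure channel.

```python
import base64
import json
import time


def make_packet(learner_id: str, frame: bytes, audio: bytes, seq: int) -> str:
    """Terminal side: wrap one image frame and one audio clip as a JSON string."""
    return json.dumps({
        "learner_id": learner_id,
        "seq": seq,                    # sequence number of this interval
        "captured_at": time.time(),    # capture timestamp in seconds
        # Binary payloads are base64-encoded so they survive JSON transport.
        "frame_b64": base64.b64encode(frame).decode("ascii"),
        "audio_b64": base64.b64encode(audio).decode("ascii"),
    })


def parse_packet(raw: str) -> dict:
    """Server side: recover the original bytes before analysis."""
    packet = json.loads(raw)
    packet["frame"] = base64.b64decode(packet.pop("frame_b64"))
    packet["audio"] = base64.b64decode(packet.pop("audio_b64"))
    return packet


# Placeholder bytes stand in for a real camera frame and microphone clip.
raw = make_packet("learner-001", b"\x89fake-image", b"fake-audio", seq=1)
received = parse_packet(raw)
print(received["seq"], len(received["frame"]), len(received["audio"]))
```

The sequence number and timestamp let the server order packets and pair each analysis result with the moment it was captured.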
[0233] (Example 1)
[0234] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0235] Traditional education systems have struggled to grasp each learner's level of understanding and emotional state in real time and provide appropriate learning content immediately based on that information. Therefore, they are unable to provide flexible instruction tailored to each learner's progress, resulting in challenges in providing efficient learning support.
[0236] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0237] In this invention, the server includes means for instantly collecting learner's expression and acoustic information, means for evaluating the collected information to identify emotional states related to comprehension, and means for generating personalized explanations and supplementary problems. This makes it possible to provide learning content that is instantly optimized according to the learner's real-time emotional state and learning history.
[0238] A "learner" is someone who acquires new knowledge and skills in an educational activity.
[0239] "Expression information" refers to visually observable information such as a learner's facial expressions and attitude, and is data used to judge their emotional state.
[0240] "Acoustic information" refers to auditory observable information, such as the voice and speech patterns emitted by learners, and is used as data to judge emotions and comprehension.
[0241] "Immediate" refers to data collection and processing being performed in real time without delay.
[0242] "Evaluation" refers to the process of analyzing collected data to identify learners' levels of understanding and emotional states.
[0243] "Emotional state" refers to the learner's emotional state and reactions, which can be used to assess their motivation and level of concentration.
[0244] "Personalized explanations" refer to learning materials and instructional content that are tailored to the learner's characteristics and current level of understanding.
[0245] "Supplementary problems" refer to additional exercises or assignments provided to reinforce learners' understanding.
[0246] "Artificial intelligence technology" refers to technologies that use machine learning algorithms and data analysis methods to autonomously solve problems or perform tasks.
[0247] This invention is a system designed to provide dynamic learning support to learners, and it mainly consists of a server and terminals.
[0248] The terminal collects expression information and acoustic information using a camera and microphone during the learner's learning process. For example, a standard webcam and microphone can be used. The collected data is transmitted to the server via Wi-Fi or wired communication.
[0249] The server evaluates the received data using machine learning libraries such as TensorFlow and PyTorch, called from Python. This allows it to assess the learner's emotional state, compare it with their learning history retrieved from a database, and generate personalized explanations and supplementary problems.
[0250] The generation process utilizes a generative AI model, allowing learners to receive appropriate content based on prompt input. A concrete example of a prompt is, "This user lacks basic fraction calculation skills. Please suggest a step-by-step guide and practice exercises."
[0251] The terminal presents the generated personalized content to the learner. A standard speaker is used for audio guidance, while a tablet device is used for visual explanations. This allows learners to study at their own pace and receive support to deepen their understanding.
[0252] As a concrete example, consider a case where an elementary school student is learning arithmetic. When the student encounters a problem and finds it difficult to understand, the terminal detects a change in facial expression and sends that data to the server. The server generates appropriate explanations and additional practice problems tailored to the situation and provides them to the student through the terminal. In this way, the student can receive immediate and effective instruction.
[0253] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0254] Step 1:
[0255] The terminal activates its camera and microphone during the learner's learning session to collect visual and auditory information. This information includes facial expressions and voice tone. The input is the learner's visual and auditory data, which is formatted as a digital signal. The output is data packets ready for transmission to the server. The terminal sends the collected data to the server immediately via Wi-Fi or a wired connection.
[0256] Step 2:
[0257] The server receives expression and acoustic information transmitted from the terminal. The input is data packets, which are converted into a format for analysis. The server uses machine learning libraries to evaluate the data and determine the learner's emotional state and level of concentration. The output is an analysis result showing the learner's current state.
[0258] Step 3:
[0259] The server retrieves the learner's past learning history from the database based on the analysis results. The input consists of the analyzed emotional state and past learning history. By combining this data, the server evaluates the learner's level of understanding and identifies areas where they are struggling. The output is the evaluation result based on their level of understanding.
[0260] Step 4:
[0261] The server uses a generative AI model to generate personalized explanations and supplementary problems. The input here is the evaluation result obtained in step 3, which is input to the generative AI model as a prompt. A prompt such as "This user lacks basic fraction calculation skills. Please suggest a step-by-step guide and practice problems." is used. The output is explanations and problems optimized for the learner.
[0262] Step 5:
[0263] The terminal presents learners with personalized explanations and supplementary questions received from the server through audio guidance and display. Input is generated content data from the server, which is converted into a user-friendly format. Output is information presented to learners—that is, learning support content presented visually and aurally. This allows users to receive real-time guidance and continue their learning.
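Step 2 of this flow evaluates both expression and acoustic information but does not say how the two are combined. One common approach, used here only as an assumed illustration, is late fusion: each modality produces its own probability distribution over emotion labels, and the two are averaged with a weight. The labels, the sample probabilities, and the 0.6 face weight are hypothetical; in practice the per-modality distributions would come from models built with libraries such as those named in the description.

```python
def fuse_emotions(face: dict[str, float], voice: dict[str, float],
                  face_weight: float = 0.6) -> tuple[str, dict[str, float]]:
    """Weighted average of two per-modality probability distributions."""
    labels = sorted(set(face) | set(voice))
    if not labels:
        raise ValueError("no emotion scores supplied")
    fused = {
        label: face_weight * face.get(label, 0.0)
        + (1.0 - face_weight) * voice.get(label, 0.0)
        for label in labels
    }
    total = sum(fused.values())
    if total > 0:
        # Renormalize in case a label is missing from one modality.
        fused = {label: score / total for label, score in fused.items()}
    top = max(fused, key=fused.get)  # label with the highest fused score
    return top, fused


# Hypothetical outputs of a facial-expression model and a voice model.
face_probs = {"confusion": 0.7, "interest": 0.2, "joy": 0.1}
voice_probs = {"confusion": 0.4, "interest": 0.5, "joy": 0.1}
label, scores = fuse_emotions(face_probs, voice_probs)
print(label, {k: round(v, 2) for k, v in scores.items()})
```

With these sample inputs the facial signal dominates and the fused result is confusion, which would then feed the evaluation in Step 3.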
[0264] (Application Example 1)
[0265] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0266] In modern education, it is difficult to provide learning experiences optimized for the different levels of understanding and progress of individual learners. Furthermore, the lack of effective support at home necessitates constant supervision by parents and educators. In this context, there is a need for support technologies that enable learners to autonomously, effectively, and individually engage in learning.
[0267] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0268] In this invention, the server includes means for acquiring visual and auditory information of learners in real time, means for analyzing the data acquired by the above functions and identifying mental states related to the learner's level of recognition, and means for generating personalized instruction and practice problems based on the mental states identified by the above functions. This enables the machine to become a learning supporter at home, allowing for effective instruction and provision of materials tailored to the individual learning situation.
[0269] "Visual information" refers to video data used to capture the movements and facial expressions of a subject.
[0270] "Auditory information" refers to audio data used to capture the speech or sounds of a subject.
[0271] "Acquiring in real time" means collecting the target information immediately on the spot.
[0272] "Level of recognition" is an indicator that shows how well learners understand a particular subject or issue.
[0273] "Mental state" refers to the psychological and emotional state of a learner.
[0274] "Individualized instruction" means providing individualized educational guidance tailored to each learner's needs and level of understanding.
[0275] "Practice problems" are work assignments provided to learners to solve problems using the knowledge they have acquired.
[0276] "Machines acting as learning supporters in the home" refers to automated devices providing educational guidance in the home.
[0277] "Providing materials" means supplying learners with the information and learning materials they need for their studies.
[0278] The system implementing this invention includes a comprehensive educational support device for providing learning support within the home. The terminal is installed in the home with the learner and uses a built-in camera and microphone to acquire the learner's visual and auditory information in real time. This makes it possible to continuously observe the learner's emotional state and level of comprehension during learning.
[0279] The server receives the data sent from the terminal and analyzes it. The analysis algorithms are implemented in a programming language such as Python. The server uses an image analysis library such as OpenCV and the face detection and recognition functions of a library such as dlib to identify the learner's mental state. It also refers to the past learning history and deepens the analysis by combining that history with the current mental state.
[0280] A generative AI model operates on the server and generates personalized instructional content and practice problems tailored to the learner's state. This process is implemented using, for example, an API provided by OpenAI, a well-known provider of generative AI models, so that the learner's changing needs can be addressed in real time. The generative AI model and the prompt sentences given to it are therefore central to this process.
[0281] The terminal serves to present the generated content to the learner, providing voice guidance through a speaker and visual presentation via a display. For example, if the learner is stuck on a particular math problem, the system will display several similar problems and offer guidance such as "Try to recall your previous answer."
[0282] As a specific example, an example of a prompt sentence is given: "The learner is struggling with multiplication problems. Please create advice and supplementary problems to re-present problems of that difficulty level from past learning data." In this way, the learner can proceed with learning under autonomous and individualized support within the home.
[0283] The flow of the specific process in Application Example 1 will be described using FIG. 12.
[0284] Step 1:
[0285] The terminal acquires the visual and auditory information of the learner in real time using a camera and a microphone. The inputs are the face image data and voice data of the learner, which are transmitted to the server via wireless communication.
[0286] Step 2:
[0287] The server analyzes the received visual and auditory information. The inputs are the face image data and voice data transmitted from the terminal. OpenCV and dlib libraries are used to analyze the facial expressions and identify the mental state of the learner. Through this operation, the emotional state of the learner is obtained as the output.
[0288] Step 3:
[0289] The server combines the identified emotional state and the past learning history data of the learner for analysis. The inputs are the mental state and past learning data. Based on this, the points where learning gets stuck are identified, and the learning plan is adjusted according to the real-time demand. As the output, individualized educational needs are obtained.
[0290] Step 4:
[0291] The server uses a generative AI model to generate personalized instructional content and practice problems optimized for the learner. The input consists of data stored on the server and a prompt sentence for the generative AI model; an example of such a prompt sentence is given in paragraph [0282]. New learning content is obtained as the output.
[0292] Step 5:
[0293] The terminal presents the generated instructional content and practice problems to the learner via audio and display. The input is the generated learning content, and the terminal performs specific actions such as audio guidance via the speaker and visual presentation on the display. Based on this presentation, the learner can proceed with their learning autonomously.
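Steps 3 and 4 above can be sketched in Python, the language the description names for the server-side analysis. The following is only a minimal illustration under stated assumptions, not the implementation of this disclosure: the `Attempt` record, the `stuck_topics` and `build_prompt` helpers, and both thresholds are hypothetical, and the mental state is assumed to arrive as a plain text label from the Step 2 analysis.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One past answer in the learning history (hypothetical record layout)."""
    topic: str
    correct: bool


def stuck_topics(history, min_attempts=3, max_accuracy=0.5):
    """Return topics attempted often enough but answered with low accuracy.

    Both thresholds are illustrative assumptions, not values from the disclosure.
    """
    totals = {}
    for attempt in history:
        seen, right = totals.get(attempt.topic, (0, 0))
        totals[attempt.topic] = (seen + 1, right + int(attempt.correct))
    return sorted(
        topic
        for topic, (seen, right) in totals.items()
        if seen >= min_attempts and right / seen <= max_accuracy
    )


def build_prompt(mental_state, topics):
    """Compose a prompt sentence for the generative AI model from the Step 3 outputs."""
    if not topics:
        return (f"The learner appears {mental_state}. "
                "Please create brief encouragement and one review problem.")
    joined = ", ".join(topics)
    return (f"The learner appears {mental_state} and is struggling with {joined} problems. "
            "Please create advice and supplementary problems at that difficulty level "
            "based on past learning data.")


# Usage: one correct answer out of three multiplication attempts marks the topic as stuck.
history = [
    Attempt("multiplication", False),
    Attempt("multiplication", False),
    Attempt("multiplication", True),
    Attempt("addition", True),
    Attempt("addition", True),
    Attempt("addition", True),
]
topics = stuck_topics(history)
prompt = build_prompt("confused", topics)
```

Keeping the stuck-point detection separate from the prompt wording means the thresholds can be tuned without touching the text sent to the model.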
[0294] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0295] This invention provides a learning support system that integrates an emotion engine that accurately identifies the emotional state of learners, thereby offering a learning experience tailored to individual needs. The system consists of a terminal, a server, and the emotion engine, which work together to provide optimal support to learners.
[0296] The terminal is a device used by learners and is equipped with a camera and microphone. This terminal captures the learner's facial expressions and voice in real time and transmits them to the server as digital data.
[0297] The server receives data sent from the terminal and uses an emotion engine to analyze the user's emotional state. The emotion engine incorporates algorithms to recognize different emotional states, which are then used to evaluate the learner's level of understanding and concentration.
[0298] Furthermore, the server references the learner's accumulated past learning history to understand their progress and generates customized explanations and supplementary questions that combine this with their analyzed emotional state. This generated content is best suited to the learner's current situation.
[0299] The terminal presents the content received from the server to the learner. This presentation uses both audio guidance and screen displays to convey the information needed to resolve the difficulties the learner faces.
[0300] As a concrete example, consider a middle school student taking an online history lesson. When the user begins to show signs of confusion due to difficult dates or events, the device detects this change and sends data to the server. The server uses an emotion engine to analyze the degree of confusion and generates visual support and easy-to-remember related questions based on past learning history to aid understanding. As a result, the user gains clues to understand complex dates in chronological order and can re-engage in learning with renewed motivation.
[0301] The following describes the processing flow.
[0302] Step 1:
[0303] The terminal activates its camera and microphone as soon as the learner starts a lesson and captures facial and audio data in real time. This information is then converted into a format that can be analyzed by the emotion engine.
[0304] Step 2:
[0305] The terminal sends the captured data to the server. The communication protocol used enables secure and efficient data transfer and is performed in the background without interrupting the learning environment.
[0306] Step 3:
[0307] The server inputs the received data into the emotion engine. The emotion engine uses a machine learning model to analyze the learner's facial expressions and voice characteristics and accurately determine the emotional state.
[0308] Step 4:
[0309] The server compares the analysis results with the learner's past learning history to evaluate the degree of understanding and concentration. This identifies the learner's current stumbling points and areas of insufficient understanding.
[0310] Step 5:
[0311] The server uses generation means to create explanations and supplementary questions tailored to the learner. This includes feedback that raises motivation according to the emotional state and content that attracts interest.
[0312] Step 6:
[0313] The terminal provides the content received from the server to the learner both audibly and visually. The audio guide explains in an intonation that is easy for the learner to understand, and relevant information and questions are displayed on a display such as a tablet.
[0314] Step 7:
[0315] The user deepens their understanding by working on the presented questions. The answer results are sent to the server in real-time through the terminal and are programmed to be reflected in the next learning step.
[0316] Step 8:
[0317] The server stores user response data and reactions to help plan future learning. This data is then used for system-wide re-evaluation and improvement, providing a more optimized experience for learners.
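The way answer results might be reflected in the next learning step (Steps 7 and 8) can be illustrated as follows. This is a hedged sketch only: the five difficulty levels, the accuracy thresholds, the five-answer window, the emotion labels, and the `LearnerRecord` class are all assumptions introduced for illustration and are not specified in this disclosure.

```python
def next_difficulty(current, recent_correct, emotion, lowest=1, highest=5):
    """Pick the difficulty level for the next learning step.

    recent_correct is a list of booleans for the latest answers.
    Levels, thresholds, and emotion labels are illustrative assumptions.
    """
    if not recent_correct:
        return current  # nothing to learn from yet
    accuracy = sum(recent_correct) / len(recent_correct)
    if emotion == "confusion" or accuracy <= 0.4:
        step = -1  # ease off when the learner struggles
    elif accuracy >= 0.8:
        step = 1   # raise the level when answers are mostly right
    else:
        step = 0
    return max(lowest, min(highest, current + step))


class LearnerRecord:
    """In-memory stand-in for the server-side store of answers and reactions."""

    def __init__(self, difficulty=3):
        self.difficulty = difficulty
        self.answers = []

    def record(self, correct, emotion):
        """Store one answer and update the difficulty from the last five answers."""
        self.answers.append(correct)
        self.difficulty = next_difficulty(self.difficulty, self.answers[-5:], emotion)
        return self.difficulty


# Usage: two confident correct answers raise the level, a confused miss lowers it.
record = LearnerRecord()
levels = [
    record.record(correct, emotion)
    for correct, emotion in [(True, "joy"), (True, "concentration"), (False, "confusion")]
]
```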
[0318] (Example 2)
[0319] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0320] Conventional learning support systems lack the ability to provide nuanced support based on learners' emotions and learning history, which hinders their ability to maximize learning efficiency. Furthermore, the lack of real-time content delivery that adapts to individual learners' emotional changes results in insufficient efforts to maintain learners' motivation and improve their comprehension.
[0321] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0322] In this invention, the server includes information acquisition means for acquiring learner facial expression information and voice information in real time, emotion analysis means for analyzing the acquired information and identifying the learner's emotional state, and content generation means for generating personalized explanations and questions based on the identified emotional state and the learner's past learning history. This makes it possible to provide optimal educational content that is tailored to the learner's emotions and learning history.
[0323] "Information acquisition means" refers to a device or system that has the function of collecting learner's facial expression information and voice information in real time.
[0324] "Emotional analysis means" refers to a device or system that processes acquired facial expression information and voice information of a learner and has the function of identifying their emotional state.
[0325] "Content generation means" refers to a device or system that has the function of creating personalized explanations and questions based on the identified learner's emotional state and past learning history.
[0326] "Content presentation means" refers to a device or system that has the function of presenting generated explanations and questions to learners.
[0327] An "educational support system" is a system that includes multiple means designed to improve learners' learning efficiency.
[0328] A "generative AI model" is a model that uses machine learning algorithms to dynamically generate personalized explanations and problems in real time.
[0329] This invention aims to provide, in a learning support system, an educational experience that meets the diverse needs of learners. The invention is implemented as a system composed of a terminal, a server, and an emotion engine.
[0330] The terminal is a device used directly by learners and is equipped with a camera and microphone. This allows it to capture real-time facial expression information and audio from the learners and process it as digital data. This processing utilizes a dedicated application within the device, which adds capabilities for facial recognition and audio recording.
[0331] The server receives the digital data sent from the terminal and uses an emotion engine to analyze the learner's emotional state. The emotion engine uses machine learning algorithms to identify a variety of emotions. The analysis results are then input into a generative AI model, which generates customized content based on prompts. Specifically, by combining the analysis results with the learner's past learning history, the model creates explanations and supplementary problems optimized for the learner's situation.
[0332] For example, if a middle school student becomes confused by a difficult date during an online history lesson, the device quickly captures their facial expression and sends it to the server. The server analyzes this confusion using an emotion engine and instructs a generative AI model with a prompt message such as "Generate a concise chart to help understand the dates chronologically." The resulting content is then presented to the user via the device with audio guidance. This allows learners to regain interest in learning and engage with the material more actively.
[0333] This system allows learners to gain a deeper understanding and more active learning at their own pace, and therefore its educational effectiveness is considered to be very high.
[0334] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0335] Step 1:
[0336] The terminal acquires the learner's facial expression and voice information in real time using a camera and microphone. This input is processed by a dedicated application within the terminal and converted into digital data. This processing puts the data into a format suitable for transmission to the server and for analysis.
[0337] Step 2:
[0338] The terminal sends the processed digital data to the server. The digitized facial expression and voice information, as input, are sent to the server based on a secure communication protocol. Once the server has received the output, it can proceed to the next analysis stage.
[0339] Step 3:
[0340] The server receives data transmitted from the terminal and identifies the learner's emotional state using emotion analysis tools. The input for this step is the facial and voice data received in the previous step. The emotion engine analyzes this data and outputs the identified emotion (e.g., joy, confusion, concentration). This information is used to adapt the learner's learning experience.
[0341] Step 4:
[0342] The server integrates the results of the emotion analysis with the past learning history and uses a generative AI model to generate customized learning content. The input for this step is the emotional state and the learning history data. The generative AI model uses prompt sentences to generate appropriate explanations and supplementary questions, which are then stored on the server as the output.
[0343] Step 5:
[0344] The server sends the generated content to the terminal. The input is the generated learning content, and the output is the completion of the transmission to the terminal. This prepares the learner to receive content optimized for them.
[0345] Step 6:
[0346] The terminal presents content received from the server to the learner. Specifically, content is presented using screen displays and audio guides. The content data, as input, is converted into visual and auditory information, facilitating the learner's understanding as output. This presentation allows learners to actively engage in learning, improving the educational experience.
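The hand-off from Step 3 (an identified emotion) to Step 4 (a prompt sentence) can be illustrated with a small lookup. The sketch below assumes that the emotion engine returns one score per emotion label. That score format, the 0.5 confidence cut-off, the "unknown" fallback, and every prompt sentence other than the chart prompt quoted earlier in Example 2 are hypothetical.

```python
# The "confusion" prompt is the one quoted in Example 2; the others are assumed.
PROMPT_BY_EMOTION = {
    "confusion": "Generate a concise chart to help understand the dates chronologically.",
    "joy": "Generate a slightly harder follow-up question on the same topic.",
    "concentration": "Generate the next question at the same difficulty level.",
}
DEFAULT_PROMPT = "Generate a short review question on the current topic."


def identify_emotion(scores, min_score=0.5):
    """Return the highest-scoring emotion label, or "unknown" if none is confident."""
    if not scores:
        return "unknown"
    label = max(scores, key=scores.get)
    return label if scores[label] >= min_score else "unknown"


def prompt_for(scores, history_summary):
    """Combine the identified emotion with a learning-history summary into a prompt."""
    emotion = identify_emotion(scores)
    instruction = PROMPT_BY_EMOTION.get(emotion, DEFAULT_PROMPT)
    return emotion, f"Learning history: {history_summary}\nInstruction: {instruction}"


# Usage: a clearly confused learner receives the chronological-chart instruction.
emotion, prompt = prompt_for(
    {"joy": 0.1, "confusion": 0.7, "concentration": 0.2},
    "missed 3 of 4 questions on Edo-period dates",
)
```

Falling back to a neutral review prompt when no label is confident avoids adapting the lesson on the basis of a weak signal.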
[0347] (Application Example 2)
[0348] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0349] In today's educational environment, appropriate instruction and support tailored to the individual needs of learners are required. Especially during home learning, it is crucial not only to maintain concentration but also to provide learning content that is appropriate and tailored to each student's level of understanding in real time. However, conventional systems have struggled to accurately grasp learners' emotional states and levels of understanding and to implement optimized instruction based on that information. Improvements are needed to address this challenge.
[0350] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0351] In this invention, the server includes means for acquiring learner perceptual data in real time, analysis means for analyzing the data and identifying emotional states related to the learner's level of understanding, generation means for generating individualized instruction and support tasks based on the identified emotional states, and a social machine device for supporting children's learning at home. This enables effective learning support tailored to the learner's level of understanding and concentration.
[0352] "Perceptual data" refers to information used to evaluate a learner's emotional state and level of comprehension, including their facial expressions and voice.
[0353] "Analysis means" refers to a device or system that analyzes perceptual data and performs processing to identify the learner's emotional state and level of understanding.
[0354] A "generation means" is a device or system that has the function of creating content and assignments necessary to provide optimal guidance and support to learners based on the analysis results.
[0355] "Presentation means" refers to a device or system that visually or aurally communicates instructional content or assignments created by the generation means to learners.
[0356] A "social machine device" is a robot or similar device installed to support learning within the home, providing appropriate education through direct interaction with learners.
[0357] This embodiment of the invention operates as a system that provides appropriate educational support based on learner perceptual data. The server receives and analyzes real-time learner facial expression data and voice data transmitted from the terminal.
[0358] The system includes hardware such as a Raspberry Pi, a camera module, and a microphone for processing perceptual data, and software such as an image processing library (OpenCV) and a machine learning library used for speech analysis (TensorFlow).
[0359] The server uses analytical tools to identify the learner's emotional state and uses a generative AI model to generate instructional content tailored to the learner's level of understanding. For example, if a learner faces a difficult task and shows signs of distress, the server identifies that emotion, generates appropriate supplementary tasks or visual support, and sends them to the terminal.
[0360] The terminal presents the generated learning content aurally and visually and provides learners with immediate feedback. This process is carried out by a social machine device that supports children's learning within the home.
[0361] As a concrete example of its use, if a third-grade elementary school student is practicing kanji in Japanese class at home and the system detects that the child is hesitating over a difficult reading, it might prompt the child with a question like, "How about taking a nap?" and provide visual assistance to keep the child engaged. Furthermore, the system's generative AI model can use prompts such as, "Imagine a robot that optimally supports your emotions. What kind of learning content would further motivate you to learn?"
[0362] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0363] Step 1:
[0364] The terminal uses a camera and microphone to acquire learner facial and voice data in real time. This data serves as input for analyzing factors that influence performance and attention. The terminal then prepares to send the data to the server.
[0365] Step 2:
[0366] The server receives facial expression data and audio data sent from the terminal. Based on the received data, it performs facial expression analysis using OpenCV and audio analysis using TensorFlow. Through this analysis, it identifies the learner's emotional state and outputs information related to their level of comprehension and concentration.
[0367] Step 3:
[0368] Based on the analyzed emotional state and level of understanding, the server uses a generative AI model to generate instructional content and supplementary tasks optimized for the learner. In this process, the prompt "Imagine a robot that optimally supports your emotions. What kind of learning content would further motivate you to learn?" is input to the generative AI model, and its output is used as the instructional content.
[0369] Step 4:
[0370] The server sends the generated instructional content and supplementary assignments to the terminal. The terminal receives this and presents it to the learner as audio guidance and visual content. At this stage, specific instruction is given to the user, and the system's feedback loop is formed as their reactions are observed.
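Step 2 analyzes facial expressions and audio separately, so the two results have to be combined before a single emotional state can be output. One common way to do this, shown below purely as an assumption, is a weighted average of per-emotion scores (late fusion). The disclosure does not state how the two analyses are combined, and the function names, labels, and weights here are hypothetical. The per-modality scores would in practice come from the image and audio models. Plain dictionaries stand in for them here.

```python
def fuse_scores(face, voice, face_weight=0.5):
    """Weighted average of per-emotion scores from the two modalities.

    Labels missing from one modality count as 0.0 for that modality.
    """
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be between 0 and 1")
    labels = set(face) | set(voice)
    return {
        label: face_weight * face.get(label, 0.0)
        + (1.0 - face_weight) * voice.get(label, 0.0)
        for label in labels
    }


def dominant_emotion(fused):
    """Return the label with the highest fused score."""
    return max(fused, key=fused.get)


# Usage: the face suggests distress, the voice suggests calm, and the face is trusted more.
face_scores = {"distress": 0.75, "calm": 0.25}
voice_scores = {"distress": 0.25, "calm": 0.75}
fused = fuse_scores(face_scores, voice_scores, face_weight=0.75)
state = dominant_emotion(fused)
```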
[0371] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0372] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0373] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.
[0374] [Third Embodiment]
[0375] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.
[0376] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.
[0377] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0378] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.
[0379] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0380] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0381] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0382] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0383] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0384] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0385] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0386] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".
[0387] This invention consists of multiple components to provide flexible learning support tailored to the individual needs of learners. The system can efficiently acquire facial expression and voice data, perform data analysis, generate customized content, and present the content.
[0388] The terminal operates continuously during the learner's learning session, capturing the learner's face with a camera and their voice with a microphone. The captured data is transmitted to the server via wireless or wired communication.
[0389] The server uses machine learning algorithms to analyze received facial expression and voice data to evaluate the learner's emotional state. The server retrieves the learner's learning history from a database and combines the current state with past data to determine areas of difficulty and the level of understanding.
[0390] The server uses generation methods to create personalized explanations and supplementary problems for learners. This generation is performed in real time using AI technology, with the aim of providing learners with the most effective content.
[0391] The terminal presents the personalized explanations and supplementary problems received from the server to learners. This is done through audio guidance via a speaker and visual presentation via a tablet or other display. This allows learners to immediately understand their difficulties and receive clear guidance for tackling the problems.
[0392] As a concrete example, consider a case where an elementary school student is solving a math problem. When the user gets stuck on a calculation problem, the device detects the change in their facial expression using a sensor and immediately sends the data to the server. The server analyzes the user's past learning data and current facial expression data to understand why the user is having difficulty with that problem. Based on the results, the generation system creates appropriate explanations and supplementary problems, which the device then presents to the user. Through this process, the user can deepen their understanding of the problem and continue to learn effectively.
[0393] The following describes the processing flow.
[0394] Step 1:
[0395] The terminal powers on, verifies the learner's login, and prepares to begin learning. Once the user logs in, the terminal's camera and microphone activate and start capturing the learner's facial expressions and voice in real time.
[0396] Step 2:
[0397] The terminal sends the acquired data to the server at regular intervals. The data includes image frames and audio clips that represent the learner's current state.
[0398] Step 3:
[0399] The server inputs the received data into a machine learning algorithm for analysis. The server then uses this to assess the learner's emotional state (e.g., confusion, interest) and determine if there are any stumbling blocks.
[0400] Step 4:
[0401] The server matches the learner's facial expression data with their past learning history and extracts content from the database that is relevant to their current difficulty.
[0402] Step 5:
[0403] The server generates personalized explanations and supplementary problems. This generation is performed using an AI engine to provide learners with optimal learning support.
[0404] Step 6:
[0405] The terminal presents the learner with explanations and supplementary problems received from the server. The terminal supports the learner by displaying the problems on its screen and providing explanations through audio guidance.
[0406] Step 7:
[0407] The learner answers the presented questions, and the answers are sent back to the server via the terminal. Once the server receives the answers, they are incorporated into the learner's future learning plan.
[0408] Step 8:
[0409] The server re-evaluates the learner's progress based on their responses and answer data, and prepares to adjust the content of the next learning session based on the results.
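Because the terminal sends data at regular intervals (Step 2), the server sees a stream of emotion labels rather than a single reading when it determines whether there is a stumbling block (Step 3). One possible way to avoid reacting to a single ambiguous frame, given here as an assumption rather than as the method of this disclosure, is to require that confusion appear in several of the most recent samples. The `StumbleDetector` class, the window of five samples, and the threshold of three are hypothetical.

```python
from collections import deque


class StumbleDetector:
    """Flags a stumbling block when confusion persists across recent samples."""

    def __init__(self, window=5, min_hits=3):
        self.min_hits = min_hits
        self.recent = deque(maxlen=window)  # keeps only the latest `window` labels

    def update(self, emotion):
        """Add the label for one transmitted sample and report whether the learner seems stuck."""
        self.recent.append(emotion)
        hits = sum(1 for label in self.recent if label == "confusion")
        return hits >= self.min_hits


# Usage: the flag turns on only after confusion shows up in 3 of the last 5 samples.
detector = StumbleDetector()
labels = ["interest", "confusion", "confusion", "interest",
          "confusion", "interest", "interest", "interest"]
flags = [detector.update(label) for label in labels]
```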
[0410] (Example 1)
[0411] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0412] Traditional education systems have struggled to grasp each learner's level of understanding and emotional state in real time and provide appropriate learning content immediately based on that information. Therefore, they are unable to provide flexible instruction tailored to each learner's progress, resulting in challenges in providing efficient learning support.
[0413] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0414] In this invention, the server includes means for instantly collecting the learner's expression information and acoustic information, means for evaluating the collected information to identify emotional states related to comprehension, and means for generating personalized explanations and supplementary problems. This makes it possible to provide learning content that is instantly optimized according to the learner's real-time emotional state and learning history.
[0415] A "learner" is someone who acquires new knowledge and skills in an educational activity.
[0416] "Expression information" refers to visually observable information such as a learner's facial expressions and attitude, and is used as data to judge their emotional state.
[0417] "Acoustic information" refers to auditory observable information, such as the voice and speech patterns emitted by learners, and is used as data to judge emotions and comprehension.
[0418] "Immediate" refers to data collection and processing being performed in real time without delay.
[0419] "Assessment" refers to the process of analyzing collected data to identify learners' levels of understanding and emotional states.
[0420] "Emotional state" refers to the learner's emotional state and reactions, which can be used to assess their motivation and level of concentration.
[0421] "Individualized explanations" refer to learning materials and instructional content that are tailored to the learner's characteristics and current level of understanding.
[0422] "Supplementary problems" refer to additional exercises or assignments provided to reinforce learners' understanding.
[0423] "Artificial intelligence technology" refers to technologies that use machine learning algorithms and data analysis methods to autonomously solve problems or perform tasks.
[0424] This invention is a system designed to provide dynamic learning support to learners, and it mainly consists of a server and terminals.
[0425] The terminal collects expression information and acoustic information using a camera and microphone while the learner is studying. For example, a standard webcam and microphone can be used. The collected data is transmitted to the server via Wi-Fi or wired communication.
[0426] The server uses Python to leverage machine learning libraries such as TensorFlow and PyTorch to evaluate the received data. This allows it to assess the learner's emotional state, compare it with their learning history retrieved from a database, and generate personalized explanations and supplementary problems.
[0427] The generation process utilizes a generative AI model, allowing learners to receive appropriate content based on prompt input. A concrete example of a prompt is, "This user lacks basic fraction calculation skills. Please suggest a step-by-step guide and practice exercises."
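As an illustration of how such a prompt might be assembled, the following minimal sketch picks the lowest-scoring skill from an evaluation result and fills it into the example prompt quoted above. The function names and the idea of a per-skill score between 0 and 1 are assumptions made for illustration only.

```python
def weakest_skill(skill_scores: dict[str, float]) -> str:
    """Return the skill with the lowest score (hypothetical 0..1 scale)."""
    if not skill_scores:
        raise ValueError("skill_scores must not be empty")
    return min(skill_scores, key=skill_scores.get)


def build_prompt(skill_scores: dict[str, float]) -> str:
    """Fill the weakest skill into the example prompt from the text."""
    skill = weakest_skill(skill_scores)
    return (f"This user lacks {skill} skills. "
            "Please suggest a step-by-step guide and practice exercises.")


if __name__ == "__main__":
    scores = {"basic fraction calculation": 0.35, "decimal addition": 0.80}
    print(build_prompt(scores))
```

The resulting string would then be passed to whichever generative AI model the server uses.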
[0428] The device presents the generated personalized content to the learner. A standard speaker is used for audio guidance, while a tablet device is used for visual explanations. This allows learners to study at their own pace and receive support to deepen their understanding.
[0429] As a concrete example, consider a case where an elementary school student is learning arithmetic. When the user encounters a problem and finds it difficult to understand, the device detects a change in facial expression and sends that data to a server. The server generates appropriate explanations and additional practice problems tailored to the situation and provides them to the user through the device. In this way, the user can receive immediate and effective instruction.
[0430] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0431] Step 1:
[0432] The device activates its camera and microphone during the learner's learning session to collect visual and auditory information. This information includes facial expressions and voice tone. The input is the learner's visual and auditory data, which is formatted as a digital signal. The output is data packets ready for transmission to the server. The device sends the collected data to the server immediately via Wi-Fi or a wired connection.
[0433] Step 2:
[0434] The server receives expression and acoustic information transmitted from the terminal. The input is data packets, which are converted into a format for analysis. The server uses machine learning libraries to evaluate the data and determine the learner's emotional state and level of concentration. The output is an analysis result showing the learner's current state.
[0435] Step 3:
[0436] The server retrieves the learner's past learning history from the database based on the analysis results. The input consists of the analyzed emotional state and past learning history. By combining this data, the server evaluates the learner's level of understanding and identifies areas where they are struggling. The output is the evaluation result based on their level of understanding.
[0437] Step 4:
[0438] The server uses a generative AI model to generate personalized explanations and supplementary problems. The input here is the evaluation result obtained in step 3, which is input to the generative AI model as a prompt. A prompt such as "This user lacks basic fraction calculation skills. Please suggest a step-by-step guide and practice problems." is used. The output is explanations and problems optimized for the learner.
[0439] Step 5:
[0440] The terminal presents learners with the personalized explanations and supplementary problems received from the server through audio guidance and display. The input is the generated content data from the server, which is converted into a learner-friendly format. The output is the learning support content presented to the learner visually and aurally. This allows learners to receive real-time guidance and continue their learning.
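Step 3 of this flow, in which the analysis result is combined with the learning history, could be sketched as follows. The sketch assumes that the Step 2 analysis yields confusion and concentration scores between 0 and 1 and that the learning history is available as a per-topic accuracy. The names `EmotionResult` and `struggling_topics` and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotionResult:
    """Hypothetical output of the Step 2 analysis (scores in 0.0..1.0)."""
    confusion: float
    concentration: float


def struggling_topics(emotion: EmotionResult,
                      current_topic: str,
                      history: dict[str, float],
                      confusion_threshold: float = 0.6,
                      accuracy_threshold: float = 0.5) -> list[str]:
    """Identify topics where the learner appears to be struggling.

    `history` maps a topic to the learner's past accuracy. A topic is
    flagged if its past accuracy is low, or if it is the current topic
    and the learner currently looks confused.
    """
    topics = [t for t, acc in history.items() if acc < accuracy_threshold]
    if emotion.confusion >= confusion_threshold and current_topic not in topics:
        topics.append(current_topic)
    return sorted(topics)
```

The returned list corresponds to the evaluation result that Step 4 turns into a prompt.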
[0441] (Application Example 1)
[0442] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0443] In modern education, it is difficult to provide learning experiences optimized for the different levels of understanding and progress of individual learners. Furthermore, the lack of effective support at home necessitates constant supervision by parents and educators. In this context, there is a need for support technologies that enable learners to autonomously, effectively, and individually engage in learning.
[0444] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0445] In this invention, the server includes means for acquiring visual and auditory information of learners in real time, means for analyzing the acquired data and identifying mental states related to the learner's awareness level, and means for generating personalized instruction and practice problems based on the identified mental states. This enables the machine to become a learning supporter at home, allowing for effective instruction and provision of materials tailored to the individual learning situation.
[0446] "Visual information" refers to video data used to capture the movements and facial expressions of a subject.
[0447] "Auditory information" refers to audio data used to capture the speech or sounds of a subject.
[0448] "Acquiring in real time" means collecting the target information immediately on the spot.
[0449] "Awareness level" is an indicator that shows how well learners understand a particular subject or issue.
[0450] "Mental state" refers to the psychological and emotional state of a learner.
[0451] "Individualized instruction" means providing individualized educational guidance tailored to each learner's needs and level of understanding.
[0452] "Practice problems" are work assignments provided to learners to solve problems using the knowledge they have acquired.
[0453] "Machines acting as learning supporters in the home" refers to automated devices providing educational guidance in the home.
[0454] "Providing materials" means supplying learners with the information and learning materials they need for their studies.
[0455] The system implementing this invention includes a comprehensive educational support device for providing learning support within the home. The terminal is installed in the home with the learner and uses a built-in camera and microphone to acquire the learner's visual and auditory information in real time. This makes it possible to continuously observe the learner's emotional state and level of comprehension during learning.
[0456] The server receives data sent from the terminal and performs analysis. The server uses analysis algorithms implemented using programming languages such as Python. It utilizes visual information analysis libraries such as OpenCV and face recognition functions such as dlib to identify the learner's mental state. Furthermore, it refers to past learning history and deepens the analysis by combining it with the current mental state.
[0457] A generative AI model operates on the server, generating personalized instructional content and practice problems tailored to the learner's state. This process is implemented using the API provided by OpenAI, a well-known provider of generative AI models, so that the learner's changing needs can be addressed in real time. The generative AI model and the prompt given to it are central to this step.
[0458] The device serves to present the generated content to the learner, providing audio guidance through the speaker and visual presentations via the display. For example, if a learner gets stuck on a particular math problem, the system will display several similar problems and provide guidance such as, "Try to remember the answer from last time."
[0459] As a concrete example, here is an example of a prompt: "The learner is struggling with multiplication problems. Based on past learning data, create advice and supplementary problems to re-present problems of a similar difficulty level." This allows learners to progress autonomously and with individualized support at home.
[0460] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0461] Step 1:
[0462] The device acquires the learner's visual and auditory information in real time using a camera and microphone. The input consists of the learner's facial image data and audio data, which are transmitted to the server via wireless communication.
[0463] Step 2:
[0464] The server analyzes the received visual and auditory information. The input consists of facial image data and audio data sent from the terminal. Using OpenCV and the dlib library, it analyzes facial expressions and identifies the learner's mental state. This calculation yields the learner's emotional state as output.
[0465] Step 3:
[0466] The server combines identified emotional states with the learner's past learning history data and performs analysis. The input consists of mental states and past learning data; based on this, it identifies learning stumbling blocks and adjusts the learning plan in real time to meet demand. The output is personalized educational needs.
[0467] Step 4:
[0468] The server uses a generative AI model to generate personalized instructional content and practice problems optimized for the learner. The input consists of data stored on the server and a prompt for the generative AI model, such as the example prompt given in paragraph [0459]. The output is new learning content.
[0469] Step 5:
[0470] The device presents the generated instructional content and practice problems to the learner via audio and display. The input is the generated learning content, and the device performs specific actions such as audio guidance via the speaker and visual presentation on the display. Based on this presentation, the learner can proceed with their learning autonomously.
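The behavior described in this application example, where a learner stuck on a problem is shown several similar problems, could be approximated by the following minimal sketch. It assumes a problem bank in which each problem carries a topic and an integer difficulty. The `Problem` class, the `similar_problems` function, and the tolerance of one difficulty level are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Problem:
    """Hypothetical entry in a problem bank (difficulty 1 = easiest)."""
    text: str
    topic: str
    difficulty: int


def similar_problems(bank: list[Problem], stuck_on: Problem,
                     count: int = 3, tolerance: int = 1) -> list[Problem]:
    """Pick up to `count` problems on the same topic and of similar difficulty.

    Problems closest in difficulty come first; ties are broken by text so
    that the result is deterministic.
    """
    candidates = [
        p for p in bank
        if p.topic == stuck_on.topic
        and p != stuck_on
        and abs(p.difficulty - stuck_on.difficulty) <= tolerance
    ]
    candidates.sort(key=lambda p: (abs(p.difficulty - stuck_on.difficulty), p.text))
    return candidates[:count]
```

A generative AI model could produce such problems directly; selecting from a fixed bank is shown here only because it is easy to verify.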
[0471] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0472] This invention provides a learning support system that integrates an emotion engine that accurately identifies the emotional state of learners, thereby offering a learning experience tailored to individual needs. The system consists of a terminal, a server, and the emotion engine, which work together to provide optimal support to learners.
[0473] The terminal is a device used by learners and is equipped with a camera and microphone. This terminal captures the learner's facial expressions and voice in real time and transmits them to the server as digital data.
[0474] The server receives data sent from the terminal and uses an emotion engine to analyze the user's emotional state. The emotion engine incorporates algorithms to recognize different emotional states, which are then used to evaluate the learner's level of understanding and concentration.
[0475] Furthermore, the server references the learner's accumulated past learning history to understand their progress and generates customized explanations and supplementary questions that combine this with their analyzed emotional state. This generated content is best suited to the learner's current situation.
[0476] The device presents content received from the server to the learner. This presentation utilizes both audio guidance and screen displays to convey the information necessary to resolve the difficulties the learner faces.
[0477] As a concrete example, consider a middle school student taking an online history lesson. When the user begins to show signs of confusion due to difficult dates or events, the device detects this change and sends data to the server. The server uses an emotion engine to analyze the degree of confusion and generates visual support and easy-to-remember related questions based on past learning history to aid understanding. As a result, the user gains clues to understand complex dates in chronological order and can re-engage in learning with renewed motivation.
[0478] The following describes the processing flow.
[0479] Step 1:
[0480] The device activates its camera and microphone as soon as the learner starts a lesson, capturing facial and audio data in real time. This information is then converted into a format that can be analyzed by the emotion engine.
[0481] Step 2:
[0482] The device sends the captured data to the server. A communication protocol that enables secure and efficient data transfer is used, and the process takes place in the background without interrupting the learning environment.
[0483] Step 3:
[0484] The server inputs the received data into the emotion engine. The emotion engine uses a machine learning model to analyze the learner's facial expressions and voice characteristics to determine their emotional state with high accuracy.
[0485] Step 4:
[0486] The server compares the analysis results with the learner's past learning history to evaluate their level of understanding and concentration. This identifies the learning difficulties and areas where the learner is currently struggling.
[0487] Step 5:
[0488] The server uses generation tools to create personalized explanations and supplementary questions for learners. This includes motivational feedback tailored to their emotional state and engaging content.
[0489] Step 6:
[0490] The device provides learners with content received from the server, both audibly and visually. The audio guidance explains the content with easy-to-understand intonation, and relevant information and questions are displayed on a tablet or other display.
[0491] Step 7:
[0492] Users deepen their understanding by working through the presented problems. Their answers are sent to the server in real time via the device and are reflected in the next learning step.
[0493] Step 8:
[0494] The server stores user response data and reactions to help plan future learning. This data is then used for system-wide re-evaluation and improvement, providing a more optimized experience for learners.
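Step 3 of this flow has the emotion engine analyze both facial expressions and voice characteristics. One common way to combine two such sources is a weighted average of per-emotion scores (late fusion), sketched below. The disclosure does not state how the emotion engine combines the two modalities, so the label set, the function names, and the default weight of 0.6 are all assumptions.

```python
# Hypothetical label set; the emotion engine's actual labels are not specified.
EMOTIONS = ("joy", "confusion", "concentration")


def fuse_emotions(face: dict[str, float], voice: dict[str, float],
                  face_weight: float = 0.6) -> dict[str, float]:
    """Weighted average of facial and vocal emotion scores, renormalized.

    Missing labels are treated as a score of 0.0.
    """
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be between 0 and 1")
    fused = {
        e: face_weight * face.get(e, 0.0) + (1.0 - face_weight) * voice.get(e, 0.0)
        for e in EMOTIONS
    }
    total = sum(fused.values())
    if total == 0.0:
        raise ValueError("no emotion scores supplied")
    return {e: score / total for e, score in fused.items()}


def dominant_emotion(scores: dict[str, float]) -> str:
    """Return the label with the highest fused score."""
    return max(scores, key=scores.get)
```

The dominant label, or the full score distribution, would then feed the evaluation in Step 4.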
[0495] (Example 2)
[0496] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0497] Conventional learning support systems lack the ability to provide nuanced support based on learners' emotions and learning history, which hinders their ability to maximize learning efficiency. Furthermore, the lack of real-time content delivery that adapts to individual learners' emotional changes results in insufficient efforts to maintain learners' motivation and improve their comprehension.
[0498] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0499] In this invention, the server includes information acquisition means for acquiring learner facial expression information and voice information in real time, emotion analysis means for analyzing the acquired information and identifying the learner's emotional state, and content generation means for generating personalized explanations and questions based on the identified emotional state and the learner's past learning history. This makes it possible to provide optimal educational content that is tailored to the learner's emotions and learning history.
[0500] "Information acquisition means" refers to a device or system that has the function of collecting learner's facial expression information and voice information in real time.
[0501] "Emotional analysis means" refers to a device or system that processes acquired facial expression information and voice information of a learner and has the function of identifying their emotional state.
[0502] "Content generation means" refers to a device or system that has the function of creating personalized explanations and questions based on the identified learner's emotional state and past learning history.
[0503] "Content presentation means" refers to a device or system that has the function of presenting generated explanations and questions to learners.
[0504] An "educational support system" is a system that includes multiple means designed to improve learners' learning efficiency.
[0505] A "generative AI model" is a model that uses machine learning algorithms to dynamically generate personalized explanations and problems in real time.
[0506] This invention aims to provide, in a learning support system, an educational experience that meets the diverse needs of learners. The invention is implemented as a system composed of a terminal, a server, and an emotion engine.
[0507] The terminal is a device used directly by learners and is equipped with a camera and microphone. This allows it to capture real-time facial expression information and audio from the learners and process it as digital data. This processing utilizes a dedicated application within the device, which adds capabilities for facial recognition and audio recording.
[0508] The server receives digital data sent from the terminal and uses an emotion engine to analyze the learner's emotional state. The emotion engine uses machine learning algorithms to identify a variety of emotions. The analysis results are input into a generative AI model, which generates customized content based on prompts. Specifically, by integrating the analysis results with the learner's past learning history, it creates explanations and supplementary problems optimized for the learner's learning situation.
[0509] For example, if a middle school student becomes confused by a difficult date during an online history lesson, the device quickly captures their facial expression and sends it to the server. The server analyzes this confusion using an emotion engine and instructs a generative AI model with a prompt message such as "Generate a concise chart to help understand the dates chronologically." The resulting content is then presented to the user via the device with audio guidance. This allows learners to regain interest in learning and engage with the material more actively.
[0510] This system allows learners to gain a deeper understanding and more active learning at their own pace, and therefore its educational effectiveness is considered to be very high.
[0511] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0512] Step 1:
[0513] The device acquires the learner's facial expression and voice information in real time using a camera and microphone. This input is processed by a dedicated application within the device and converted into digital data format. This processing prepares the data for transmission to the server in an appropriate format for analysis.
[0514] Step 2:
[0515] The terminal sends the processed digital data to the server. The input is the digitized facial expression and voice information, which is sent over a secure communication protocol. The output is the data received by the server, which can then proceed to the analysis stage.
[0516] Step 3:
[0517] The server receives data transmitted from the terminal and identifies the learner's emotional state using emotion analysis tools. The input for this step is the facial and voice data received in the previous step. The emotion engine analyzes this data and outputs the identified emotion (e.g., joy, confusion, concentration). This information is used to adapt the learner's learning experience.
[0518] Step 4:
[0519] The server integrates the results of the emotion analysis with past learning history and uses a generative AI model to generate customized learning content. The input for this step is the emotional state and learning history data. The generative AI model uses prompts to generate appropriate explanations and supplementary questions, which are stored on the server as output.
[0520] Step 5:
[0521] The server sends the generated content to the terminal. The input is the generated learning content, and the output is the completion of the transmission to the terminal. This prepares the learner to receive content optimized for them.
[0522] Step 6:
[0523] The terminal presents content received from the server to the learner. Specifically, content is presented using screen displays and audio guides. The content data, as input, is converted into visual and auditory information, facilitating the learner's understanding as output. This presentation allows learners to actively engage in learning, improving the educational experience.
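Steps 3 and 4 of this flow, from an identified emotion to a prompt for the generative AI model, might look like the following minimal sketch. Only the prompt for "confusion" is taken from the example above. The other prompt texts, the function `generate_content`, and the convention of passing the model as a plain callable are hypothetical.

```python
from typing import Callable

# Emotion label -> instruction. Only the "confusion" entry comes from the
# text; the others are illustrative placeholders.
PROMPT_BY_EMOTION = {
    "confusion": "Generate a concise chart to help understand the dates chronologically.",
    "joy": "Generate a slightly more challenging follow-up question.",
    "concentration": "Generate the next explanation at the current pace.",
}
DEFAULT_PROMPT = "Generate a short review of the current topic."


def generate_content(emotion: str, history: list[str],
                     model: Callable[[str], str]) -> str:
    """Build a prompt from the emotion and learning history, then call the model.

    `model` stands in for any generative AI client that maps a prompt
    string to generated text.
    """
    prompt = PROMPT_BY_EMOTION.get(emotion, DEFAULT_PROMPT)
    if history:
        prompt += " Past learning history: " + "; ".join(history) + "."
    return model(prompt)


if __name__ == "__main__":
    # An echo function is used in place of a real model for demonstration.
    print(generate_content("confusion", ["struggled with Edo period dates"], lambda p: p))
```

Passing the model as a callable keeps the sketch independent of any particular generative AI service.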
[0524] (Application Example 2)
[0525] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0526] In today's educational environment, appropriate instruction and support tailored to the individual needs of learners are required. Especially during home learning, it is crucial not only to maintain concentration but also to provide learning content that is appropriate and tailored to each student's level of understanding in real time. However, conventional systems have struggled to accurately grasp learners' emotional states and levels of understanding and to implement optimized instruction based on that information. Improvements are needed to address this challenge.
[0527] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0528] In this invention, the server includes means for acquiring learner perceptual data in real time, means for analyzing the information and identifying emotional states related to the subject's level of understanding, means for generating individualized instruction and support tasks based on the identified emotional states, and a social machine device for supporting children's learning at home. This enables effective learning support tailored to the learner's level of understanding and concentration.
[0529] "Perceptual data" refers to information used to evaluate a learner's emotional state and level of comprehension, including their facial expressions and voice.
[0530] "Analysis means" refers to a device or system that analyzes perceptual data and performs processing to identify the learner's emotional state and level of understanding.
[0531] A "generation means" is a device or system that has the function of creating content and assignments necessary to provide optimal guidance and support to learners based on the analysis results.
[0532] "Presentation means" refers to a device or system that visually or aurally communicates instructional content or assignments created by the generation means to learners.
[0533] A "social machine device" is a robot or other device installed to support learning within the home, providing appropriate education through direct interaction with learners.
[0534] This embodiment of the invention operates as a system that provides appropriate educational support based on learner perceptual data. The server receives and analyzes real-time learner facial expression data and voice data transmitted from the terminal.
[0535] The system includes hardware for processing perceptual data, such as a Raspberry Pi, a camera module, and a microphone, and software such as an image processing library (OpenCV) and a machine learning library used for speech analysis (TensorFlow).
[0536] The server uses analytical tools to identify the learner's emotional state and generates instructional content tailored to their level of understanding. The generated instructional content utilizes a generative AI model. For example, if a learner faces a difficult task and shows signs of distress, the server identifies that emotion and generates appropriate supplementary tasks or visual support, sending them to the device.
[0537] The terminal presents the generated learning content aurally and visually and provides learners with immediate feedback. This process is facilitated by a social machine device that supports children's learning within the home.
[0538] As a concrete example of its use, if a third-grade elementary school student is practicing kanji in Japanese class at home and the system detects that the child is hesitating over a difficult reading, it might prompt the child with a question like, "How about taking a nap?" and provide visual assistance to keep the child engaged. Furthermore, the system's generative AI model can use prompts such as, "Imagine a robot that optimally supports your emotions. What kind of learning content would further motivate you to learn?"
[0539] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0540] Step 1:
[0541] The device uses a camera and microphone to acquire learner facial and voice data in real time. This data serves as input for analyzing factors that influence performance and attention. The device then prepares to send the data to a server.
[0542] Step 2:
[0543] The server receives facial expression data and audio data sent from the terminal. Based on the received data, it performs facial expression analysis using OpenCV and audio analysis using TensorFlow. Through this analysis, it identifies the learner's emotional state and outputs information related to their level of comprehension and concentration.
[0544] Step 3:
[0545] Based on the analyzed emotional state and level of understanding, the server uses a generative AI model to generate instructional content and supplementary tasks optimized for the learner. In this process, the prompt "Imagine a robot that optimally supports your emotions. What kind of learning content would further motivate you to learn?" is input into the generative AI model, and its output is used as instructional content.
[0546] Step 4:
[0547] The server sends the generated instructional content and supplementary assignments to the terminal. The terminal receives this and presents it to the learner as audio guidance and visual content. At this stage, specific instruction is given to the user, and the system's feedback loop is formed as their reactions are observed.
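Because the learner's reactions are observed continuously in this feedback loop, an implementation would probably want to avoid reacting to a single momentary expression. The sketch below triggers support only when the average distress score over a sliding window stays above a threshold. This smoothing is an assumption added for illustration and is not described in the disclosure; `DistressMonitor`, the window size, and the threshold are hypothetical.

```python
from collections import deque


class DistressMonitor:
    """Signals sustained distress over a sliding window of scores (0.0..1.0)."""

    def __init__(self, window: int = 5, threshold: float = 0.6) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self._scores: deque[float] = deque(maxlen=window)
        self._threshold = threshold

    def update(self, distress_score: float) -> bool:
        """Add one observation; return True if support should be generated.

        Nothing is signalled until the window is full, so a single
        high reading cannot trigger support on its own.
        """
        self._scores.append(distress_score)
        if len(self._scores) < self._scores.maxlen:
            return False
        average = sum(self._scores) / len(self._scores)
        return average >= self._threshold
```

When `update` returns True, the server would proceed to generate supplementary tasks as in Step 3.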
[0548] The specific processing unit 290 transmits the result of the specific processing to the headset-type terminal 314. In the headset-type terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0549] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. A prompt containing instructions and inference data, such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
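The input and output contract of the data generation model 58 described above can be summarized as an interface. The following minimal sketch assumes a single `infer` method that takes a prompt plus optional text, audio, and image data and returns text. The names `InferenceInput`, `DataGenerationModel`, and `EchoModel` are hypothetical, and `EchoModel` is only a stand-in that performs no real inference.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class InferenceInput:
    """A prompt with instructions plus optional inference data."""
    prompt: str
    text: Optional[str] = None
    audio: Optional[bytes] = None
    image: Optional[bytes] = None


class DataGenerationModel(Protocol):
    """Anything that can run inference according to a prompt."""

    def infer(self, request: InferenceInput) -> str:
        ...


class EchoModel:
    """Stand-in model: reports the prompt and which data kinds were supplied."""

    def infer(self, request: InferenceInput) -> str:
        supplied = [name for name in ("text", "audio", "image")
                    if getattr(request, name) is not None]
        return f"[{request.prompt}] inputs={','.join(supplied) or 'none'}"
```

A real client for a hosted generative AI service could implement the same `infer` method, which lets the rest of the server code stay unchanged.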
[0550] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the headset-type terminal 314.
[0551] [Fourth Embodiment]
[0552] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.
[0553] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.
[0554] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0555] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.
[0556] The microphone 238 receives instructions and other input from the user 20 by picking up the user's voice. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.
[0557] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0558] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0559] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
[0560] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0561] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0562] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0563] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0564] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0565] This invention consists of multiple components to provide flexible learning support tailored to the individual needs of learners. The system can efficiently acquire facial expression and voice data, perform data analysis, generate customized content, and present the content.
[0566] The device operates continuously during the learner's learning session, capturing the learner's face with a camera and their voice with a microphone. This captured data is transmitted to the server via wireless or wired communication.
[0567] The server uses machine learning algorithms to analyze received facial expression and voice data to evaluate the learner's emotional state. The server retrieves the learner's learning history from a database and combines the current state with past data to determine areas of difficulty and the level of understanding.
[0568] The server uses generation methods to create personalized explanations and supplementary problems for learners. This generation is performed in real time using AI technology, with the aim of providing learners with the most effective content.
[0569] The device presents learners with personalized explanations and supplementary problems received from the server. This is done through audio guidance via a speaker and visual presentations via a tablet or other display. This allows learners to immediately understand their difficulties and receive clear guidance for tackling the problems.
[0570] As a concrete example, consider a case where an elementary school student is solving a math problem. When the user gets stuck on a calculation problem, the device detects the change in their facial expression using a sensor and immediately sends the data to the server. The server analyzes the user's past learning data and current facial expression data to understand why the user is having difficulty with that problem. Based on the results, the generation system creates appropriate explanations and supplementary problems, which the device then presents to the user. Through this process, the user can deepen their understanding of the problem and continue to learn effectively.
[0571] The following describes the processing flow.
[0572] Step 1:
[0573] The device powers on, verifies the learner's login, and prepares to begin learning. Once the user logs in, the device's camera and microphone activate, starting to capture the learner's facial expressions and voice in real time.
[0574] Step 2:
[0575] The device sends acquired data to the server at regular intervals. The data includes image frames and audio clips, representing the learner's current state.
[0576] Step 3:
[0577] The server inputs the received data into a machine learning algorithm for analysis. The server then uses this to assess the learner's emotional state (e.g., confusion, interest) and determine if there are any stumbling blocks.
[0578] Step 4:
[0579] The server matches the learner's facial expression data with their past learning history and extracts content from the database that is relevant to their current difficulty.
[0580] Step 5:
[0581] The server generates personalized explanations and supplementary problems. This generation is performed using an AI engine to provide learners with optimal learning support.
[0582] Step 6:
[0583] The terminal presents the learner with explanations and supplementary problems received from the server. The terminal supports the learner by displaying the problems on its screen and providing explanations through audio guidance.
[0584] Step 7:
[0585] The user answers the presented problems and sends the answers and any feedback to the server via the terminal. Once the server receives the user's answers, it incorporates them into the user's future learning plan.
[0586] Step 8:
[0587] The server re-evaluates the learner's progress based on their responses and answer data, and prepares to adjust the content of the next learning session based on the results.
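Steps 7 and 8 above can be illustrated with a minimal sketch. The disclosure does not specify how answers are stored or how progress is re-evaluated, so the `LearningHistory` class, the `plan_next_session` function, and the 0.7 accuracy threshold are hypothetical names and values chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LearningHistory:
    """Per-learner answer log keyed by topic (hypothetical structure)."""
    answers: dict = field(default_factory=lambda: defaultdict(list))

    def record(self, topic: str, correct: bool) -> None:
        # Step 7: store each answer received from the terminal.
        self.answers[topic].append(correct)

    def accuracy(self, topic: str) -> float:
        # Fraction of correct answers for a topic (0.0 if none recorded).
        results = self.answers[topic]
        return sum(results) / len(results) if results else 0.0


def plan_next_session(history: LearningHistory, threshold: float = 0.7) -> list:
    """Step 8: return topics to revisit, weakest first (threshold is assumed)."""
    weak = [t for t in history.answers if history.accuracy(t) < threshold]
    return sorted(weak, key=history.accuracy)


history = LearningHistory()
for topic, correct in [("fractions", False), ("fractions", True),
                       ("fractions", False), ("addition", True),
                       ("addition", True), ("decimals", False)]:
    history.record(topic, correct)

next_topics = plan_next_session(history)
print(next_topics)
```

Sorting by accuracy is only one possible policy; a real system might also weight recency or the emotional state observed while each answer was given.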
[0588] (Example 1)
[0589] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0590] Traditional education systems have struggled to grasp each learner's level of understanding and emotional state in real time and provide appropriate learning content immediately based on that information. Therefore, they are unable to provide flexible instruction tailored to each learner's progress, resulting in challenges in providing efficient learning support.
[0591] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0592] In this invention, the server includes means for immediately collecting a learner's expression information and acoustic information, means for assessing the collected information to identify emotional states related to comprehension, and means for generating individualized explanations and supplementary problems. This makes it possible to provide learning content that is immediately optimized according to the learner's real-time emotional state and learning history.
[0593] A "learner" is someone who acquires new knowledge and skills in an educational activity.
[0594] "Expression information" refers to visually observable information such as a learner's facial expressions and attitude, and is data used to judge their emotional state.
[0595] "Acoustic information" refers to aurally observable information, such as the voice and speech patterns of learners, and is used as data to judge emotions and comprehension.
[0596] "Immediate" refers to data collection and processing being performed in real time without delay.
[0597] "Assessment" refers to the process of analyzing collected data to identify learners' levels of understanding and emotional states.
[0598] "Emotional state" refers to the learner's emotional state and reactions, which can be used to assess their motivation and level of concentration.
[0599] "Individualized explanations" refer to learning materials and instructional content that are tailored to the learner's characteristics and current level of understanding.
[0600] "Supplementary problems" refer to additional exercises or assignments provided to reinforce learners' understanding.
[0601] "Artificial intelligence technology" refers to technologies that use machine learning algorithms and data analysis methods to autonomously solve problems or perform tasks.
[0602] This invention is a system designed to provide dynamic learning support to learners, and it mainly consists of a server and terminals.
[0603] The terminal collects expression information and acoustic information using a camera and microphone during the learner's learning process. For example, a standard webcam and microphone can be used. The collected data is transmitted to the server via Wi-Fi or wired communication.
[0604] The server uses Python to leverage machine learning libraries such as TensorFlow and PyTorch to evaluate the received data. This allows it to assess the learner's emotional state, compare it with their learning history retrieved from a database, and generate personalized explanations and supplementary problems.
[0605] The generation process utilizes a generative AI model, allowing learners to receive appropriate content based on prompt input. A concrete example of a prompt is, "This user lacks basic fraction calculation skills. Please suggest a step-by-step guide and practice exercises."
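As a minimal sketch of how an evaluation result might be turned into such a prompt, the following fills a template with the identified weak skill. The `build_prompt` helper and its parameters are hypothetical. The call to the generative AI model itself is omitted because this paragraph does not name a specific API.

```python
def build_prompt(
    weak_skill: str,
    request: str = "a step-by-step guide and practice exercises",
) -> str:
    """Fill an evaluation result into a prompt template (hypothetical helper)."""
    return f"This user lacks {weak_skill} skills. Please suggest {request}."


# Reproduces the example prompt quoted in the text.
prompt = build_prompt("basic fraction calculation")
print(prompt)
```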
[0606] The device presents the generated personalized content to the learner. A standard speaker is used for audio guidance, while a tablet device is used for visual explanations. This allows learners to study at their own pace and receive support to deepen their understanding.
[0607] As a concrete example, consider a case where an elementary school student is learning arithmetic. When the user encounters a problem and finds it difficult to understand, the device detects a change in facial expression and sends that data to a server. The server generates appropriate explanations and additional practice problems tailored to the situation and provides them to the user through the device. In this way, the user can receive immediate and effective instruction.
[0608] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0609] Step 1:
[0610] The device activates its camera and microphone during the learner's learning session to collect visual and auditory information. This information includes facial expressions and voice tone. The input is the learner's visual and auditory data, which is formatted as a digital signal. The output is data packets ready for transmission to the server. The device sends the collected data to the server immediately via Wi-Fi or a wired connection.
[0611] Step 2:
[0612] The server receives expression and acoustic information transmitted from the terminal. The input is data packets, which are converted into a format for analysis. The server uses machine learning libraries to evaluate the data and determine the learner's emotional state and level of concentration. The output is an analysis result showing the learner's current state.
[0613] Step 3:
[0614] The server retrieves the learner's past learning history from the database based on the analysis results. The input consists of the analyzed emotional state and past learning history. By combining this data, the server evaluates the learner's level of understanding and identifies areas where they are struggling. The output is the evaluation result based on their level of understanding.
[0615] Step 4:
[0616] The server uses a generative AI model to generate personalized explanations and supplementary problems. The input here is the evaluation result obtained in step 3, which is input to the generative AI model as a prompt. A prompt such as "This user lacks basic fraction calculation skills. Please suggest a step-by-step guide and practice problems." is used. The output is explanations and problems optimized for the learner.
[0617] Step 5:
[0618] The terminal presents the learner with the personalized explanations and supplementary problems received from the server through audio guidance and the display. The input is the generated content data from the server, which is converted into a user-friendly format. The output is the learning support content presented to the learner visually and aurally. This allows the user to receive real-time guidance and continue learning.
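The "data packets" produced in Step 1 and converted for analysis in Step 2 could be sketched as follows. The disclosure does not define a packet format, so the JSON layout, the Base64 encoding, and the field names (`learner_id`, `frame_b64`, `audio_b64`) are assumptions made for illustration. A production system would more likely use a binary or streaming format.

```python
import base64
import json
import time


def make_packet(learner_id: str, frame: bytes, audio: bytes) -> bytes:
    """Terminal side (Step 1): bundle one image frame and one audio clip."""
    payload = {
        "learner_id": learner_id,
        "timestamp": time.time(),
        "frame_b64": base64.b64encode(frame).decode("ascii"),
        "audio_b64": base64.b64encode(audio).decode("ascii"),
    }
    return json.dumps(payload).encode("utf-8")


def parse_packet(packet: bytes) -> dict:
    """Server side (Step 2): restore the raw bytes for analysis."""
    payload = json.loads(packet.decode("utf-8"))
    payload["frame"] = base64.b64decode(payload.pop("frame_b64"))
    payload["audio"] = base64.b64decode(payload.pop("audio_b64"))
    return payload


# Placeholder bytes stand in for real camera and microphone data.
packet = make_packet("learner-001", b"\x89fake-image", b"\x00\x01fake-audio")
restored = parse_packet(packet)
print(restored["learner_id"], len(restored["frame"]), len(restored["audio"]))
```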
[0619] (Application Example 1)
[0620] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0621] In modern education, it is difficult to provide learning experiences optimized for the different levels of understanding and progress of individual learners. Furthermore, the lack of effective support at home necessitates constant supervision by parents and educators. In this context, there is a need for support technologies that enable learners to autonomously, effectively, and individually engage in learning.
[0622] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0623] In this invention, the server includes means for acquiring visual and auditory information of learners in real time, means for analyzing the acquired data and identifying mental states related to the learner's awareness level, and means for generating individualized instruction and practice problems based on the identified mental states. This enables a machine to act as a learning supporter in the home, allowing for effective instruction and provision of materials tailored to the individual learning situation.
[0624] "Visual information" refers to video data used to capture the movements and facial expressions of a subject.
[0625] "Auditory information" refers to audio data used to capture the speech or sounds of a subject.
[0626] "Acquiring in real time" means collecting the target information immediately on the spot.
[0627] "Awareness level" is an indicator that shows how well learners understand a particular subject or issue.
[0628] "Mental state" refers to the psychological and emotional state of a learner.
[0629] "Individualized instruction" means providing individualized educational guidance tailored to each learner's needs and level of understanding.
[0630] "Practice problems" are work assignments provided to learners to solve problems using the knowledge they have acquired.
[0631] "Machines acting as learning supporters in the home" refers to automated devices providing educational guidance in the home.
[0632] "Providing materials" means supplying learners with the information and learning materials they need for their studies.
[0633] The system implementing this invention includes a comprehensive educational support device for providing learning support within the home. The terminal is installed in the home with the learner and uses a built-in camera and microphone to acquire the learner's visual and auditory information in real time. This makes it possible to continuously observe the learner's emotional state and level of comprehension during learning.
[0634] The server receives data sent from the terminal and performs analysis. The server uses analysis algorithms implemented in programming languages such as Python. It utilizes visual information analysis libraries such as OpenCV and face recognition libraries such as dlib to identify the learner's mental state. Furthermore, it refers to past learning history and deepens the analysis by combining it with the current mental state.
[0635] A generative AI model operates on the server, generating personalized instructional content and practice problems tailored to the learner's state. This process is implemented using, for example, the API provided by OpenAI, a well-known provider of generative AI models, to address the learner's changing needs in real time. The combination of the generative AI model and the prompt is central to this step.
[0636] The device serves to present the generated content to the learner, providing audio guidance through the speaker and visual presentations via the display. For example, if a learner gets stuck on a particular math problem, the system will display several similar problems and provide guidance such as, "Try to remember the answer from last time."
[0637] As a concrete example, here is an example of a prompt: "The learner is struggling with multiplication problems. Based on past learning data, create advice and supplementary problems to re-present problems of a similar difficulty level." This allows learners to progress autonomously and with individualized support at home.
[0638] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0639] Step 1:
[0640] The device acquires the learner's visual and auditory information in real time using a camera and microphone. The input consists of the learner's facial image data and audio data, which are transmitted to the server via wireless communication.
[0641] Step 2:
[0642] The server analyzes the received visual and auditory information. The input consists of facial image data and audio data sent from the terminal. Using OpenCV and the dlib library, it analyzes facial expressions and identifies the learner's mental state. This calculation yields the learner's emotional state as output.
[0643] Step 3:
[0644] The server combines the identified mental state with the learner's past learning history data and performs analysis. The input consists of the mental state and past learning data. Based on this input, the server identifies where the learner is stumbling and adjusts the learning plan in real time as needed. The output is the learner's personalized educational needs.
[0645] Step 4:
[0646] The server uses a generative AI model to generate personalized instructional content and practice problems optimized for the learner. The input consists of data stored on the server and a prompt for the generative AI model, such as the example prompt given above. New learning content is obtained as output.
[0647] Step 5:
[0648] The device presents the generated instructional content and practice problems to the learner via audio and display. The input is the generated learning content, and the device performs specific actions such as audio guidance via the speaker and visual presentation on the display. Based on this presentation, the learner can proceed with their learning autonomously.
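Step 3 of this flow, which combines the identified mental state with past learning data, might be sketched as below. The rule set, the 0.6 accuracy threshold, the emotion labels, and the action names are all illustrative assumptions. The disclosure does not state how the two inputs are combined.

```python
def identify_needs(mental_state: str, topic: str, past_accuracy: dict) -> dict:
    """Combine the current mental state with past accuracy for one topic.

    past_accuracy maps a topic name to the fraction of past problems
    answered correctly (0.0 to 1.0). All rules here are assumptions.
    """
    accuracy = past_accuracy.get(topic, 0.0)
    struggling = mental_state in {"confusion", "frustration"}
    if struggling and accuracy < 0.6:
        action = "review_basics"        # stuck now and weak historically
    elif struggling:
        action = "similar_difficulty"   # stuck now but usually succeeds
    else:
        action = "advance"              # no sign of difficulty
    return {"topic": topic, "action": action}


needs = identify_needs("confusion", "multiplication", {"multiplication": 0.8})
print(needs)
```

The resulting dictionary could then feed the Step 4 prompt, for example by asking for problems of a similar difficulty level when `action` is `similar_difficulty`.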
[0649] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0650] This invention provides a learning support system that integrates an emotion engine that accurately identifies the emotional state of learners, thereby offering a learning experience tailored to individual needs. The system consists of a terminal, a server, and the emotion engine, which work together to provide optimal support to learners.
[0651] The terminal is a device used by learners and is equipped with a camera and microphone. This terminal captures the learner's facial expressions and voice in real time and transmits them to the server as digital data.
[0652] The server receives data sent from the terminal and uses an emotion engine to analyze the user's emotional state. The emotion engine incorporates algorithms to recognize different emotional states, which are then used to evaluate the learner's level of understanding and concentration.
[0653] Furthermore, the server references the learner's accumulated past learning history to understand their progress and generates customized explanations and supplementary questions that combine this with their analyzed emotional state. This generated content is best suited to the learner's current situation.
[0654] The device presents content received from the server to the learner. This presentation utilizes both audio guidance and screen displays to convey the information necessary to resolve the difficulties the learner faces.
[0655] As a concrete example, consider a middle school student taking an online history lesson. When the user begins to show signs of confusion due to difficult dates or events, the device detects this change and sends data to the server. The server uses an emotion engine to analyze the degree of confusion and generates visual support and easy-to-remember related questions based on past learning history to aid understanding. As a result, the user gains clues to understand complex dates in chronological order and can re-engage in learning with renewed motivation.
[0656] The following describes the processing flow.
[0657] Step 1:
[0658] The device activates its camera and microphone as soon as the learner starts a lesson, capturing facial and audio data in real time. This information is then converted into a format that can be analyzed by the emotion engine.
[0659] Step 2:
[0660] The device sends the captured data to the server. A communication protocol that enables secure and efficient data transfer is used, and the process takes place in the background without interrupting the learning environment.
[0661] Step 3:
[0662] The server inputs the received data into the emotion engine. The emotion engine uses a machine learning model to analyze the learner's facial expressions and voice characteristics to determine their emotional state with high accuracy.
[0663] Step 4:
[0664] The server compares the analysis results with the learner's past learning history to evaluate their level of understanding and concentration. This identifies the learning difficulties and areas where the learner is currently struggling.
[0665] Step 5:
[0666] The server uses generation tools to create personalized explanations and supplementary questions for learners. This includes motivational feedback tailored to their emotional state and engaging content.
[0667] Step 6:
[0668] The terminal provides the learner with the content received from the server, both audibly and visually. The audio guidance explains the content with easy-to-understand intonation, and the relevant information and questions are shown on a tablet or other display.
[0669] Step 7:
[0670] The user deepens their understanding by working through the presented problems. The answers are sent to the server in real time via the terminal and are reflected in the next learning step.
[0671] Step 8:
[0672] The server stores user response data and reactions to help plan future learning. This data is then used for system-wide re-evaluation and improvement, providing a more optimized experience for learners.
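The emotion engine in Step 3 analyzes both facial expressions and voice characteristics. One simple way to combine the two is a weighted average of per-modality scores (late fusion), sketched below. This fusion method, the `fuse_emotions` function, the 0.6 face weight, and the example scores are assumptions. The disclosure does not describe how the modalities are combined.

```python
def fuse_emotions(face: dict, voice: dict, face_weight: float = 0.6) -> tuple:
    """Weighted average of per-emotion scores from two modalities.

    face and voice map emotion labels to scores in [0, 1], as a separate
    classifier for each modality might output (hypothetical inputs).
    Returns the top-scoring label and the fused score table.
    """
    labels = set(face) | set(voice)
    fused = {
        label: face_weight * face.get(label, 0.0)
        + (1.0 - face_weight) * voice.get(label, 0.0)
        for label in labels
    }
    best = max(fused, key=fused.get)
    return best, fused


face_scores = {"confusion": 0.7, "interest": 0.2, "joy": 0.1}
voice_scores = {"confusion": 0.4, "interest": 0.5, "joy": 0.1}
label, scores = fuse_emotions(face_scores, voice_scores)
print(label)
```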
[0673] (Example 2)
[0674] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0675] Conventional learning support systems lack the ability to provide nuanced support based on learners' emotions and learning history, which hinders their ability to maximize learning efficiency. Furthermore, the lack of real-time content delivery that adapts to individual learners' emotional changes results in insufficient efforts to maintain learners' motivation and improve their comprehension.
[0676] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0677] In this invention, the server includes information acquisition means for acquiring learner facial expression information and voice information in real time, emotion analysis means for analyzing the acquired information and identifying the learner's emotional state, and content generation means for generating personalized explanations and questions based on the identified emotional state and the learner's past learning history. This makes it possible to provide optimal educational content that is tailored to the learner's emotions and learning history.
[0678] "Information acquisition means" refers to a device or system that has the function of collecting learner's facial expression information and voice information in real time.
[0679] "Emotional analysis means" refers to a device or system that processes acquired facial expression information and voice information of a learner and has the function of identifying their emotional state.
[0680] "Content generation means" refers to a device or system that has the function of creating personalized explanations and questions based on the identified learner's emotional state and past learning history.
[0681] "Content presentation means" refers to a device or system that has the function of presenting generated explanations and questions to learners.
[0682] An "educational support system" is a system that includes multiple means designed to improve learners' learning efficiency.
[0683] A "generative AI model" is a model that uses machine learning algorithms to dynamically generate personalized explanations and problems in real time.
[0684] This invention aims to provide, through a learning support system, an educational experience that meets the diverse needs of learners. The invention is implemented in a system composed mainly of a terminal, a server, and an emotion engine.
[0685] The terminal is a device used directly by learners and is equipped with a camera and microphone. This allows it to capture real-time facial expression information and audio from the learners and process it as digital data. This processing utilizes a dedicated application within the device, which adds capabilities for facial recognition and audio recording.
[0686] The server receives digital data sent from the terminal and uses an emotion engine to analyze the learner's emotional state. The emotion engine uses machine learning algorithms to identify a variety of emotions. The analysis results are input into a generative AI model, which then generates customized content based on prompts. Specifically, by integrating the analysis results with the learner's past learning history, the model creates explanations and supplementary problems optimized for the learner's situation.
[0687] For example, if a middle school student becomes confused by a difficult date during an online history lesson, the device quickly captures their facial expression and sends it to the server. The server analyzes this confusion using an emotion engine and instructs a generative AI model with a prompt message such as "Generate a concise chart to help understand the dates chronologically." The resulting content is then presented to the user via the device with audio guidance. This allows learners to regain interest in learning and engage with the material more actively.
[0688] This system allows learners to gain a deeper understanding and more active learning at their own pace, and therefore its educational effectiveness is considered to be very high.
[0689] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0690] Step 1:
[0691] The device acquires the learner's facial expression and voice information in real time using a camera and microphone. This input is processed by a dedicated application within the device and converted into digital data format. This processing prepares the data for transmission to the server in an appropriate format for analysis.
[0692] Step 2:
[0693] The terminal sends the processed digital data to the server. The digitized facial expression and voice information, as input, are sent to the server based on a secure communication protocol. Once the server has received the output, it can proceed to the next analysis stage.
[0694] Step 3:
[0695] The server receives data transmitted from the terminal and identifies the learner's emotional state using emotion analysis tools. The input for this step is the facial and voice data received in the previous step. The emotion engine analyzes this data and outputs the identified emotion (e.g., joy, confusion, concentration). This information is used to adapt the learner's learning experience.
[0696] Step 4:
[0697] The server integrates the results of the emotion analysis with past learning history and uses a generative AI model to generate customized learning content. The input for this step is the emotional state and learning history data. The generative AI model uses prompt sentences to generate appropriate explanations and supplementary questions, which are then stored on the server as output.
[0698] Step 5:
[0699] The server sends the generated content to the terminal. The input is the generated learning content, and the output is the completion of the transmission to the terminal. This prepares the learner to receive content optimized for them.
[0700] Step 6:
[0701] The terminal presents content received from the server to the learner. Specifically, content is presented using screen displays and audio guides. The content data, as input, is converted into visual and auditory information, facilitating the learner's understanding as output. This presentation allows learners to actively engage in learning, improving the educational experience.
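The four means defined for Example 2 (information acquisition, emotion analysis, content generation, and content presentation) can be viewed as stages of one pipeline. The sketch below wires them together as plain callables. The function names and stub implementations are hypothetical. In particular, `generate_stub` returns the prompt quoted in the text as a stand-in for the output of a generative AI model.

```python
from typing import Callable, Dict, List


def run_support_cycle(
    acquire: Callable[[], Dict[str, bytes]],
    analyze: Callable[[Dict[str, bytes]], str],
    generate: Callable[[str, List[str]], str],
    present: Callable[[str], Dict[str, str]],
    history: List[str],
) -> Dict[str, str]:
    """Run one acquisition -> analysis -> generation -> presentation cycle."""
    data = acquire()                      # Steps 1-2: capture and send
    emotion = analyze(data)               # Step 3: identify the emotion
    content = generate(emotion, history)  # Step 4: generate content
    return present(content)               # Steps 5-6: send and present


def acquire_stub() -> Dict[str, bytes]:
    # Placeholder bytes instead of real camera and microphone data.
    return {"face": b"frame", "voice": b"clip"}


def analyze_stub(data: Dict[str, bytes]) -> str:
    # A real emotion engine would classify the data; this stub is fixed.
    return "confusion"


def generate_stub(emotion: str, history: List[str]) -> str:
    # Stand-in for the generative AI model: returns the prompt it would get.
    if emotion == "confusion":
        return "Generate a concise chart to help understand the dates chronologically."
    return "Continue with the next topic."


def present_stub(content: str) -> Dict[str, str]:
    # The same content is routed to the screen and to audio guidance.
    return {"screen": content, "audio": content}


shown = run_support_cycle(
    acquire_stub, analyze_stub, generate_stub, present_stub, ["history lesson"]
)
print(shown["screen"])
```

Keeping each means behind a plain callable lets any stage be swapped, for example replacing `analyze_stub` with a trained model, without changing the rest of the cycle.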
[0702] (Application Example 2)
[0703] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0704] In today's educational environment, appropriate instruction and support tailored to the individual needs of learners are required. Especially during home learning, it is crucial not only to maintain concentration but also to provide learning content that is appropriate and tailored to each student's level of understanding in real time. However, conventional systems have struggled to accurately grasp learners' emotional states and levels of understanding and to implement optimized instruction based on that information. Improvements are needed to address this challenge.
[0705] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0706] In this invention, the server includes means for acquiring learner perceptual data in real time, means for analyzing the information and identifying emotional states related to the subject's level of understanding, means for generating individualized instruction and support tasks based on the identified emotional states, and a social machine device for supporting children's learning at home. This enables effective learning support tailored to the learner's level of understanding and concentration.
[0707] "Perceptual data" refers to information used to evaluate a learner's emotional state and level of comprehension, including their facial expressions and voice.
[0708] "Analysis means" refers to a device or system that analyzes perceptual data and performs processing to identify the learner's emotional state and level of understanding.
[0709] A "generation means" is a device or system that has the function of creating content and assignments necessary to provide optimal guidance and support to learners based on the analysis results.
[0710] "Presentation means" refers to a device or system that visually or aurally communicates instructional content or assignments created by the generation means to learners.
[0711] A "social machine device" is a robot or similar device installed to support learning within the home, providing appropriate education through direct interaction with learners.
[0712] This embodiment of the invention operates as a system that provides appropriate educational support based on learner perceptual data. The server receives and analyzes real-time learner facial expression data and voice data transmitted from the terminal.
[0713] The system includes hardware for processing perceptual data, such as a Raspberry Pi, a camera module, and a microphone, and software such as an image processing library (OpenCV) and a machine learning library for speech analysis (TensorFlow).
[0714] The server uses analytical tools to identify the learner's emotional state and generates instructional content tailored to the learner's level of understanding. The instructional content is generated using a generative AI model. For example, if a learner faces a difficult task and shows signs of distress, the server identifies that emotion, generates appropriate supplementary tasks or visual support, and sends them to the device.
[0715] The device presents the generated learning content in audio and visual form and gives the learner immediate feedback. This process is carried out by a social machine device that supports children's learning within the home.
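As a non-limiting illustration of the example in the preceding paragraph, the choice of support could be sketched as follows. The class `LearnerState`, the function `choose_support`, the emotion labels, the 0.0 to 1.0 understanding scale, and the 0.5 threshold are all hypothetical names and values introduced only for this sketch; the disclosure does not specify them.

```python
from dataclasses import dataclass


@dataclass
class LearnerState:
    emotion: str          # illustrative label, e.g. "distress" or "calm"
    understanding: float  # hypothetical scale: 0.0 (lost) to 1.0 (mastered)


def choose_support(state: LearnerState) -> list[str]:
    """Return the kinds of content to request from the generative AI model."""
    support: list[str] = []
    if state.emotion == "distress":
        # Example from the text: distress on a difficult task leads to
        # supplementary tasks or visual support.
        support.append("supplementary_task")
        support.append("visual_support")
    if state.understanding < 0.5:  # assumed threshold, not from the disclosure
        support.append("simpler_explanation")
    if not support:
        # No sign of difficulty: continue with the regular material.
        support.append("next_problem")
    return support
```

A rule table of this kind is only one possible design; the same decision could equally be delegated to the generative AI model itself.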
[0716] As a concrete example of its use, if a third-grade elementary school student is practicing kanji for Japanese class at home and the system detects that the child is hesitating over a difficult reading, it might prompt the child with a question such as "How about taking a nap?" and provide visual assistance to keep the child engaged. Furthermore, a prompt such as "Imagine a robot that optimally supports your emotions. What kind of learning content would further motivate you to learn?" can be input to the system's generative AI model.
[0717] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0718] Step 1:
[0719] The device uses a camera and microphone to acquire learner facial and voice data in real time. This data serves as input for analyzing factors that influence performance and attention. The device then prepares to send the data to a server.
[0720] Step 2:
[0721] The server receives facial expression data and audio data sent from the terminal. Based on the received data, it performs facial expression analysis using OpenCV and audio analysis using TensorFlow. Through this analysis, it identifies the learner's emotional state and outputs information related to their level of comprehension and concentration.
[0722] Step 3:
[0723] Based on the analyzed emotional state and level of understanding, the server uses a generative AI model to generate instructional content and supplementary tasks optimized for the learner. In this process, the prompt "Imagine a robot that optimally supports your emotions. What kind of learning content would further motivate you to learn?" is input to the generative AI model, and its output is used as the instructional content.
[0724] Step 4:
[0725] The server sends the generated instructional content and supplementary assignments to the terminal. The terminal receives this and presents it to the learner as audio guidance and visual content. At this stage, specific instruction is given to the user, and the system's feedback loop is formed as their reactions are observed.
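Steps 1 to 4 above can be sketched, purely for illustration, as a single processing cycle. The function names, the JSON/Base64 payload format, and the practice of appending the analyzed state to the prompt are assumptions made for this sketch; the analyzers and the generative AI model are passed in as stand-in callables rather than actual OpenCV, TensorFlow, or model calls.

```python
import base64
import json
from typing import Callable

# Prompt text quoted from Step 3.
PROMPT = ("Imagine a robot that optimally supports your emotions. "
          "What kind of learning content would further motivate you to learn?")


def build_payload(frame: bytes, audio: bytes) -> str:
    """Step 1 (device): package one camera frame and one audio chunk."""
    return json.dumps({
        "frame": base64.b64encode(frame).decode("ascii"),
        "audio": base64.b64encode(audio).decode("ascii"),
    })


def analyze(payload: str,
            face_model: Callable[[bytes], dict],
            voice_model: Callable[[bytes], dict]) -> dict:
    """Step 2 (server): run both analyzers and merge their outputs."""
    data = json.loads(payload)
    face = face_model(base64.b64decode(data["frame"]))
    voice = voice_model(base64.b64decode(data["audio"]))
    return {**face, **voice}


def generate(state: dict, model: Callable[[str], str]) -> str:
    """Step 3 (server): prompt the generative model (state suffix is assumed)."""
    prompt = f"{PROMPT}\nLearner state: {json.dumps(state, sort_keys=True)}"
    return model(prompt)


def present(content: str) -> dict:
    """Step 4 (device): content to be spoken and displayed."""
    return {"audio": content, "visual": content}


def run_cycle(frame: bytes, audio: bytes,
              face_model: Callable[[bytes], dict],
              voice_model: Callable[[bytes], dict],
              model: Callable[[str], str]) -> dict:
    """One pass through Steps 1 to 4; repeating it forms the feedback loop."""
    payload = build_payload(frame, audio)
    state = analyze(payload, face_model, voice_model)
    return present(generate(state, model))
```

Calling `run_cycle` repeatedly with fresh camera and microphone data would correspond to the feedback loop mentioned in Step 4, since each cycle observes the learner's reaction to the previously presented content.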
[0726] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0727] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. A prompt containing instructions and inference data, such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
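By way of illustration only, the input to the data generation model 58 described above (a prompt plus inference data in one or more formats) could be represented as follows. The class name `InferenceRequest`, its fields, and the rule that at least one kind of inference data must be present are assumptions of this sketch and are not an interface of any actual generative AI service.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InferenceRequest:
    prompt: str                    # instructions for the model
    audio: Optional[bytes] = None  # audio data representing speech
    text: Optional[str] = None     # text data representing text
    image: Optional[bytes] = None  # image data representing an image
    output_format: str = "text"    # "text" or "audio", per the description

    def modalities(self) -> list[str]:
        """Names of the inference data kinds actually supplied."""
        return [name for name in ("audio", "text", "image")
                if getattr(self, name) is not None]

    def validate(self) -> None:
        """Raise ValueError if the request is incomplete (assumed rules)."""
        if not self.prompt.strip():
            raise ValueError("prompt must contain an instruction")
        if not self.modalities():
            raise ValueError("at least one kind of inference data is required")
        if self.output_format not in ("text", "audio"):
            raise ValueError("unsupported output format")
```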
[0728] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.
[0729] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.
[0730] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
[0731] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.
[0732] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the emotion map 400, the more visible (expressed in actions) the emotions become.
[0733] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
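The geometry of the emotion map described in the preceding paragraphs can be sketched, under stated assumptions, as a classification of a point in the plane. The Cartesian coordinates (origin at the center of the concentric circles, x increasing to the right, y increasing upward), the function name `describe_position`, and the `inner_radius` boundary of 0.5 are hypothetical; the actual layout is defined only by Figure 9.

```python
import math


def describe_position(x: float, y: float, inner_radius: float = 0.5) -> dict:
    """Classify a point on the emotion map by side, valence, and layer."""
    radius = math.hypot(x, y)
    if x < 0:
        side = "response"    # left half: sensation is dominant
    elif x > 0:
        side = "situation"   # right half: situational awareness is dominant
    else:
        side = "both"        # directly above or below the center
    if y > 0:
        valence = "pleasure"     # upper side
    elif y < 0:
        valence = "displeasure"  # lower side
    else:
        valence = "neutral"
    # Inner circles hold primitive emotions; outer circles hold emotions
    # that are expressed in actions.
    layer = "inner" if radius <= inner_radius else "outer"
    return {"side": side, "valence": valence, "layer": layer}
```

For example, a point slightly to the right of center on the horizontal axis would fall on the "situation" side, which corresponds to the 3 o'clock position mentioned above.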
[0734] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."
[0735] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
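One way to obtain training data in which emotions located close together have similar values is to derive the target values from distances on the map. The following is a minimal sketch of that idea, not the training method of the disclosure: the coordinates in `POSITIONS`, the Gaussian kernel, and the `sigma` value are assumptions introduced for illustration, and `dominant_emotion` simply selects the emotion with the largest value.

```python
import math

# Hypothetical 2-D positions on the emotion map; the real layout is in Figure 9.
POSITIONS = {
    "reassured": (0.60, 0.10),
    "calm": (0.65, 0.15),
    "confident": (0.70, 0.20),
    "anxious": (0.60, -0.60),
}


def smoothed_targets(label: str, sigma: float = 0.2) -> dict[str, float]:
    """Target emotion values that decay with map distance from `label`."""
    lx, ly = POSITIONS[label]
    return {
        name: math.exp(-((x - lx) ** 2 + (y - ly) ** 2) / (2 * sigma ** 2))
        for name, (x, y) in POSITIONS.items()
    }


def dominant_emotion(values: dict[str, float]) -> str:
    """Determine the emotion as the one with the largest emotion value."""
    return max(values, key=values.get)
```

With these assumed positions, a sample labeled "calm" yields high target values for "reassured" and "confident" as well, and a low value for "anxious", which mirrors the grouping shown in Figure 10.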
[0736] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.
[0737] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.
[0738] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.
[0739] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.
[0740] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.
[0741] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.
[0742] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.
[0743] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.
[0744] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.
[0745] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.
[0746] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.
[0747] The following is further disclosed regarding the embodiments described above.
[0748] (Claim 1)
[0749] Acquisition means for acquiring a learner's facial expression data and voice data in real time;
[0750] analysis means for analyzing the data acquired by the acquisition means and identifying an emotional state related to the learner's level of understanding;
[0751] generation means for generating individualized explanations and auxiliary problems based on the emotional state identified by the analysis means; and
[0752] presentation means for presenting the explanations and problems generated by the generation means to the learner.
[0753] A system comprising the above means.
[0754] (Claim 2)
[0755] The system according to claim 1, wherein the analysis means further includes means for creating an optimized learning plan by referring to the learner's past learning history.
[0756] (Claim 3)
[0757] The system according to claim 1, wherein the generation means includes means for dynamically generating explanatory and auxiliary problems in real time using a machine learning algorithm.
[0758] "Example 1"
[0759] (Claim 1)
[0760] Collection means for instantly collecting a learner's expression information and acoustic information;
[0761] evaluation means for evaluating the information obtained by the collection means and identifying an emotional state related to the learner's level of understanding;
[0762] generation means for generating individualized explanations and supplementary problems based on the emotional state identified by the evaluation means;
[0763] presentation means for presenting the explanations and problems generated by the generation means to the learner; and
[0764] communication means for coordinating each of the above means.
[0765] An information processing system comprising the above means.
[0766] (Claim 2)
[0767] The information processing system according to claim 1, wherein the evaluation means further includes means for formulating an optimized learning plan by referring to the learner's past learning history.
[0768] (Claim 3)
[0769] The information processing system according to claim 1, wherein the generation means includes means for generating explanations and supplementary problems immediately and dynamically using artificial intelligence technology.
[0770] "Application Example 1"
[0771] (Claim 1)
[0772] A function that acquires a learner's visual and auditory information in real time;
[0773] a function that analyzes the information obtained by the acquiring function and identifies a mental state related to the learner's level of understanding;
[0774] a function that generates individualized instruction and practice problems based on the identified mental state;
[0775] a function that presents the generated instruction and problems to the learner; and
[0776] a machine that acts as a learning supporter within the home and provides optimal learning materials according to the learner's progress.
[0777] A system comprising the above.
[0778] (Claim 2)
[0779] The system according to claim 1, further comprising the function of creating an improved learning plan by referring to the learner's past educational records.
[0780] (Claim 3)
[0781] The system according to claim 1, wherein the function includes a function that generates instantly variable instruction and practice problems using machine learning technology.
[0782] "Example 2 of combining an emotion engine"
[0783] (Claim 1)
[0784] Information acquisition means for acquiring a learner's facial expression information and voice information in real time;
[0785] emotion analysis means for analyzing the information acquired by the information acquisition means and identifying the learner's emotional state;
[0786] content generation means for generating personalized explanations and problems based on the emotional state identified by the emotion analysis means and the learner's past learning history; and
[0787] content presentation means for presenting the explanations and problems generated by the content generation means to the learner.
[0788] An educational support system comprising the above means.
[0789] (Claim 2)
[0790] The educational support system according to claim 1, wherein the emotion analysis means further includes means for understanding the learner's progress by referring to the learner's past learning history and creating an optimized learning plan.
[0791] (Claim 3)
[0792] The educational support system according to claim 1, wherein the content generation means includes means of using a generative AI model that dynamically generates explanations and problems in real time using a machine learning algorithm.
[0793] "Application example 2 when combining with an emotional engine"
[0794] (Claim 1)
[0795] Acquisition means for acquiring a learner's perceptual data in real time;
[0796] analysis means for analyzing the data obtained by the acquisition means and identifying an emotional state related to the learner's level of understanding;
[0797] generation means for generating individualized instruction and support tasks based on the emotional state identified by the analysis means;
[0798] presentation means for presenting the instruction and tasks generated by the generation means to the learner; and
[0799] a social machine device for supporting children's learning within the home.
[0800] A system comprising the above.
[0801] (Claim 2)
[0802] The system according to claim 1, wherein the analysis means further includes means for creating an optimized learning plan by referring to the learner's past learning history.
[0803] (Claim 3)
[0804] The system according to claim 1, wherein the generation means includes means for dynamically generating instructional and support tasks in real time using a machine learning algorithm.
[Explanation of Symbols]
[0805]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot
Claims
1. A system comprising: a function that acquires a learner's visual and auditory information in real time; a function that analyzes the acquired information and identifies a mental state related to the learner's level of understanding; a function that generates individualized instruction and practice problems based on the identified mental state; a function that presents the generated instruction and problems to the learner; and a machine that acts as a learning supporter within the home and provides optimal learning materials according to the learner's progress.
2. The system according to claim 1, further comprising a function to create an improved learning plan by referring to the learner's past educational records.
3. The system according to claim 1, comprising a function that generates instantly variable instruction and practice problems using machine learning technology.