system
A system for collecting and analyzing student data to infer mental stress levels and notify educators allows for early intervention, addressing the limitations of current countermeasures by enhancing mental health support in educational environments.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- SOFTBANK GROUP CORP
- Filing Date
- 2024-12-16
- Publication Date
- 2026-06-26
AI Technical Summary
Current countermeasures for problems linked to mental stress in children and students, such as bullying and school refusal, are insufficient because they lack mechanisms for early detection of changes in daily life, making it difficult to accurately grasp mental states and prevent problems.
A system that collects behavioral and learning data from students, extracts features, and uses generative AI to infer emotional states and mental stress levels, notifying educators of anomalies for early intervention.
Enables early detection and reduction of mental stress by providing timely interventions based on continuous data analysis, improving mental health outcomes in educational settings.
Abstract
Description
Technical Field
[0001] The technology of the present disclosure relates to a system.
Background Art
[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a character of the chatbot, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] In recent years, problems such as bullying, violence, and school refusal among children and students in the school environment are often caused by the mental stress of children and students, and appropriate responses at appropriate times are required. However, the current countermeasures are limited to human support by school counselors and are insufficient for preventing the occurrence of problems. In the conventional method, since there is no mechanism for early detection of changes in the daily lives of children and students, it is difficult to accurately grasp the mental state of individual children and students and prevent the occurrence of problems.
Means for Solving the Problems
[0006] The term "school environment" refers to the physical and social space in an educational institution where students engage in daily learning and activities.
[0007] "Children and students" refers to children and young people of educational age, and especially includes those attending elementary, middle, and high schools.
[0008] "Behavioral data" refers to information that records the physical movements and interaction patterns of students over time.
[0009] "Learning data" refers to information generated through educational activities conducted by students at school, and specifically includes handwritten text and the content of essays.
[0010] "Feature extraction" refers to the analytical process of finding useful patterns and trends from collected data.
[0011] "Emotional state" refers to the internal reactions and sensations that indicate the psychological state of children and students, and includes satisfaction levels and stress levels.
[0012] "Mental stress level" refers to an index that quantitatively measures the degree of psychological burden and tension experienced by children and students.
[0013] "Detecting anomalies" refers to the process of identifying data patterns that deviate from pre-defined criteria.
[0014] The term "educator" refers to people whose profession involves providing education to children and students, and includes teachers, counselors, and others.
[Brief Description of the Drawings]
[0015]
[Figure 1] This is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] This is a conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] This is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] This is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] This is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] This is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] This is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] This is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] This shows an emotion map where multiple emotions are mapped.
[Figure 10] This shows an emotion map where multiple emotions are mapped.
[Figure 11] This is a sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] This is a sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] This is a sequence diagram showing the processing flow of the data processing system in Example 2, which incorporates an emotion engine.
[Figure 14] This is a sequence diagram showing the processing flow of the data processing system in Application Example 2, which incorporates an emotion engine.
Embodiments for Carrying Out the Invention
[0016] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.
[0017] First, the terms used in the following description will be explained.
[0018] In the following embodiments, a processor with a reference number (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.
[0019] In the following embodiments, a RAM (Random Access Memory) with a reference number is a memory in which information is temporarily stored and is used as a work memory by the processor.
[0020] In the following embodiments, a storage with a reference number is one or more non-volatile storage devices that store various programs and various parameters, etc. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), or magnetic tapes, and the like.
[0021] In the following embodiments, a communication interface (I/F) with a reference number is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).
[0022] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same interpretation applies when three or more items are linked by "and/or."
[0023] [First Embodiment]
[0024] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.
[0025] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.
[0026] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0027] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.
[0028] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.
[0029] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.
[0030] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.
[0031] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.
[0032] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.
[0033] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0034] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0035] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0036] This invention provides a system for early detection of mental stress in students within a school environment and for notifying educators. The system consists of several key components for collecting and analyzing data on students' daily behavior and learning.
[0037] Data collection
[0038] The devices use cameras and sensors installed in classrooms and throughout the school to monitor students' movements and activities in real time and collect behavioral data. They also acquire learning data by collecting information from digital notebooks used during lessons and from paper documents obtained through scanning.
[0039] Data preprocessing and feature extraction
[0040] The server uses OCR technology to read handwritten characters from the collected data as text data. For behavioral data, noise is removed and standardization is performed to improve data integrity. After that, features such as consistency of characters, composition content, and student movement patterns are extracted.
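The noise removal and standardization described above can be sketched as follows. This is a minimal illustration only: the disclosure does not name a specific filter or scaling method, so the trailing moving average, the z-score scaling, the function names, and the sample values are all assumptions.

```python
from statistics import mean, pstdev


def moving_average(values, window=3):
    """Smooth a series with a trailing moving average (a simple noise filter)."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        smoothed.append(mean(values[start:i + 1]))
    return smoothed


def standardize(values):
    """Rescale a series to zero mean and unit variance (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant series carries no variation to scale.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical per-minute movement counts for one student (assumed data).
raw_counts = [4, 5, 30, 5, 6, 4]
clean = standardize(moving_average(raw_counts))
```

Smoothing before scaling keeps a single sensor spike (the value 30 here) from dominating the standardized series.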
[0041] Analysis and Anomaly Detection
[0042] The server uses generative AI to analyze the extracted features. For example, if patterns such as messy handwriting, frequent use of negative words, or decreased interaction with friends are detected, an increase in mental stress is inferred. Based on this data, the system compares it to a baseline, and if an abnormality is detected, the results are notified to the educator.
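The baseline comparison mentioned above could, for instance, take the following form. This is a sketch under stated assumptions: the disclosure does not specify how the baseline is built or what counts as an abnormality, so the per-student history, the threshold of two standard deviations, and the name `deviates_from_baseline` are hypothetical.

```python
from statistics import mean, pstdev


def deviates_from_baseline(baseline, current, threshold=2.0):
    """Return True if `current` lies more than `threshold` standard
    deviations away from the mean of the `baseline` values."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        # With a perfectly flat baseline, any change counts as a deviation.
        return current != mu
    return abs(current - mu) / sigma > threshold


# Hypothetical weekly "negative word ratio" values for one student (assumed data).
baseline_ratios = [0.05, 0.06, 0.04, 0.05, 0.05]
print(deviates_from_baseline(baseline_ratios, 0.20))   # far above the usual range
print(deviates_from_baseline(baseline_ratios, 0.055))  # within the usual range
```

Comparing each student against their own history, rather than against a class-wide average, is one way to respect individual differences in handwriting and behavior.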
[0043] Notifications and feedback
[0044] When an educator receives a notification, they interact with the specific student to confirm the details of their situation. This feedback information is later fed back into the system's learning process to help further improve its accuracy.
[0045] Specific example
[0046] In one case at a school, the system detected that student A's handwriting had recently become messy when taking notes during class. It also observed a change in his behavioral patterns, such as him spending more time alone during recess. The server analyzed this and determined it to be an anomaly, sending an alert to the homeroom teacher (the user of the system). The teacher met with student A and took early action, which resolved the problem. In this way, the system can detect changes in students' mental states early and enable appropriate intervention.
[0047] The following describes the processing flow.
[0048] Step 1:
[0049] The devices use cameras and sensors in the classroom to continuously monitor students' movement patterns and interactions with others, collecting behavioral data. They also acquire learning data from digital notebooks and, using scanning devices, from paper documents.
[0050] Step 2:
[0051] The server converts the acquired learning data into text data using OCR technology. At the same time, it applies a noise reduction algorithm to the behavioral data and standardizes it to improve data accuracy.
[0052] Step 3:
[0053] The server extracts features from the organized data. Specifically, it scores the sloppiness of handwriting and the positive or negative sentiment of the words used in essays, and quantifies movement frequency and the presence or absence of group activities from behavioral data.
[0054] Step 4:
[0055] The server inputs the extracted feature data into a generative AI model to estimate the emotional state and mental stress levels of the students. During this process, it compares past and current data to identify any pattern anomalies.
[0056] Step 5:
[0057] If the server determines that an anomaly has been detected based on the analysis results, it immediately compiles information on the relevant students and sends a notification to the user, who is an educator (teacher or counselor).
[0058] Step 6:
[0059] Based on notifications received from the server, users conduct interviews with the relevant students to confirm their statements and take appropriate action. The information obtained and the status of the response are later sent back to the server as feedback and used to improve the accuracy of the learning model.
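The feature extraction in Step 3 of the flow above can be illustrated as follows. This is a minimal sketch, not the disclosed method: the word list `NEGATIVE_WORDS`, the two function names, and the sample inputs are assumptions, and a real system would need a vetted lexicon and language-specific tokenization.

```python
import re

# Hypothetical, tiny lexicon used only for illustration.
NEGATIVE_WORDS = {"sad", "tired", "alone", "hate", "scared"}


def negative_word_ratio(text):
    """Fraction of words in `text` that appear in NEGATIVE_WORDS."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(w in NEGATIVE_WORDS for w in words) / len(words)


def group_activity_rate(peer_counts):
    """Fraction of observed intervals in which the student was with
    at least one peer. `peer_counts` holds one peer count per interval."""
    if not peer_counts:
        return 0.0
    return sum(1 for n in peer_counts if n > 0) / len(peer_counts)


# Assumed inputs: OCR text from an essay and peer counts from four recess intervals.
features = {
    "negative_word_ratio": negative_word_ratio("I feel tired and alone today"),
    "group_activity_rate": group_activity_rate([2, 0, 0, 1]),
}
```

Features of this kind are what Step 4 would pass on to the generative AI model.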
[0060] (Example 1)
[0061] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0062] In educational settings, there is a challenge in detecting mental stress and psychological burden in students at an early stage. This increases the likelihood that educators may miss opportunities to intervene at the appropriate time, leading to students' mental health being overlooked. In particular, there is a need to continuously observe subtle changes in behavior and learning activities and respond quickly and accurately.
[0063] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0064] In this invention, the server includes means for collecting user behavioral information and learning information, means for extracting features from the collected information and estimating the user's emotional state and psychological stress level, and means for analyzing the data by applying a generative artificial intelligence model to the extracted features. This makes it possible to detect the user's mental stress early and immediately send notifications to educators.
[0065] "Educational facilities" refer to places where users engage in learning activities, such as schools and cram schools.
[0066] "Users" refers to learners such as children and students who use educational facilities.
[0067] "Behavioral information" refers to data that shows the user's travel routes, physical movements, and activity patterns.
[0068] "Learning information" refers to data related to users' writing and learning activities, specifically handwritten records and essay content.
[0069] "Features" refer to information extracted from collected behavioral and learning information to infer the user's emotional state and psychological stress level.
[0070] A "generative artificial intelligence model" refers to a model that uses machine learning algorithms to analyze data and evaluate the mental state of users.
[0071] "Educators" refers to individuals who are responsible for guiding and supporting users' learning activities within educational facilities, specifically teachers and counselors.
[0072] "Notifications" refer to alerts and informational messages generated based on anomaly detection and sent to educators.
[0073] This invention is a system for detecting mental stress in users at educational facilities at an early stage and notifying educators. A specific embodiment of this system is shown below.
[0074] The terminal is a device that uses cameras and sensors installed in classrooms and educational facilities to monitor users' movements and activities in real time. The terminal collects behavioral information through these devices, and also collects learning information from digital notebooks used during lessons and from paper-based learning information obtained through scans.
[0075] The server uses advanced analytical techniques to process the collected information. Specifically, it uses OCR (Optical Character Recognition) technology to recognize handwritten characters as text data, then removes noise from behavioral information and standardizes the data. The server utilizes a generative AI model to extract user characteristics from the collected information and estimates psychological stress levels based on these characteristics. The estimated results are compared to a standard, and if an anomaly is detected, the educator is immediately notified.
[0076] The educator receives notifications from the server, speaks with the user concerned, and checks the situation in detail. This allows the educator to take appropriate action before the user's mental state deteriorates.
[0077] For example, if a particular user's handwritten records become messier than before at an educational facility, or if they start spending more time alone during breaks, the server will detect this as an anomaly and send a notification to the educator. Based on this information, the educator can then interview the user to confirm whether they are experiencing stress or have any problems.
[0078] Example of a prompt:
[0079] "Design a system that analyzes the possibility of mental stress in cases such as messy handwriting in class or an increase in the frequency of children spending time alone during recess, and notifies educators accordingly."
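As an illustration of how extracted features might be passed to a generative AI model in text form, the following sketch assembles a prompt from a feature dictionary. The function name `build_prompt`, the feature names, and the prompt wording are assumptions, and no call to any particular model API is shown.

```python
def build_prompt(user_id, features):
    """Format extracted features as a text prompt for a generative AI model."""
    lines = [
        "Analyze the possibility of mental stress for the user below.",
        f"User ID: {user_id}",
        "Observed features:",
    ]
    # Sort by name so the same features always yield the same prompt.
    for name, value in sorted(features.items()):
        lines.append(f"- {name}: {value}")
    lines.append("Answer with a stress level (low, medium, high) and a short reason.")
    return "\n".join(lines)


# Hypothetical feature values (assumed data).
prompt = build_prompt(
    "user-A",
    {"handwriting_messiness": 0.8, "minutes_alone_at_recess": 18},
)
print(prompt)
```

Keeping the prompt deterministic for a given set of features makes it easier to compare the model's answers over time.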
[0080] By using this system, educational institutions can protect the mental health of their users and support smooth educational activities.
[0081] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0082] Step 1:
[0083] The terminal uses cameras and sensors placed within classrooms and educational facilities to monitor users' movements and activities in real time. It takes camera footage and sensor data as input and outputs this as behavioral information. This operation includes analyzing camera footage and collecting movement data from sensors.
[0084] Step 2:
[0085] The terminal collects learning information from digital notebooks and from paper documents obtained through scanning. It takes digital notebook data and scanned images as input and outputs them as learning information. Specific operations include importing digital notebook data and electronically recording paper documents using a scanner.
[0086] Step 3:
[0087] The server receives behavioral and learning information sent from the terminal. The input consists of behavioral and learning data sent from the terminal. The output is a single, consistent dataset composed of this data. Specifically, the server stores the data in a database and prepares for the next processing step.
[0088] Step 4:
[0089] The server uses OCR technology to convert handwritten characters in the learning information into text data. The input is scanned image data of handwritten characters, and the output is text data. The server applies an image processing algorithm to perform this text conversion.
[0090] Step 5:
[0091] The server performs noise reduction on behavioral data and standardizes the data. It takes real-time monitoring data as input and outputs clean, standardized behavioral data. Specifically, it applies a filtering algorithm to remove noise and converts the data to a standardized scale.
[0092] Step 6:
[0093] The server uses a generative AI model to extract features and evaluate the user's emotional state and mental stress. Inputs include standardized behavioral data and text data, and output is an estimated result of the emotional state and stress level. The server runs the AI model to perform pattern analysis and feature extraction.
[0094] Step 7:
[0095] The server sends a notification to the educator if an anomaly is detected. The input is an estimation result, and the output is a warning notification delivered to the educator. Specific actions include sending emails or messages using the notification system.
[0096] Step 8:
[0097] The educator receives the notification and confirms details by conducting an interview with the user concerned. The input is the notification from the server. The output is confirmation information regarding the user's status. Specific actions include setting up conversations and conducting interviews.
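The email notification in Step 7 of the flow above might be composed along the following lines. This sketch only builds the message with Python's standard `email.message.EmailMessage` class and does not send it. The addresses, the user identifier, the message layout, and the name `build_alert` are placeholders assumed for illustration.

```python
from email.message import EmailMessage


def build_alert(educator_address, user_id, stress_level, reasons):
    """Compose (but do not send) an alert email for an educator."""
    msg = EmailMessage()
    msg["To"] = educator_address
    msg["From"] = "alerts@example.invalid"  # placeholder sender address
    msg["Subject"] = f"Possible stress detected: {user_id}"
    body = [f"Estimated stress level: {stress_level}", "Observed changes:"]
    body += [f"- {reason}" for reason in reasons]
    msg.set_content("\n".join(body))
    return msg


# Hypothetical alert (assumed data).
alert = build_alert(
    "teacher@example.invalid",
    "user-042",
    "high",
    ["handwriting messier than before", "more time alone during breaks"],
)
print(alert["Subject"])
```

In a deployment, a message like this could be handed to `smtplib.SMTP.send_message`. Given the sensitivity of the data, the body would likely carry only a pseudonymous identifier rather than the learner's name.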
[0098] (Application Example 1)
[0099] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0100] An environment is needed in which the mental state of the elderly is monitored continuously, so that stress and psychological changes are detected early and can be responded to quickly. In current care settings, however, early detection of mental stress is difficult because it depends on caregivers and support staff happening to notice a change. This poses a risk to the psychological health of the elderly.
[0101] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0102] In this invention, the server includes means for collecting individual care recipient activity data and daily life information, means for extracting behavioral characteristics from the collected data and estimating the care recipient's mental state and psychological stress level, and means for detecting abnormalities based on the estimation results and sending notifications to caregivers. This makes it possible to detect the care recipient's mental stress early and take prompt countermeasures.
[0103] "Activity data" refers to information about the movements and behavioral patterns of the person receiving care in their daily life.
[0104] "Daily life information" refers to information including the manual tasks that the person receiving care performs on a daily basis and related records.
[0105] "Mental state" is a concept that describes the emotional and psychological health of the person receiving care.
[0106] "Psychological stress level" is an indicator used to assess the degree of mental burden on a person receiving care.
[0107] "Means of collection" refers to methods and devices used to acquire activity data and information about daily life.
[0108] "Means of extracting features" refer to methods and techniques for analyzing significant patterns and trends from collected data.
[0109] "Means of estimation" refer to techniques for estimating the mental state and psychological stress levels of care recipients based on data.
[0110] "Means for detecting anomalies" are systems for identifying behaviors or patterns that differ from normal conditions.
[0111] "Means of sending notifications" refers to communication methods used to inform support personnel of detected anomalies.
[0112] The system implementing the present invention provides a method for early detection of mental stress in the daily life of a person receiving care and enabling a rapid response.
[0113] The system primarily consists of a server, terminals, and users. The server collects activity data and daily life information of the person receiving care, and extracts behavioral characteristics based on this data. Activity data is acquired through the camera and sensors of smart glasses worn by the person receiving care. The collected data is transmitted wirelessly to a cloud server (such as Amazon Web Services or Microsoft Azure).
[0114] The server uses OCR technology to analyze digitized handwritten characters and employs generative AI models (such as GPT-4® and BERT) to estimate the care recipient's mental state and psychological stress level. During this process, the data undergoes pre-processing such as noise reduction and standardization. If the estimation reveals any abnormal patterns, the server promptly notifies the user. This notification is sent to the mobile devices of caregivers, family members, and other support personnel to encourage appropriate action.
[0115] As a concrete example, consider a case where an elderly person has recently been enjoying knitting less frequently. In this case, the system can compare hand movement data with daily habits to detect signs of mental stress and quickly notify the caregiver.
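The knitting example above amounts to detecting a drop in how often a habitual activity occurs. A minimal sketch follows, assuming weekly counts of the activity are already available. The 50% drop ratio, the name `activity_dropped`, and the sample counts are assumptions, not values taken from the disclosure.

```python
def activity_dropped(weekly_counts, drop_ratio=0.5):
    """Return True if the latest week's count fell below `drop_ratio`
    times the average of the earlier weeks.

    `weekly_counts` lists observed sessions per week, oldest first.
    """
    if len(weekly_counts) < 2:
        return False  # not enough history to compare against
    *history, latest = weekly_counts
    usual = sum(history) / len(history)
    return latest < usual * drop_ratio


# Hypothetical knitting sessions per week for one care recipient (assumed data).
print(activity_dropped([5, 6, 5, 1]))  # sharp drop in the latest week
print(activity_dropped([5, 6, 5, 4]))  # ordinary week-to-week variation
```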
[0116] An example of a prompt when using a generative AI model is: "Analyze in detail how the behavioral patterns of a 65-year-old senior have changed over the past week, and infer signs of mental stress or poor health. In particular, consider the frequency and quality of manual tasks."
[0117] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0118] Step 1:
[0119] The terminal collects activity data from the care recipient's daily life through smart glasses worn by the care recipient. Inputs include image and audio data within the care recipient's field of vision, acquired in real time by sensors. The raw data is then transmitted wirelessly to a cloud server as output.
[0120] Step 2:
[0121] The server preprocesses the received activity data. It receives raw data sent from the terminal as input and performs noise reduction and data standardization. This improves data integrity and outputs clean data converted into a format suitable for analysis.
[0122] Step 3:
[0123] The server extracts features from clean data. Specifically, it analyzes the behavioral patterns and manual movements of the care recipients, and processes data to extract statistically significant features from the input data. As output, it generates a dataset summarizing the behavioral features.
[0124] Step 4:
[0125] The server uses a generative AI model to infer the mental state and psychological stress levels of the person being cared for from the extracted features. In this step, the model receives feature data as input and analyzes it. As output, it generates inferred results of the mental state and stress levels.
[0126] Step 5:
[0127] The server detects anomalies from the estimation results. It determines whether there are any abnormal patterns by comparing the results with baseline values and past data. The estimation results are used as input, and if an anomaly is detected, the server outputs anomaly information.
[0128] Step 6:
[0129] The server notifies the user of any detected anomalies. It sends alerts to the user (caregiver or family member) on a smartphone or other device. The input is anomaly information, and the output is a detailed alert message.
[0130] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0131] This invention provides a system for comprehensively evaluating the mental state of students in a school environment. This system combines behavioral and learning data collected from students with an emotion recognition engine. This allows for a more accurate estimation of students' mental stress and enables educators to provide appropriate interventions.
[0132] Data collection
[0133] The devices use cameras and microphones installed in classrooms and on campus to observe students' movement patterns and acquire audio and visual information in real time. This method comprehensively collects behavioral and emotional data of students. In parallel, learning data is also collected using digital notebooks and scanning devices.
[0134] Data preprocessing and feature extraction
[0135] The server analyzes the acquired audio and video data and uses speech recognition and facial detection technologies to recognize the user's emotions. This goes beyond simply analyzing behavioral data, extracting emotional characteristics that include contextual background. In addition, OCR technology is used to digitize handwritten text and quantify the content of the text and the writer's tendencies.
[0136] Analysis and Anomaly Detection
[0137] The server inputs this diverse data into a generative AI model, supplementing it with data obtained from the emotion engine to analyze the mental state of the students. By integrating real-time recognition results of emotion data with historical learning and behavioral data, it predicts stress levels and psychological changes with greater accuracy.
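To make the idea of integrating emotion data with behavioral and learning data concrete, the following sketch combines three scores with a weighted average. This is a simple stand-in chosen for illustration, not the generative AI analysis the disclosure describes. The weights, the [0, 1] score range, and the name `fuse_scores` are assumptions.

```python
def fuse_scores(emotion_score, behavior_score, learning_score,
                weights=(0.4, 0.3, 0.3)):
    """Weighted average of three scores, each expected to lie in [0, 1]."""
    scores = (emotion_score, behavior_score, learning_score)
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores must lie in [0, 1]")
    total_weight = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total_weight


# Hypothetical scores: the emotion engine reports strong anxiety signals,
# while behavioral and learning data show moderate change (assumed data).
stress_estimate = fuse_scores(emotion_score=0.9, behavior_score=0.5, learning_score=0.5)
```

Dividing by the sum of the weights keeps the result in [0, 1] even if the weights are later adjusted and no longer sum to one.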
[0138] Notifications and feedback
[0139] Educators, as users, receive detailed reports and alerts from the server, enabling them to take prompt action regarding specific students. Results obtained through interviews and observations are fed back to the server, contributing to the improvement of future data analysis and anomaly detection algorithms.
[0140] Specific example
[0141] For example, even if student B appears focused during class, the emotion engine might detect signs of anxiety from changes in tone of voice or subtle facial expressions. If this, together with changes in behavioral and learning data, is deemed abnormal, the server immediately sends a notification to the educator. The teacher can then schedule a meeting with student B and provide support tailored to the student's individual situation. This system enables a holistic approach that considers not only digital information but also emotional aspects.
[0142] The following describes the processing flow.
[0143] Step 1:
[0144] The devices use cameras and microphones installed in classrooms and throughout the school to collect video and audio data of students in real time. They also utilize a dedicated digital scanning device to obtain learning data from digital notebooks.
[0145] Step 2:
[0146] The server analyzes the collected video data and uses face detection technology to read changes in facial expressions. Simultaneously, it uses speech recognition technology to extract features such as voice tone and speaking speed, and inputs this information into the emotion engine.
[0147] Step 3:
[0148] The server uses OCR to analyze digitized handwritten text, converting the content of essays and notes into text data. This allows for the quantification of linguistic features and handwriting consistency, and helps to understand learning trends.
[0149] Step 4:
[0150] The server integrates the emotion data recognized by the emotion engine with behavioral and learning data, and analyzes the emotional state and mental stress levels of students based on a model that combines the features of each.
[0151] Step 5:
[0152] Based on the analysis results, the server generates a detailed report and sends an alert to the user, an educator (e.g., a teacher or counselor), if it detects abnormal stress levels or psychological states.
[0153] Step 6:
[0154] Based on the provided reports, users conduct interviews with the relevant students and provide counseling and support as needed. The interview results are then fed back to the server and used to improve the accuracy of the system.
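The integration in Step 4 and the alert decision in Step 5 can be pictured with a small sketch. This is not the disclosed model: the field names, the weights, and the threshold below are hypothetical, and a simple weighted average stands in for whatever the generative AI model actually computes.

```python
from dataclasses import dataclass


@dataclass
class StudentSnapshot:
    """Hypothetical per-student inputs, each already scaled to the range 0.0-1.0."""
    emotion_anxiety: float      # from the emotion engine (voice tone, facial expression)
    behavior_deviation: float   # change versus the student's usual behavior
    learning_deviation: float   # change versus the student's usual written work


def stress_score(s: StudentSnapshot,
                 weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Weighted average of the three signals; the weights are illustrative only."""
    w_e, w_b, w_l = weights
    total = w_e + w_b + w_l
    return (w_e * s.emotion_anxiety
            + w_b * s.behavior_deviation
            + w_l * s.learning_deviation) / total


def should_alert(s: StudentSnapshot, threshold: float = 0.6) -> bool:
    """Return True when the combined score reaches the assumed alert threshold."""
    return stress_score(s) >= threshold
```

With these assumed weights, a student who looks outwardly normal (low behavioral deviation) can still trigger an alert when the emotion engine reports strong anxiety, which is the situation described for student B.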
[0155] (Example 2)
[0156] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0157] Traditional systems for assessing the mental health of students in school settings are limited to the analysis of behavioral and learning data, lacking detailed insights into emotional states and the ability to provide appropriate individual support. This makes it difficult for educators to accurately understand students' stress levels, limiting opportunities for appropriate intervention.
[0158] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0159] In this invention, the server includes means for collecting behavioral information and learning outcomes of students in a school environment, means for analyzing emotions and facial expressions from the collected audio and video information, and means for digitizing handwritten character information using OCR technology and quantifying trends. This makes it possible to accurately estimate the mental stress and emotional state of students and to provide real-time notifications to educators.
[0160] "Behavioral information" refers to data on the actions, movements, and interactions that students exhibit within the school environment.
[0161] "Learning outcomes" refer to data that shows the knowledge, skills, and level of achievement of students through school education.
[0162] "Audio information" refers to data that includes acoustic characteristics such as the tone, content, and volume of children's and students' voices.
[0163] "Video information" refers to data that records the actions, facial expressions, and visual characteristics of children and students.
[0164] "Emotion" or "emotional state" refers to the psychological state of a child or student as inferred from their voice and facial expressions.
[0165] "Facial expression" refers to the visual expression of emotions derived from the movement and structure of the facial muscles of children and students.
[0166] "OCR technology" refers to the technology that converts handwritten or printed characters into digital text.
[0167] A "generative AI model" refers to an artificial intelligence model that performs inference and prediction based on diverse data.
[0168] "Sending notifications to educators" means informing educators in real time about any abnormalities or situations requiring attention regarding students.
[0169] The present invention provides a multifaceted evaluation of the mental state and stress levels of students in a school environment. This system collects behavioral information and learning outcomes in real time and analyzes emotional states using a generative AI model, thereby enabling appropriate intervention.
[0170] The devices use cameras and microphones installed in classrooms and on campus to acquire audio and video information of students. This allows for the real-time collection of behavioral information such as students' movements, facial expressions, and tone of voice. In addition, digital notebooks and scanning devices are used to save handwritten learning content as digital data, thereby capturing learning outcomes.
[0171] The server receives data transmitted from the terminal, converts the speech to text using speech recognition technology, and analyzes the characteristics of the voice. Furthermore, it analyzes facial expressions from video data using a face detection algorithm and quantifies the characteristics of emotions. It also digitizes handwritten information using OCR technology and quantifies the content and tendencies of the writing.
[0172] This data is fed into a generative AI model to infer emotional states and mental stress levels. The model integrates real-time and historical data for highly accurate analysis. If an anomaly is detected, the server sends a notification to the educator to prompt immediate intervention.
[0173] A concrete example is when a student appears calm during class, but the emotion engine detects anxiety through changes in tone of voice or subtle facial expressions. In such cases, the system immediately notifies the educator and suggests additional meetings or observations. This allows the educator to provide support tailored to the student's situation.
[0174] An example of a prompt is: "Based on Person A's recent behavior and emotional data, analyze whether there are any psychological changes or stressors, and infer the reasons why." This example serves as a guideline for generative AI models when making inferences based on diverse data.
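As a rough illustration of how such a prompt might be assembled before it is passed to a generative AI model, the sketch below fills the instruction sentence quoted above and appends the observations as bullet lists. The function name, its parameters, and the layout of the data sections are assumptions made for this example; the disclosure only gives the instruction sentence.

```python
def build_prompt(person: str,
                 behavior_notes: list[str],
                 emotion_notes: list[str]) -> str:
    """Combine the instruction sentence with observation lists (hypothetical layout)."""
    instruction = (
        f"Based on {person}'s recent behavior and emotional data, analyze whether "
        "there are any psychological changes or stressors, and infer the reasons why."
    )
    # Render each list as bullets; fall back to a placeholder when a list is empty.
    behavior = "\n".join(f"- {n}" for n in behavior_notes) or "- (none recorded)"
    emotion = "\n".join(f"- {n}" for n in emotion_notes) or "- (none recorded)"
    return f"{instruction}\n\nBehavior data:\n{behavior}\n\nEmotion data:\n{emotion}"
```

Keeping the instruction separate from the data sections makes it easy to reuse the same instruction for different students while swapping in their latest observations.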
[0175] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0176] Step 1:
[0177] The devices use cameras and microphones installed in classrooms and on campus to acquire audio and video information of students in real time. Specifically, they record facial expressions and movements through the cameras and collect voice tone and spoken content with the microphones. This input information is sent to a server as behavioral data and audio data.
[0178] Step 2:
[0179] The server processes the audio data received from the terminal through a speech recognition system. Here, the audio is converted to text, and characteristics such as tone and speed are analyzed. The input is audio data, and the output is the transcribed audio information and speech analysis data. This prepares the system for evaluating the emotions of the students.
[0180] Step 3:
[0181] The server analyzes video data by passing it through a face detection algorithm. This detects changes in the facial expressions of students and quantifies their emotional characteristics. The input is video data, and the output is quantified facial expression data. This data is useful for gaining a detailed understanding of emotional states.
[0182] Step 4:
[0183] The server uses OCR technology to convert handwritten text into digital text. The input is scanned handwritten learning output, and the output is the digitized text and its analysis results. This allows for evaluation of learning progress and trends.
[0184] Step 5:
[0185] The server inputs this data into a generative AI model. This model integrates speech analysis data, quantified facial expression data, and learning outcome data to estimate emotional state and mental stress levels. The output is the result of the mental state analysis.
[0186] Step 6:
[0187] The server sends a notification to the educator (user) if an anomaly is detected based on the analysis results. The input is the analysis results of the mental state, and the output is an alert message to the educator. This allows the educator to take necessary interventions.
[0188] Step 7:
[0189] Users receive notifications from the server and consider appropriate responses to students. Specifically, they take actions to improve students' situations, such as scheduling meetings. The input is alerts from the server, and the output is intervention actions taken by educators.
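Step 2 mentions analyzing speaking speed from the transcribed audio. One simple way to quantify it, shown here only as an assumed example, is words per minute computed from the transcript and the clip duration, compared against the student's usual rate. The function names are hypothetical, and the word count assumes whitespace-delimited text; a language without spaces between words would need a tokenizer instead.

```python
def speaking_rate_wpm(transcript: str, duration_seconds: float) -> float:
    """Words per minute for one utterance (whitespace-delimited words assumed)."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    word_count = len(transcript.split())
    return word_count * 60.0 / duration_seconds


def rate_change(current_wpm: float, baseline_wpm: float) -> float:
    """Relative change versus the student's usual rate; -0.2 means 20% slower."""
    if baseline_wpm <= 0:
        raise ValueError("baseline_wpm must be positive")
    return (current_wpm - baseline_wpm) / baseline_wpm
```

A value such as this would be one of several speech features passed on to Step 5, alongside tone and the transcribed content.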
[0190] (Application Example 2)
[0191] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0192] Traditional school environments present a challenge in accurately assessing students' mental stress and emotional states. This can prevent timely intervention by educators, potentially negatively impacting students' psychological health. To address this issue, it is necessary to assess participants' emotional states in real time and provide the necessary support.
[0193] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0194] In this invention, the server includes a device for collecting behavioral and learning information of participants in an educational facility, a device for extracting features from the collected information and inferring the emotional state and mental stress level of the participants, and an automated device for providing real-time feedback based on the collected information as support to the participants. This makes it possible to quickly grasp the mental state of participants and provide necessary support in real time.
[0195] "Participants in educational facilities" refers to students who participate in activities within the school environment.
[0196] "Behavioral information" refers to information that includes data on participants' travel routes and their interactions with others.
[0197] "Learning information" refers to information that includes data on participants' handwritten text and the content of their written materials.
[0198] "Device" refers to hardware or software used to collect specific information, extract features, or make inferences.
[0199] "Mental stress level" is an evaluation index that indicates the degree of stress and emotional pressure experienced by participants.
[0200] An "automated device that provides real-time feedback" is a device that instantly analyzes collected information and provides situation-based support and alerts to participants and educators.
[0201] This invention is a system for evaluating the mental state of participants in educational facilities and providing necessary support. The system mainly consists of terminals installed within the educational facility, a central control server, and users who are educational providers.
[0202] The terminal uses cameras and microphones installed in designated locations within the facility to collect participant behavioral and learning information. Behavioral information includes participants' movement routes and interaction patterns with others, while learning information includes handwritten text and written content. This data is collected in real time, and all data is transmitted to a server.
[0203] The server analyzes the collected video and audio data using software such as OpenCV and Google® Cloud Speech-to-Text. This process includes facial recognition and voice analysis, which are used to infer the participants' emotional states. Furthermore, a generative AI model, built for example with a framework such as TensorFlow®, is used to assess the participants' mental stress levels. If an anomaly is detected, the result is immediately communicated to the user, who is the educational provider.
[0204] Based on notifications sent from the server, users can provide appropriate intervention and support to individual participants. For example, if a participant shows signs of distress, the user can detect this and provide individual consultations or psychological support.
[0205] For example, if the server detects that a participant's facial expression is stiff during a lesson, the server will notify the educator of this data in real time. Based on this information, the educator will communicate with the participant and provide the necessary support.
[0206] An example of a prompt for a generative AI model is: "Evaluate changes in emotional state and estimate the level of mental stress based on the participant's behavioral and vocal information."
[0207] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0208] Step 1:
[0209] The terminal uses cameras within the facility to acquire video data of participants and microphones to collect audio data. The input for this step is real-time visual and auditory data obtained from participants, and the output is raw video and audio data to be sent to the server.
[0210] Step 2:
[0211] The server performs face recognition processing on the received video data using OpenCV. The input for this step is the video data sent from the terminal, and the output is the facial feature data of the participants. Specifically, the server detects faces in the video data and quantifies their features.
[0212] Step 3:
[0213] The server converts the audio data into text using the Google Cloud Speech-to-Text API. The input for this step is the collected audio data, and the output is the text data converted from the audio. Specifically, the server sends the audio data to the cloud service and receives it back as text.
[0214] Step 4:
[0215] The server integrates facial feature data and text data and uses a generative AI model to infer the emotional state of the participants. The input for this step is facial feature data and text data converted from speech, and the output is evaluation data based on emotion recognition. The server feeds this data into the AI model to evaluate the participants' emotions and mental stress levels.
[0216] Step 5:
[0217] The server detects anomalies based on emotion recognition results and sends notifications to the user as needed. The input for this step is evaluation data, and the output is alert information for the educator. Specifically, the server searches for anomalies and sends a notification to the user's device if one is found.
[0218] Step 6:
[0219] The user provides support tailored to the participant based on the alert information received. The input for this step is the alert information sent from the server, and the output is the actual intervention activity for the participant. Specifically, the user takes actions such as interacting with the participant or providing counseling.
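The server-side part of this flow (Steps 2 to 5) can be sketched as a small orchestration function. The face analysis, transcription, and assessment stages are passed in as callables so that the sketch does not depend on any particular library; every name, the 0.0-1.0 stress scale, and the threshold are assumptions for illustration and are not taken from the disclosure.

```python
from typing import Callable, Optional, Sequence

FaceFeatures = Sequence[float]  # hypothetical numeric facial-feature vector


def run_pipeline(video: bytes,
                 audio: bytes,
                 extract_faces: Callable[[bytes], FaceFeatures],
                 transcribe: Callable[[bytes], str],
                 assess: Callable[[FaceFeatures, str], float],
                 threshold: float = 0.7) -> Optional[str]:
    """Return an alert message for the educator, or None when nothing is abnormal."""
    features = extract_faces(video)    # Step 2: facial feature data
    text = transcribe(audio)           # Step 3: speech converted to text
    stress = assess(features, text)    # Step 4: estimated stress level (0.0-1.0 assumed)
    if stress >= threshold:            # Step 5: anomaly check and alert
        return f"Alert: estimated stress level {stress:.2f} (threshold {threshold:.2f})"
    return None
```

In a real deployment the three callables would wrap the face-recognition, speech-to-text, and generative AI components; here they can be replaced with stubs to exercise the control flow.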
[0220] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0221] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.
[0222] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.
[0223] [Second Embodiment]
[0224] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.
[0225] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.
[0226] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0227] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.
[0228] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0229] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0230] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0231] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0232] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0233] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0234] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0235] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0236] This invention provides a system for early detection of mental stress in students within a school environment and for notifying educators. The system consists of several key components for collecting and analyzing data on students' daily behavior and learning.
[0237] Data collection
[0238] The devices use cameras and sensors installed in classrooms and throughout the school to monitor students' movements and activities in real time and collect behavioral data. They also acquire learning data by collecting information from digital notebooks used during lessons and from paper documents obtained through scanning.
[0239] Data preprocessing and feature extraction
[0240] The server uses OCR technology to read handwritten characters from the collected data as text data. For behavioral data, noise is removed and standardization is performed to improve data integrity. After that, features such as consistency of characters, composition content, and student movement patterns are extracted.
[0241] Analysis and Anomaly Detection
[0242] The server uses generative AI to analyze the extracted features. For example, if patterns such as messy handwriting, frequent use of negative words, or decreased interaction with friends are detected, an increase in mental stress is inferred. The system compares these results with a baseline, and if an abnormality is detected, the educator is notified of the results.
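The baseline comparison is not specified further. One common way to express such a check, given here purely as an assumed illustration, is a z-score against the same student's own history. The function name and the threshold of 2.0 standard deviations are hypothetical.

```python
from statistics import mean, stdev


def deviates_from_baseline(history: list[float],
                           current: float,
                           z_threshold: float = 2.0) -> bool:
    """Flag `current` when it lies far from the student's own past values."""
    if len(history) < 2:
        return False                 # too little history to form a baseline
    mu = mean(history)
    sigma = stdev(history)           # sample standard deviation
    if sigma == 0:
        return current != mu         # any change from a perfectly flat baseline
    return abs(current - mu) / sigma >= z_threshold
```

For instance, if a student usually spends about ten minutes alone during recess, a day with twenty-five minutes alone would be flagged, while eleven minutes would not.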
[0243] Notifications and feedback
[0244] When an educator receives a notification, they interact with the specific student to confirm the details of their situation. This feedback information is later fed back into the system's learning process to help further improve its accuracy.
[0245] Specific example
[0246] In one case at a school, the system detected that student A's handwriting had recently become messy when taking notes during class. It also observed a change in his behavioral patterns, such as him spending more time alone during recess. The server analyzed this and determined it to be abnormal, sending an alert to the homeroom teacher (the user of the system). The teacher met with student A and took early action, which resolved the problem. In this way, the system can detect changes in students' mental states early and enable appropriate intervention.
[0247] The following describes the processing flow.
[0248] Step 1:
[0249] The devices use cameras and sensors in the classroom to continuously monitor students' movement patterns and interactions with others, collecting behavioral data. They also acquire learning data from digital notebooks and, using scanning devices, from paper documents.
[0250] Step 2:
[0251] The server converts the acquired learning data into text data using OCR technology. At the same time, it applies a noise reduction algorithm to the behavioral data and standardizes it to improve data accuracy.
[0252] Step 3:
[0253] The server extracts features from the organized data. Specifically, it scores the sloppiness of handwriting and the positive or negative sentiment of words used in essays, and quantifies movement frequency and the presence or absence of group activities from the behavioral data.
[0254] Step 4:
[0255] The server inputs the extracted feature data into a generative AI model to estimate the emotional state and mental stress levels of the students. During this process, it compares past and present data to identify any pattern anomalies.
[0256] Step 5:
[0257] If the server determines that an anomaly has been detected based on the analysis results, it immediately compiles information on the relevant students and sends a notification to the user, who is an educator (teacher or counselor).
[0258] Step 6:
[0259] Based on notifications received from the server, users conduct interviews with the relevant students to confirm their statements and take appropriate action. The information obtained and the status of the response are later sent back to the server as feedback and used to improve the accuracy of the learning model.
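Two of the features named in Step 3 can be sketched in a few lines: the share of negative words in an essay and the share of observations in which the student was with peers. The word list, the English-only tokenization, and the representation of group activity as a count of nearby peers per sample are all assumptions for this example.

```python
import re

# Illustrative lexicon only; a real system would need a curated, language-specific list.
NEGATIVE_WORDS = {"tired", "alone", "hate", "scared", "sad"}


def negative_word_ratio(text: str) -> float:
    """Fraction of words in `text` that appear in the negative-word lexicon."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(w in NEGATIVE_WORDS for w in words) / len(words)


def group_activity_ratio(nearby_peers: list[int]) -> float:
    """Fraction of sampled moments with at least one peer nearby (hypothetical input)."""
    if not nearby_peers:
        return 0.0
    return sum(n > 0 for n in nearby_peers) / len(nearby_peers)
```

Values like these would then be passed to the model in Step 4 and compared with the student's past values.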
[0260] (Example 1)
[0261] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0262] In educational settings, there is a challenge in detecting mental stress and psychological burden in students at an early stage. This increases the likelihood that educators may miss opportunities to intervene at the appropriate time, leading to students' mental health being overlooked. In particular, there is a need to continuously observe subtle changes in behavior and learning activities and respond quickly and accurately.
[0263] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0264] In this invention, the server includes a device for collecting user behavioral information and learning information, a device for extracting features from the collected information and inferring the user's emotional state and psychological stress level, and a device for analyzing the data with a generative artificial intelligence model using the extracted features. This makes it possible to detect the user's mental stress early and immediately send notifications to educators.
[0265] "Educational facilities" refer to places where users engage in learning activities, such as schools and cram schools.
[0266] "Users" refers to learners such as children and students who use educational facilities.
[0267] "Behavioral information" refers to data that shows the user's travel routes, physical movements, and activity patterns.
[0268] "Learning information" refers to data related to users' writing and learning activities, specifically handwritten records and essay content.
[0269] "Characteristics" refer to information extracted from collected behavioral and learning data to infer the user's emotional state and psychological stress level.
[0270] A "generative artificial intelligence model" refers to a model that uses machine learning algorithms to analyze data and evaluate the mental state of users.
[0271] An "educational staff member" refers to an individual who is responsible for guiding and supporting the learning activities of users within an educational facility, specifically a teacher or counselor.
[0272] "Notifications" refer to alerts and informational messages generated based on anomaly detection and sent to educational staff members.
[0273] This invention is a system for detecting mental stress in users at educational facilities at an early stage and notifying educators. A specific embodiment of this system is shown below.
[0274] The terminal is a device that uses cameras and sensors installed in classrooms and educational facilities to monitor users' movements and activities in real time. The terminal collects behavioral information through these devices, and also collects learning information from digital notebooks used during lessons and from paper-based learning information obtained through scans.
[0275] The server uses advanced analytical techniques to process the collected information. Specifically, it uses OCR (Optical Character Recognition) technology to recognize handwritten characters as text data, then removes noise from behavioral information and standardizes the data. The server utilizes a generative AI model to extract user characteristics from the collected information and estimates psychological stress levels based on these characteristics. The estimated results are compared to a standard, and if an anomaly is detected, the educator is immediately notified.
[0276] The educator receives notifications from the server, talks with the user concerned, and checks the situation in detail. This allows the educator to take appropriate action before the user's mental state deteriorates.
[0277] For example, if a particular user's handwritten records become messier than before at an educational facility, or if they start spending more time alone during breaks, the server will detect this as an anomaly and send a notification to the educator. Based on this information, the educator can then interview the user to confirm whether they are experiencing stress or have any problems.
[0278] Example of a prompt:
[0279] "Design a system that analyzes the possibility of mental stress in cases such as messy handwriting in class or an increase in the frequency of children spending time alone during recess, and notifies educators accordingly."
[0280] By using this system, educational institutions can protect the mental health of their users and support smooth educational activities.
[0281] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0282] Step 1:
[0283] The terminal uses cameras and sensors placed in classrooms and educational facilities to monitor the movements and activities of users in real time. As input, it acquires camera images and sensor data, and it outputs them as behavioral information. This operation includes analysis of the camera video and collection of movement data by the sensors.
[0284] Step 2:
[0285] The terminal collects learning information from digital notebooks and from paper media obtained through scans. As input, it takes digital notebook data and scanned images, and it outputs them as learning information. Specific operations include importing the digital notebook data and electronically recording paper media with a scanner.
[0286] Step 3:
[0287] The server receives the behavioral information and learning information sent from the terminal. The inputs are the behavioral data and learning data sent from the terminal, and the output is a consistent data set. As a specific operation, the server saves the data in a database and prepares for the next processing.
[0288] Step 4:
[0289] The server uses OCR technology to convert handwritten characters in learning information into text data. The input is the image data of scanned handwritten characters, and the output is the text data. The server applies an image processing algorithm to perform this text conversion.
[0290] Step 5:
[0291] The server removes noise from the behavioral information and normalizes the data. The input is the real-time monitoring data, and the output is clean, normalized behavioral data. Specific operations include applying a filtering algorithm to remove noise and converting the data to a standard scale.
[0292] Step 6:
[0293] The server uses a generative AI model to extract features and evaluate the user's emotional state and mental stress. Inputs include standardized behavioral data and text data, and output is an estimated result of the emotional state and stress level. The server runs the AI model to perform pattern analysis and feature extraction.
[0294] Step 7:
[0295] The server sends a notification to the educator if an anomaly is detected. The input is the estimation result, and the output is a warning notification delivered to the educator. Specific actions include sending emails or messages using the notification system.
[0296] Step 8:
[0297] The user, an educator, receives the notification and confirms the details by interviewing the student concerned. The input is the notification from the server, and the output is confirmation information regarding the student's status. Specific actions include arranging a conversation and conducting the interview.
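As a rough illustration of the preprocessing in Step 5, the following Python sketch suppresses spikes with a sliding median filter and then rescales the result to the range 0 to 1. The disclosure does not name a specific filter or scale, so the median filter, the min-max scaling, the window size, and the function names are all assumptions made for this example.

```python
from statistics import median


def median_filter(samples, window=3):
    """Replace each sample with the median of its neighbourhood to suppress spikes.

    The window is truncated at both ends of the sequence.
    """
    half = window // 2
    filtered = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        filtered.append(median(samples[lo:hi]))
    return filtered


def min_max_scale(samples):
    """Rescale samples linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        # A constant signal carries no variation; map everything to 0.0.
        return [0.0 for _ in samples]
    return [(x - lo) / (hi - lo) for x in samples]


# Hypothetical movement readings with one sensor glitch (100).
raw = [1, 1, 100, 1, 1]
clean = median_filter(raw)
scaled = min_max_scale([2, 4, 6])
```

A median filter is used here only because it removes isolated outliers without smearing them into neighbouring samples; any other filtering algorithm could take its place.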
[0298] (Application Example 1)
[0299] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0300] It is necessary to provide an environment that enables rapid response by continuously monitoring the mental state of the elderly and detecting stress and psychological changes early. However, in current care settings, early detection of mental stress is difficult because it depends on caregivers and support staff happening to notice a change. This poses a risk to the psychological health of the elderly.
[0301] The specific processing by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0302] In this invention, the server includes means for collecting the activity data and daily life information of individual care recipients, means for extracting the characteristics of actions from the collected data and inferring the mental state and psychological stress level of the care recipients, and means for detecting abnormalities based on the inference results and sending notifications to supporters. Thereby, it becomes possible to detect the mental stress of the care recipients at an early stage and take prompt countermeasures.
[0303] "Activity data" refers to information regarding the movements and action patterns of care recipients in their daily lives.
[0304] "Daily life information" refers to information including the manual work performed by care recipients daily and the records related thereto.
[0305] "Mental state" refers to a concept indicating the state regarding the emotions and psychological health of care recipients.
[0306] "Psychological stress level" refers to an index for evaluating the degree of mental burden of care recipients.
[0307] "Means for collecting" refers to the methods and devices used to obtain activity data and daily life information.
[0308] "Means for extracting characteristics" refers to the methods and technologies for analyzing significant patterns and trends from the collected data.
[0309] "Means for inferring" refers to the technologies for predicting the mental state and psychological stress level of care recipients based on data.
[0310] "Means for detecting abnormalities" refers to a system for identifying different behaviors and patterns compared with the normal state.
[0311] "Means for sending notifications" refers to communication methods used to inform support personnel of detected abnormalities.
[0312] The system implementing the present invention provides a method for early detection of mental stress in the daily life of a person receiving care and enabling a rapid response.
[0313] The system primarily consists of a server, terminals, and users. The server collects activity data and daily life information of the person receiving care, and extracts behavioral characteristics based on this data. Activity data is acquired through the camera and sensors of smart glasses worn by the person receiving care. The collected data is transmitted wirelessly to a cloud server (such as Amazon Web Services or Microsoft Azure).
[0314] The server uses OCR technology to analyze digitized handwritten characters and employs generative AI models (such as GPT-4 and BERT) to estimate the care recipient's mental state and psychological stress level. During this process, the data undergoes pre-processing such as noise reduction and standardization. If the estimation reveals any abnormal patterns, the server quickly notifies the user. This notification is sent to the mobile devices of caregivers, family members, and other support personnel to prompt appropriate action.
[0315] As a concrete example, consider a case where an elderly person has recently been enjoying knitting less frequently. In this case, the system can compare hand movement data with daily habits to detect signs of mental stress and quickly notify the caregiver.
[0316] An example of a prompt when using a generative AI model is: "Analyze in detail how the behavioral patterns of a 65-year-old senior have changed over the past week, and infer signs of mental stress or poor health. In particular, consider the frequency and quality of manual tasks."
[0317] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0318] Step 1:
[0319] The device collects activity data from the care recipient's daily life through smart glasses worn by the care recipient. Inputs include image and audio data within the care recipient's field of vision, acquired in real time by sensors. The raw data is then transmitted wirelessly to a cloud server as output.
[0320] Step 2:
[0321] The server preprocesses the received activity data. It receives raw data sent from the terminal as input and performs noise reduction and data standardization. This improves data integrity and outputs clean data converted into a format suitable for analysis.
[0322] Step 3:
[0323] The server extracts features from clean data. Specifically, it analyzes the behavioral patterns and manual movements of the care recipients, and processes data to extract statistically significant features from the input data. As output, it generates a dataset summarizing the behavioral features.
[0324] Step 4:
[0325] The server uses a generative AI model to infer the mental state and psychological stress levels of the person being cared for from the extracted features. In this step, the model receives feature data as input and analyzes it. As output, it generates inferred results of the mental state and stress levels.
[0326] Step 5:
[0327] The server detects anomalies from the prediction results. It determines whether there are any abnormal patterns by comparing them with baseline values and past data. The prediction results are used as input, and if an anomaly is detected, it outputs anomaly information.
[0328] Step 6:
[0329] The server notifies the user of any detected anomalies. It sends alerts to the user (caregiver or family member) on a smartphone or other device. The input is anomaly information, and the output is a detailed alert message.
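The baseline comparison in Step 5 could, for instance, be approximated with a simple z-score test against the care recipient's own history. This is only a sketch: the disclosure does not specify the comparison method, and the function name, the threshold of 2.0, and the sample values below are hypothetical.

```python
from statistics import mean, stdev


def is_anomalous(history, current, threshold=2.0):
    """Return True when `current` deviates strongly from the person's own baseline.

    `history` holds past daily values (e.g. minutes spent on a manual task).
    The deviation is measured in sample standard deviations (z-score).
    """
    if len(history) < 2:
        # Not enough past data to form a baseline.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly constant history: any change at all is a deviation.
        return current != mu
    return abs(current - mu) / sigma > threshold


# Hypothetical minutes of knitting per day over the past five days.
past_minutes = [60, 62, 58, 61, 59]
alert_needed = is_anomalous(past_minutes, 20)
```

Comparing each person against their own past values, rather than against a population average, matches the text's emphasis on changes in an individual's daily habits.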
[0330] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0331] This invention provides a system for comprehensively evaluating the mental state of students in a school environment. This system combines behavioral and learning data collected from students with the output of an emotion recognition engine, allowing a more accurate estimation of students' mental stress and enabling educators to provide appropriate interventions.
[0332] Data collection
[0333] The devices use cameras and microphones installed in classrooms and on campus to observe students' movement patterns and acquire audio and visual information in real time. This method comprehensively collects behavioral and emotional data of students. In parallel, learning data is also collected using digital notebooks and scanning devices.
[0334] Data preprocessing and feature extraction
[0335] The server analyzes the acquired audio and video data and uses speech recognition and facial detection technologies to recognize the user's emotions. This goes beyond simply analyzing behavioral data, extracting emotional characteristics that include contextual background. In addition, OCR technology is used to digitize handwritten text and quantify the content of the text and the writer's tendencies.
[0336] Analysis and Anomaly Detection
[0337] The server inputs this diverse data into a generative AI model, supplementing it with data obtained from the emotion engine to analyze the mental state of the students. By integrating real-time recognition results of emotion data with historical learning and behavioral data, it predicts stress levels and psychological changes with greater accuracy.
[0338] Notifications and feedback
[0339] Educators, as users, receive detailed reports and alerts from the server, enabling them to take prompt action regarding specific students. Results obtained through interviews and observations are fed back to the server, contributing to the improvement of future data analysis and anomaly detection algorithms.
[0340] Specific example
[0341] For example, even if student B appears focused during class, the emotion engine might detect signs of anxiety from changes in tone of voice or subtle facial expressions. If this, along with changes in behavioral and learning data, is deemed abnormal, the server immediately sends a notification to the educator. The teacher can then schedule a meeting with student B and provide support tailored to the student's individual situation. This system enables a holistic approach that considers not only digital information but also emotional aspects.
[0342] The following describes the processing flow.
[0343] Step 1:
[0344] The devices use cameras and microphones installed in classrooms and throughout the school to collect video and audio data of students in real time. They also utilize a dedicated digital scanning device to acquire learning data from digital notebooks.
[0345] Step 2:
[0346] The server analyzes the collected video data and uses face detection technology to read changes in facial expressions. Simultaneously, it uses speech recognition technology to extract features such as voice tone and speaking speed, and inputs this information into the emotion engine.
[0347] Step 3:
[0348] The server uses OCR to analyze digitized handwritten text, converting the content of essays and notes into text data. This allows for the quantification of linguistic features and handwriting consistency, and helps to understand learning trends.
[0349] Step 4:
[0350] The server integrates emotional data recognized by the emotion engine with behavioral and learning data, and analyzes the emotional state and mental stress levels of students based on a model that combines the features of each.
[0351] Step 5:
[0352] Based on the analysis results, the server generates a detailed report and sends an alert to the user, an educator (e.g., a teacher or counselor), if it detects abnormal stress levels or psychological states.
[0353] Step 6:
[0354] Based on the provided reports, users conduct interviews with the relevant students and provide counseling and support as needed. The interview results are then fed back to the server and used to improve the accuracy of the system.
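One simple way to picture the integration in Step 4 is a weighted average of per-source scores. The disclosure describes a model that combines the features but does not say how, so the weighted average, the score names, and the weights below are purely illustrative assumptions.

```python
def fuse_scores(scores, weights):
    """Combine per-source stress scores (each in 0..1) into one weighted average.

    Only sources present in `scores` contribute, so a missing source
    (e.g. no audio captured that day) does not bias the result.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        raise ValueError("weights for the supplied scores sum to zero")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical per-source scores and weights.
scores = {"emotion": 0.8, "behavior": 0.4, "learning": 0.6}
weights = {"emotion": 0.5, "behavior": 0.25, "learning": 0.25}
combined = fuse_scores(scores, weights)
```

In a real system the weights could be adjusted using the interview results that Step 6 feeds back to the server; that adjustment is not shown here.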
[0355] (Example 2)
[0356] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0357] Traditional systems for assessing the mental health of students in school settings are limited to the analysis of behavioral and learning data, lacking detailed insights into emotional states and the ability to provide appropriate individual support. This makes it difficult for educators to accurately understand students' stress levels, limiting opportunities for appropriate intervention.
[0358] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0359] In this invention, the server includes means for collecting behavioral information and learning outcomes of students in a school environment, means for analyzing emotions and facial expressions from the collected audio and video information, and means for digitizing handwritten character information using OCR technology and quantifying trends. This makes it possible to accurately estimate the mental stress and emotional state of students and to provide real-time notifications to educators.
[0360] "Behavioral information" refers to data on the actions, movements, and interactions that students exhibit within the school environment.
[0361] "Learning outcomes" refer to data that shows the knowledge, skills, and level of achievement of students through school education.
[0362] "Audio information" refers to data that includes acoustic characteristics such as the tone, content, and volume of a student's voice.
[0363] "Video information" refers to data that records the actions, facial expressions, and visual characteristics of children and students.
[0364] "Emotion" or "emotional state" refers to the psychological state of a child or student as inferred from their voice and facial expressions.
[0365] "Facial expression" refers to the visual expression of emotions derived from the movement and structure of the facial muscles of children and students.
[0366] "OCR technology" refers to the technology that converts handwritten or printed characters into digital text.
[0367] A "generative AI model" refers to an artificial intelligence model that performs inference and prediction based on diverse data.
[0368] "Sending notifications to educators" means informing educators in real time about any abnormalities or situations requiring attention regarding students.
[0369] The present invention provides a multifaceted evaluation of the mental state and stress levels of students in a school environment. This system collects behavioral information and learning outcomes in real time and analyzes emotional states using a generative AI model, thereby enabling appropriate intervention.
[0370] The devices use cameras and microphones installed in classrooms and on campus to acquire audio and video information of students. This allows for the real-time collection of behavioral information such as students' movements, facial expressions, and tone of voice. In addition, digital notebooks and scanning devices are used to save handwritten learning content as digital data, thereby capturing learning outcomes.
[0371] The server receives data transmitted from the terminal, converts the speech to text using speech recognition technology, and analyzes the characteristics of the voice. Furthermore, it analyzes facial expressions from video data using a face detection algorithm and quantifies the characteristics of emotions. It also digitizes handwritten information using OCR technology and quantifies the content and tendencies of the writing.
[0372] This data is fed into a generative AI model to infer emotional states and mental stress levels. The model integrates real-time and historical data for highly accurate analysis. If an anomaly is detected, the server sends a notification to the educator to prompt immediate intervention.
[0373] A concrete example is when a student appears calm during class, but the emotion engine detects anxiety through changes in tone of voice or subtle facial expressions. In such cases, the system immediately notifies the educator and suggests additional meetings or observations. This allows the educator to provide support tailored to the student's situation.
[0374] An example of a prompt is: "Based on Person A's recent behavior and emotional data, analyze whether there are any psychological changes or stressors, and infer the reasons why." This example serves as a guideline for generative AI models when making inferences based on diverse data.
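A prompt of this kind could be assembled from the analysis results roughly as follows. This is a minimal sketch: the function name, the two summary fields, and the layout of the appended data lines are assumptions, and the call to the generative AI model itself is deliberately left out.

```python
def build_prompt(person_label, behavior_summary, emotion_summary):
    """Assemble an instruction in the style of the example prompt, followed by the data.

    The instruction wording follows the example in the text; the two
    trailing data lines are a hypothetical layout.
    """
    return (
        f"Based on {person_label}'s recent behavior and emotional data, "
        "analyze whether there are any psychological changes or stressors, "
        "and infer the reasons why.\n"
        f"Behavior data: {behavior_summary}\n"
        f"Emotion data: {emotion_summary}"
    )


# Hypothetical summaries produced by the earlier analysis steps.
prompt = build_prompt(
    "Person A",
    "time spent alone during recess increased over the past week",
    "voice tone flatter than usual; fewer smiles detected",
)
```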
[0375] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0376] Step 1:
[0377] The devices use cameras and microphones installed in classrooms and on campus to acquire audio and video information of students in real time. Specifically, they record facial expressions and movements through the cameras and collect voice tone and spoken content with the microphones. This input information is sent to a server as behavioral data and audio data.
[0378] Step 2:
[0379] The server processes the audio data received from the terminal through a speech recognition system. Here, the audio is converted to text, and characteristics such as tone and speed are analyzed. The input is audio data, and the output is the transcribed audio information and speech analysis data. This prepares the system for evaluating the emotions of the students.
[0380] Step 3:
[0381] The server analyzes video data by passing it through a face detection algorithm. This detects changes in the facial expressions of students and quantifies their emotional characteristics. The input is video data, and the output is quantified facial expression data. This data is useful for gaining a detailed understanding of emotional states.
[0382] Step 4:
[0383] The server uses OCR technology to convert handwritten text into digital text. The input is scanned handwritten learning output, and the output is the digitized text and its analysis results. This allows for evaluation of learning progress and trends.
[0384] Step 5:
[0385] The server inputs this data into a generative AI model. This model integrates speech analysis data, quantified facial expression data, and learning outcome data to estimate emotional state and mental stress levels. The output is the result of the mental state analysis.
[0386] Step 6:
[0387] The server sends a notification to the educator (user) if an anomaly is detected based on the analysis results. The input is the analysis results of the mental state, and the output is an alert message to the educator. This allows the educator to take necessary interventions.
[0388] Step 7:
[0389] Users receive notifications from the server and consider appropriate responses to students. Specifically, they take actions to improve students' situations, such as scheduling meetings. The input is alerts from the server, and the output is intervention actions taken by educators.
[0390] (Application Example 2)
[0391] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart glasses 214 as the "terminal".
[0392] Traditional school environments present a challenge in accurately assessing students' mental stress and emotional states. This can prevent timely intervention by educators, potentially negatively impacting students' psychological health. To address this issue, it is necessary to assess participants' emotional states in real time and provide the necessary support.
[0393] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0394] In this invention, the server includes a device for collecting behavioral and learning information of participants in an educational facility, a device for extracting features from the collected information and inferring the emotional state and mental stress level of the participants, and an automated device for providing real-time feedback, based on the collected information, as support to the participants. This makes it possible to quickly grasp the mental state of participants and provide necessary support in real time.
[0395] "Participants in educational facilities" refers to students who participate in activities within the school environment.
[0396] "Behavioral information" refers to information that includes data on participants' travel routes and their interactions with others.
[0397] "Learning information" refers to information that includes data on participants' handwritten text and the content of their written materials.
[0398] "Device" refers to hardware or software used to collect specific information, extract features, or make inferences.
[0399] "Mental stress level" is an evaluation index that indicates the degree of stress and emotional pressure experienced by participants.
[0400] An "automated device that provides real-time feedback" is a device that instantly analyzes collected information and provides situation-based support and alerts to participants and educators.
[0401] This invention is a system for evaluating the mental state of participants in educational facilities and providing necessary support. The system mainly consists of terminals installed within the educational facility, a central control server, and users who are educational providers.
[0402] The terminal uses cameras and microphones installed in designated locations within the facility to collect participant behavioral and learning information. Behavioral information includes participants' movement routes and interaction patterns with others, while learning information includes handwritten text and written content. This data is collected in real time, and all data is transmitted to a server.
[0403] The server analyzes the collected video and audio data using software such as OpenCV and Google Cloud Speech-to-Text. This analysis includes facial recognition and voice analysis, which are used to infer the participants' emotional states. Furthermore, generative AI models, for example models built with a framework such as TensorFlow, are used to assess the participants' mental stress levels. If any anomalies are detected, the results are immediately communicated to the user (the educator).
[0404] Based on notifications sent from the server, users can provide appropriate intervention and support to individual participants. For example, if a participant shows signs of distress, the user can detect this and provide individual consultations or psychological support.
[0405] For example, if the server detects that a participant's facial expression is stiff during a lesson, the server will notify the educator of this data in real time. Based on this information, the educator will communicate with the participant and provide the necessary support.
[0406] An example of a prompt for a generative AI model is: "Evaluate changes in emotional state and estimate the level of mental stress based on the participant's behavioral and vocal information."
[0407] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0408] Step 1:
[0409] The terminal uses cameras within the facility to acquire video data of participants and microphones to collect audio data. The input for this step is real-time visual and auditory data obtained from participants, and the output is raw video and audio data to be sent to the server.
[0410] Step 2:
[0411] The server performs face recognition processing on the received video data using OpenCV. The input for this step is the video data sent from the terminal, and the output is the facial feature data of the participants. Specifically, the server detects faces in the video data and quantifies their features.
[0412] Step 3:
[0413] The server converts the audio data into text using the Google Cloud Speech-to-Text API. The input for this step is the collected audio data, and the output is the text data converted from the audio. Specifically, the server sends the audio data to the cloud service and receives it back as text.
[0414] Step 4:
[0415] The server integrates facial feature data and text data and uses a generative AI model to infer the emotional state of the participants. The input for this step is facial feature data and text data converted from speech, and the output is evaluation data based on emotion recognition. The server feeds this data into the AI model to evaluate the participants' emotions and mental stress levels.
[0416] Step 5:
[0417] The server detects anomalies based on emotion recognition results and sends notifications to the user as needed. The input for this step is evaluation data, and the output is alert information for the educator. Specifically, the server searches for anomalies and sends a notification to the user's device if one is found.
[0418] Step 6:
[0419] The user provides support tailored to the participant based on the alert information received. The input for this step is the alert information sent from the server, and the output is the actual intervention activity for the participant. Specifically, the user takes actions such as interacting with the participant or providing counseling.
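The integration in Step 4 implies that facial feature data and transcribed text are bundled into a single input for the model. One possible shape for that bundle is sketched below as a JSON payload. The field names and the payload format are hypothetical, and the calls to OpenCV, the Speech-to-Text API, and the AI model are not shown.

```python
import json


def build_inference_payload(participant_id, facial_features, transcript):
    """Bundle quantified facial features and transcribed speech into one JSON string."""
    payload = {
        "participant_id": participant_id,
        "facial_features": facial_features,
        "transcript": transcript,
    }
    # sort_keys gives a stable field order, which makes logs easier to compare.
    return json.dumps(payload, ensure_ascii=False, sort_keys=True)


# Hypothetical values standing in for the outputs of Steps 2 and 3.
payload = build_inference_payload(
    "participant-001",
    {"brow_tension": 0.72, "smile_intensity": 0.10},
    "I did not sleep well last night.",
)
```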
[0420] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0421] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0422] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.
[0423] [Third Embodiment]
[0424] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.
[0425] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.
[0426] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0427] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.
[0428] The microphone 238 receives instructions from the user 20 by picking up the voice of the user 20. The microphone 238 captures the voice of the user 20, converts it into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.
[0429] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0430] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0431] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0432] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0433] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0434] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0435] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".
[0436] This invention provides a system for early detection of mental stress in students within a school environment and for notifying educators. The system consists of several key components for collecting and analyzing data on students' daily behavior and learning.
[0437] Data collection
[0438] The devices use cameras and sensors installed in classrooms and throughout the school to monitor students' movements and activities in real time and collect behavioral data. They also acquire learning data by collecting information from digital notebooks used during lessons and from paper documents obtained through scanning.
[0439] Data preprocessing and feature extraction
[0440] The server uses OCR technology to read handwritten characters from the collected data as text data. For behavioral data, noise is removed and standardization is performed to improve data integrity. After that, features such as consistency of characters, composition content, and student movement patterns are extracted.
[0441] Analysis and Anomaly Detection
[0442] The server uses generative AI to analyze the extracted features. For example, if patterns such as messy handwriting, frequent use of negative words, or decreased interaction with friends are detected, an increase in mental stress is inferred. The system compares the result with a baseline, and if an abnormality is detected, the educator is notified of the result.
[0443] Notifications and feedback
[0444] When an educator receives a notification, they interact with the specific student to confirm the details of the student's situation. The information obtained is later fed back into the system's learning process to help further improve its accuracy.
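The disclosure does not state how the feedback is used. One possible reading, shown purely as an assumption, is to record whether each alert was confirmed by the interview and to make alerting stricter when too many alerts prove to be false alarms. The class name, the target precision, and the step size below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlertFeedback:
    student_id: str
    alert_confirmed: bool  # True if the interview confirmed a real concern


def alert_precision(feedback):
    """Fraction of past alerts that educators confirmed as real concerns."""
    if not feedback:
        return None
    confirmed = sum(1 for f in feedback if f.alert_confirmed)
    return confirmed / len(feedback)


def adjust_threshold(threshold, precision, target=0.7, step=0.1):
    """Raise the alert threshold when too many alerts were false alarms."""
    if precision is None:
        return threshold
    if precision < target:
        return round(threshold + step, 2)
    return threshold
```

A fuller system would more likely retrain or fine-tune the model on the labeled feedback; the threshold adjustment here is only the simplest form of closing the loop.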
[0445] Specific example
[0446] In one case at a school, the system detected that student A's handwriting had recently become messy when taking notes during class. It also observed a change in his behavioral patterns, such as him spending more time alone during recess. The server analyzed this and determined it to be abnormal, sending an alert to the homeroom teacher (the user of the system). The teacher met with student A and took early action, which resolved the problem. In this way, the system can detect changes in students' mental states early and enable appropriate intervention.
[0447] The following describes the processing flow.
[0448] Step 1:
[0449] The devices use cameras and sensors in the classroom to continuously monitor students' movement patterns and interactions with others, collecting behavioral data. They also acquire learning data from digital notebooks and, using scanning devices, from paper documents.
[0450] Step 2:
[0451] The server converts the acquired learning data into text data using OCR technology. At the same time, it applies a noise reduction algorithm to the behavioral data and standardizes it to improve data accuracy.
[0452] Step 3:
[0453] The server extracts features from the organized data. Specifically, it quantifies handwriting sloppiness, scores the words included in essays as positive or negative, and quantifies movement frequency and the presence or absence of group activities from the behavioral data.
[0454] Step 4:
[0455] The server inputs the extracted feature data into a generative AI model to estimate the emotional state and mental stress levels of the students. During this process, it compares past and present data to identify any pattern anomalies.
[0456] Step 5:
[0457] If the server determines that an anomaly has been detected based on the analysis results, it immediately compiles information on the relevant students and sends a notification to the user, who is an educator (teacher or counselor).
[0458] Step 6:
[0459] Based on notifications received from the server, users conduct interviews with the relevant students to confirm their statements and take appropriate action. The information obtained and the status of the response are later sent back to the server as feedback and used to improve the accuracy of the learning model.
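The feature extraction of Step 3 can be sketched for the text and behavioral parts as follows. The word lists, the event format of (kind, in_group) pairs, and the function names are hypothetical; handwriting sloppiness is omitted because it would require image analysis that the disclosure does not detail.

```python
NEGATIVE_WORDS = {"sad", "tired", "alone", "hate", "scared"}    # hypothetical lexicon
POSITIVE_WORDS = {"happy", "fun", "friend", "like", "excited"}  # hypothetical lexicon


def word_polarity(text):
    """Return (positive_ratio, negative_ratio) over the words of an essay."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0, 0.0
    pos = sum(w in POSITIVE_WORDS for w in words)
    neg = sum(w in NEGATIVE_WORDS for w in words)
    return pos / len(words), neg / len(words)


def behavior_features(events):
    """Summarize (kind, in_group) observations into movement and group features."""
    moves = sum(1 for kind, _ in events if kind == "move")
    in_group = any(flag for _, flag in events)
    return {"movement_count": moves, "group_activity": in_group}
```

For example, `word_polarity("I feel sad and tired.")` counts two negative words out of five, and the resulting ratios can be passed to Step 4 together with the behavioral summary.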
[0460] (Example 1)
[0461] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0462] In educational settings, there is a challenge in detecting mental stress and psychological burden in students at an early stage. This increases the likelihood that educators may miss opportunities to intervene at the appropriate time, leading to students' mental health being overlooked. In particular, there is a need to continuously observe subtle changes in behavior and learning activities and respond quickly and accurately.
[0463] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0464] In this invention, the server includes a device for collecting user behavioral information and learning information, a device for extracting features from the collected information and inferring the user's emotional state and psychological stress level, and a device for analyzing the data using a generative artificial intelligence model based on the extracted features. This makes it possible to detect the user's mental stress early and immediately send notifications to educators.
[0465] "Educational facilities" refers to places where users engage in learning activities, such as schools and cram schools.
[0466] "Users" refers to learners such as children and students who use educational facilities.
[0467] "Behavioral information" refers to data that shows the user's travel routes, physical movements, and activity patterns.
[0468] "Learning information" refers to data related to users' writing and learning activities, specifically handwritten records and essay content.
[0469] "Features" refers to information extracted from collected behavioral and learning data to infer the user's emotional state and psychological stress level.
[0470] A "generative artificial intelligence model" refers to a model that uses machine learning algorithms to analyze data and evaluate the mental state of users.
[0471] An "educational staff member" refers to an individual who is responsible for guiding and supporting the learning activities of users within an educational facility, specifically a teacher or counselor.
[0472] "Notifications" refers to alerts and informational messages generated based on anomaly detection and sent to educational staff members.
[0473] This invention is a system for detecting mental stress in users at educational facilities at an early stage and notifying educators. A specific embodiment of this system is shown below.
[0474] The terminal is a device that uses cameras and sensors installed in classrooms and educational facilities to monitor users' movements and activities in real time. The terminal collects behavioral information through these devices, and also collects learning information from digital notebooks used during lessons and from paper-based learning information obtained through scans.
[0475] The server processes the collected information as follows. Specifically, it uses OCR (Optical Character Recognition) technology to recognize handwritten characters as text data, then removes noise from behavioral information and standardizes the data. The server utilizes a generative AI model to extract the user's features from the collected information and estimates psychological stress levels based on these features. The estimated results are compared to a reference standard, and if an anomaly is detected, the educator is immediately notified.
[0476] The educator receives notifications from the server, talks with the user concerned, and checks the situation in detail. This allows the educator to take appropriate action before the user's mental state deteriorates.
[0477] For example, if a particular user's handwritten records become messier than before at an educational facility, or if they start spending more time alone during breaks, the server will detect this as an anomaly and send a notification to the educator. Based on this information, the educator can then interview the user to confirm whether they are experiencing stress or have any problems.
[0478] Example of a prompt:
[0479] "Design a system that analyzes the possibility of mental stress in cases such as messy handwriting in class or an increase in the frequency of children spending time alone during recess, and notifies educators accordingly."
[0480] By using this system, educational institutions can protect the mental health of their users and support smooth educational activities.
[0481] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0482] Step 1:
[0483] The terminal uses cameras and sensors placed within classrooms and educational facilities to monitor users' movements and activities in real time. It takes camera footage and sensor data as input and outputs this as behavioral information. This operation includes analyzing camera footage and collecting movement data from sensors.
[0484] Step 2:
[0485] The terminal collects learning information from digital notebooks and paper documents obtained through scanning. It takes digital notebook data and scanned images as input and outputs them as learning information. Specific operations include importing digital notebook data and electronically recording paper documents using a scanner.
[0486] Step 3:
[0487] The server receives behavioral and learning information sent from the terminal. The input consists of behavioral and learning data sent from the terminal. The output is a single, consistent dataset composed of this data. Specifically, the server stores the data in a database and prepares for the next processing step.
[0488] Step 4:
[0489] The server uses OCR technology to convert handwritten characters in the learning information into text data. The input is scanned image data of handwritten characters, and the output is text data. The server applies an image processing algorithm to perform this text conversion.
[0490] Step 5:
[0491] The server performs noise reduction on behavioral data and standardizes the data. It takes real-time monitoring data as input and outputs clean, standardized behavioral data. Specifically, it applies a filtering algorithm to remove noise and converts the data to a standardized scale.
[0492] Step 6:
[0493] The server uses a generative AI model to extract features and evaluate the user's emotional state and mental stress. Inputs include standardized behavioral data and text data, and output is an estimated result of the emotional state and stress level. The server runs the AI model to perform pattern analysis and feature extraction.
[0494] Step 7:
[0495] The server sends a notification to the educator if an anomaly is detected. The input is the estimation result, and the output is a warning notification delivered to the educator. Specific actions include sending emails or messages using the notification system.
[0496] Step 8:
[0497] The educator receives the notification and confirms the details by interviewing the user. The input is the notification from the server. The output is confirmation information regarding the user's status. Specific actions include arranging a conversation and conducting the interview.
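The notification of Step 7 could, for instance, be composed as an e-mail. The following sketch uses the standard-library `email.message.EmailMessage` class and only builds the message; the addresses, the subject wording, and the stress-level scale are assumptions, and actual delivery (for example through an SMTP server) is left out.

```python
from email.message import EmailMessage


def build_alert_email(educator_address, user_id, stress_level, reasons):
    """Compose (but do not send) an alert e-mail for an educator."""
    msg = EmailMessage()
    msg["To"] = educator_address
    msg["From"] = "alerts@school.example"  # hypothetical sender address
    msg["Subject"] = f"Stress alert for user {user_id}"
    body = [f"Estimated stress level: {stress_level:.2f}", "Observed changes:"]
    body += [f"- {r}" for r in reasons]
    msg.set_content("\n".join(body))
    return msg


alert = build_alert_email(
    "teacher@school.example", "A", 0.82,
    ["handwriting messier than before", "more time alone during breaks"],
)
```

Listing the observed changes in the body gives the educator concrete points to raise in the interview of Step 8.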
[0498] (Application Example 1)
[0499] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0500] It is necessary to provide an environment that enables rapid response by continuously monitoring the mental state of the elderly and detecting stress and psychological changes early. However, in current care settings, early detection of mental stress is difficult because it depends on caregivers and support staff happening to notice changes. This poses a risk to the psychological health of the elderly.
[0501] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0502] In this invention, the server includes means for collecting individual care recipient activity data and daily life information, means for extracting behavioral characteristics from the collected data and estimating the care recipient's mental state and psychological stress level, and means for detecting abnormalities based on the estimation results and sending notifications to caregivers. This makes it possible to detect the care recipient's mental stress early and take prompt countermeasures.
[0503] "Activity data" refers to information about the movements and behavioral patterns of the person receiving care in their daily life.
[0504] "Daily life information" refers to information including the manual tasks that the person receiving care performs on a daily basis and related records.
[0505] "Mental state" is a concept that describes the emotional and psychological health of the person receiving care.
[0506] "Psychological stress level" is an indicator used to assess the degree of mental burden on a person receiving care.
[0507] "Means of collection" refers to methods and devices used to acquire activity data and information about daily life.
[0508] "Means of extracting features" refer to methods and techniques for analyzing significant patterns and trends from collected data.
[0509] "Methods of prediction" refer to techniques for predicting the mental state and psychological stress levels of care recipients based on data.
[0510] "Means for detecting anomalies" are systems for identifying behaviors or patterns that differ from normal conditions.
[0511] "Means of sending notifications" refers to communication methods used to inform support personnel of detected anomalies.
[0512] The system implementing the present invention provides a method for early detection of mental stress in the daily life of a person receiving care and enabling a rapid response.
[0513] The system primarily consists of a server, terminals, and users. The server collects activity data and daily life information of the person receiving care, and extracts behavioral characteristics based on this data. Activity data is acquired through the camera and sensors of smart glasses worn by the person receiving care. The collected data is transmitted wirelessly to a cloud server (such as Amazon Web Services or Microsoft Azure).
[0514] The server uses OCR technology to analyze digitized handwritten characters and employs generative AI and language models (such as GPT-4 and BERT) to estimate the care recipient's mental state and psychological stress level. During this process, the data undergoes pre-processing such as noise reduction and standardization. If the estimation reveals any abnormal patterns, the server quickly notifies the user. This notification is sent to the mobile devices of caregivers, family members, and other support personnel to prompt appropriate action.
[0515] As a concrete example, consider a case where an elderly person has recently been knitting, a hobby they enjoy, less frequently. In this case, the system can compare hand movement data with daily habits to detect signs of mental stress and quickly notify the caregiver.
[0516] An example of a prompt when using a generative AI model is: "Analyze in detail how the behavioral patterns of a 65-year-old senior have changed over the past week, and infer signs of mental stress or poor health. In particular, consider the frequency and quality of manual tasks."
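A prompt of this kind would typically be assembled from the extracted features before being passed to the model. The sketch below only builds the prompt text; the function name, the observation labels, and the choice to append observations as a bulleted list are assumptions, and no particular model API is implied.

```python
def build_prompt(age, days, observations):
    """Fill a prompt template with summarized activity observations."""
    lines = [
        f"Analyze in detail how the behavioral patterns of a {age}-year-old "
        f"senior have changed over the past {days} days, and infer signs of "
        "mental stress or poor health. In particular, consider the frequency "
        "and quality of manual tasks.",
        "",
        "Observations:",
    ]
    lines += [f"- {name}: {value}" for name, value in observations.items()]
    return "\n".join(lines)


# Hypothetical observations matching the knitting example.
prompt = build_prompt(65, 7, {"knitting sessions this week": 1,
                              "knitting sessions last week": 5})
```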
[0517] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0518] Step 1:
[0519] The terminal collects activity data from the care recipient's daily life through smart glasses worn by the care recipient. Inputs include image and audio data within the care recipient's field of vision, acquired in real time by sensors. The raw data is then transmitted wirelessly to a cloud server as output.
[0520] Step 2:
[0521] The server preprocesses the received activity data. It receives raw data sent from the terminal as input and performs noise reduction and data standardization. This improves data integrity and outputs clean data converted into a format suitable for analysis.
[0522] Step 3:
[0523] The server extracts features from clean data. Specifically, it analyzes the behavioral patterns and manual movements of the care recipients, and processes data to extract statistically significant features from the input data. As output, it generates a dataset summarizing the behavioral features.
[0524] Step 4:
[0525] The server uses a generative AI model to infer the mental state and psychological stress levels of the person being cared for from the extracted features. In this step, the model receives feature data as input and analyzes it. As output, it generates inferred results of the mental state and stress levels.
[0526] Step 5:
[0527] The server detects anomalies from the prediction results. It determines whether there are any abnormal patterns by comparing them with baseline values and past data. The prediction results are used as input, and if an anomaly is detected, it outputs anomaly information.
[0528] Step 6:
[0529] The server notifies the user of any detected anomalies. It sends alerts to the user (caregiver or family member) on a smartphone or other device. The input is anomaly information, and the output is a detailed alert message.
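For an activity such as knitting, the feature extraction of Step 3 and the comparison with past data in Step 5 might look like the following sketch. It assumes that sessions of the activity have already been recognized and dated; the weekly grouping and the 50% drop ratio are hypothetical choices.

```python
from collections import Counter
from datetime import date


def sessions_per_week(session_dates):
    """Count activity sessions per ISO (year, week)."""
    counts = Counter()
    for d in session_dates:
        iso = d.isocalendar()
        counts[(iso[0], iso[1])] += 1
    return counts


def dropped_sharply(counts, current_week, previous_weeks, ratio=0.5):
    """True if this week's count fell below `ratio` of the earlier average."""
    past = [counts.get(w, 0) for w in previous_weeks]
    if not past or sum(past) == 0:
        return False
    average = sum(past) / len(past)
    return counts.get(current_week, 0) < ratio * average
```

A drop detected in this way would be only one input to the estimation in Step 4, not a diagnosis in itself.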
[0530] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0531] This invention provides a system for comprehensively evaluating the mental state of students in a school environment. This system combines behavioral and learning data collected from students with the output of an emotion recognition engine. This allows for a more accurate estimation of students' mental stress and enables educators to provide appropriate interventions.
[0532] Data collection
[0533] The devices use cameras and microphones installed in classrooms and on campus to observe students' movement patterns and acquire audio and visual information in real time. This method comprehensively collects behavioral and emotional data of students. In parallel, learning data is also collected using digital notebooks and scanning devices.
[0534] Data preprocessing and feature extraction
[0535] The server analyzes the acquired audio and video data and uses speech recognition and facial detection technologies to recognize the user's emotions. This goes beyond simply analyzing behavioral data, extracting emotional characteristics that include contextual background. In addition, OCR technology is used to digitize handwritten text and quantify the content of the text and the writer's tendencies.
[0536] Analysis and Anomaly Detection
[0537] The server inputs this diverse data into a generative AI model, supplementing it with data obtained from the emotion engine to analyze the mental state of the students. By integrating real-time recognition results of emotion data with historical learning and behavioral data, it predicts stress levels and psychological changes with greater accuracy.
[0538] Notifications and feedback
[0539] Educators, as users, receive detailed reports and alerts from the server, enabling them to take prompt action regarding specific students. Results obtained through interviews and observations are fed back to the server, contributing to the improvement of future data analysis and anomaly detection algorithms.
[0540] Specific example
[0541] For example, even if student B appears focused during class, the emotion engine might detect signs of anxiety from changes in tone of voice or subtle facial expressions. If this, along with changes in behavioral and learning data, is deemed abnormal, the server immediately sends a notification to the educator. The teacher can then receive this information, schedule a meeting with student B, and provide support tailored to the student's individual situation. This system enables a holistic approach that considers not only digital information but also emotional aspects.
[0542] The following describes the processing flow.
[0543] Step 1:
[0544] The devices use cameras and microphones installed in classrooms and throughout the school to collect video and audio data of students in real time. They also acquire learning data from digital notebooks and through a dedicated scanning device.
[0545] Step 2:
[0546] The server analyzes the collected video data and uses face detection technology to read changes in facial expressions. Simultaneously, it uses speech recognition technology to extract features such as voice tone and speaking speed, and inputs this information into the emotion engine.
[0547] Step 3:
[0548] The server uses OCR to analyze digitized handwritten text, converting the content of essays and notes into text data. This allows for the quantification of linguistic features and handwriting consistency, and helps to understand learning trends.
[0549] Step 4:
[0550] The server integrates emotional data recognized by the emotion engine with behavioral and learning data, and analyzes the emotional state and mental stress levels of students based on a model that combines the features of each.
[0551] Step 5:
[0552] Based on the analysis results, the server generates a detailed report and sends an alert to the user, an educator (e.g., a teacher or counselor), if it detects abnormal stress levels or psychological states.
[0553] Step 6:
[0554] Based on the provided reports, users conduct interviews with the relevant students and provide counseling and support as needed. The interview results are then fed back to the server and used to improve the accuracy of the system.
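The integration in Step 4 is performed by a model whose form the disclosure leaves open. As a deliberately simplified stand-in, not the generative AI model itself, the sketch below combines three scores with a weighted average; the weights, the [0, 1] score range, and the alert threshold are all assumptions.

```python
def fuse_scores(emotion, behavior, learning, weights=(0.4, 0.3, 0.3)):
    """Weighted average of three scores in [0, 1]; higher means more stress."""
    scores = (emotion, behavior, learning)
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores must lie in [0, 1]")
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total


def needs_alert(stress_score, threshold=0.6):
    """Decide whether the fused score warrants the alert of Step 5."""
    return stress_score >= threshold
```

With these hypothetical weights, a strong emotion signal alone does not trigger an alert unless the behavioral or learning scores also rise, which mirrors the student B example, where the emotion data is considered together with changes in the other data.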
[0555] (Example 2)
[0556] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0557] Traditional systems for assessing the mental health of students in school settings are limited to the analysis of behavioral and learning data, lacking detailed insights into emotional states and the ability to provide appropriate individual support. This makes it difficult for educators to accurately understand students' stress levels, limiting opportunities for appropriate intervention.
[0558] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0559] In this invention, the server includes means for collecting behavioral information and learning outcomes of students in a school environment, means for analyzing emotions and facial expressions from the collected audio and video information, and means for digitizing handwritten character information using OCR technology and quantifying trends. This makes it possible to accurately estimate the mental stress and emotional state of students and to provide real-time notifications to educators.
[0560] "Behavioral information" refers to data on the actions, movements, and interactions that students exhibit within the school environment.
[0561] "Learning outcomes" refer to data that shows the knowledge, skills, and level of achievement of students through school education.
[0562] "Audio information" refers to data that includes acoustic characteristics such as the tone, content, and volume of a student's voice.
[0563] "Video information" refers to data that records the actions, facial expressions, and visual characteristics of children and students.
[0564] "Emotion" or "emotional state" refers to the psychological state of a child or student as inferred from their voice and facial expressions.
[0565] "Facial expression" refers to the visual expression of emotions derived from the movement and structure of the facial muscles of children and students.
[0566] "OCR technology" refers to the technology that converts handwritten or printed characters into digital text.
[0567] A "generative AI model" refers to an artificial intelligence model that performs inference and prediction based on diverse data.
[0568] "Sending notifications to educators" means informing educators in real time about any abnormalities or situations requiring attention regarding students.
[0569] The present invention provides a multifaceted evaluation of the mental state and stress levels of students in a school environment. This system collects behavioral information and learning outcomes in real time and analyzes emotional states using a generative AI model, thereby enabling appropriate intervention.
[0570] The devices use cameras and microphones installed in classrooms and on campus to acquire audio and video information of students. This allows for the real-time collection of behavioral information such as students' movements, facial expressions, and tone of voice. In addition, digital notebooks and scanning devices are used to save handwritten learning content as digital data, thereby capturing learning outcomes.
[0571] The server receives data transmitted from the terminal, converts the speech to text using speech recognition technology, and analyzes the characteristics of the voice. Furthermore, it analyzes facial expressions from video data using a face detection algorithm and quantifies the characteristics of emotions. It also digitizes handwritten information using OCR technology and quantifies the content and tendencies of the writing.
[0572] This data is fed into a generative AI model to infer emotional states and mental stress levels. The model integrates real-time and historical data for highly accurate analysis. If an anomaly is detected, the server sends a notification to the educator to prompt immediate intervention.
[0573] A concrete example is when a student appears calm during class, but the emotion engine detects anxiety through changes in tone of voice or subtle facial expressions. In such cases, the system immediately notifies the educator and suggests additional meetings or observations. This allows the educator to provide support tailored to the student's situation.
[0574] An example of a prompt is: "Based on Person A's recent behavior and emotional data, analyze whether there are any psychological changes or stressors, and infer the reasons why." This example serves as a guideline for generative AI models when making inferences based on diverse data.
[0575] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0576] Step 1:
[0577] The devices use cameras and microphones installed in classrooms and on campus to acquire audio and video information of students in real time. Specifically, they record facial expressions and movements through the cameras and collect voice tone and spoken content with the microphones. This input information is sent to a server as behavioral data and audio data.
[0578] Step 2:
[0579] The server processes the audio data received from the terminal through a speech recognition system. Here, the audio is converted to text, and characteristics such as tone and speed are analyzed. The input is audio data, and the output is the transcribed audio information and speech analysis data. This prepares the system for evaluating the emotions of the students.
[0580] Step 3:
[0581] The server analyzes video data by passing it through a face detection algorithm. This detects changes in the facial expressions of students and quantifies their emotional characteristics. The input is video data, and the output is quantified facial expression data. This data is useful for gaining a detailed understanding of emotional states.
[0582] Step 4:
[0583] The server uses OCR technology to convert handwritten text into digital text. The input is scanned handwritten learning output, and the output is the digitized text and its analysis results. This allows for evaluation of learning progress and trends.
[0584] Step 5:
[0585] The server inputs this data into a generative AI model. This model integrates speech analysis data, quantified facial expression data, and learning outcome data to estimate emotional state and mental stress levels. The output is the result of the mental state analysis.
[0586] Step 6:
[0587] The server sends a notification to the educator (user) if an anomaly is detected based on the analysis results. The input is the analysis results of the mental state, and the output is an alert message to the educator. This allows the educator to take necessary interventions.
[0588] Step 7:
[0589] Users receive notifications from the server and consider appropriate responses to students. Specifically, they take actions to improve students' situations, such as scheduling meetings. The input is alerts from the server, and the output is intervention actions taken by educators.
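Step 2 mentions analyzing speaking speed. Assuming the speech recognition system returns a transcript and the duration of the utterance is known, the speed can be derived as words per minute and compared with the student's usual rate. The function names are hypothetical, and counting whitespace-separated words is a simplification suited to English transcripts.

```python
def speaking_rate_wpm(transcript, duration_seconds):
    """Words per minute for one transcribed utterance."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return len(transcript.split()) * 60.0 / duration_seconds


def rate_change(current_wpm, usual_wpm):
    """Relative change versus the usual rate (e.g. -0.25 means 25% slower)."""
    return (current_wpm - usual_wpm) / usual_wpm
```

For a language written without spaces, such as Japanese, a count of morae or characters per second would be a more natural unit than words per minute.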
[0590] (Application Example 2)
[0591] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0592] Traditional school environments present a challenge in accurately assessing students' mental stress and emotional states. This can prevent timely intervention by educators, potentially negatively impacting students' psychological health. To address this issue, it is necessary to assess participants' emotional states in real time and provide the necessary support.
[0593] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0594] In this invention, the server includes a device for collecting behavioral and learning information of participants in an educational facility, a device for extracting features from the collected information and inferring the emotional state and mental load level of the participants, and an automated device for providing real-time feedback of the collected information as support to the participants. This makes it possible to quickly grasp the mental state of participants and provide necessary support in real time.
[0595] "Participants in educational facilities" refers to students who participate in activities within the school environment.
[0596] "Behavioral information" refers to information that includes data on participants' travel routes and their interactions with others.
[0597] "Learning information" refers to information that includes data on participants' handwritten text and the content of their written materials.
[0598] "Device" refers to hardware or software used to collect specific information, extract features, or make inferences.
[0599] "Mental stress level" is an evaluation index that indicates the degree of stress and emotional pressure experienced by participants.
[0600] An "automated device that provides real-time feedback" is a device that instantly analyzes collected information and provides situation-based support and alerts to participants and educators.
[0601] This invention is a system for evaluating the mental state of participants in educational facilities and providing necessary support. The system mainly consists of terminals installed within the educational facility, a central control server, and users who are educators.
[0602] The terminal uses cameras and microphones installed in designated locations within the facility to collect participant behavioral and learning information. Behavioral information includes participants' movement routes and interaction patterns with others, while learning information includes handwritten text and written content. This data is collected in real time, and all data is transmitted to a server.
[0603] The server analyzes the collected video and audio data using software such as OpenCV and Google Cloud Speech-to-Text. This analysis includes facial recognition and voice analysis, which are used to infer the participants' emotional states. Furthermore, generative AI models, built with frameworks such as TensorFlow, are used to assess the participants' mental stress levels. If any anomalies are detected, the resulting data is immediately communicated to the user (the educator).
[0604] Based on notifications sent from the server, users can provide appropriate intervention and support to individual participants. For example, if a participant shows signs of distress, the user can detect this and provide individual consultations or psychological support.
[0605] For example, if the server detects that a participant's facial expression is stiff during a lesson, the server will notify the educator of this data in real time. Based on this information, the educator will communicate with the participant and provide the necessary support.
[0606] An example of a prompt for a generative AI model is: "Evaluate changes in emotional state and estimate the level of mental stress based on the participant's behavioral and vocal information."
[0607] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0608] Step 1:
[0609] The terminal uses cameras within the facility to acquire video data of participants and microphones to collect audio data. The input for this step is real-time visual and auditory data obtained from participants, and the output is raw video and audio data to be sent to the server.
[0610] Step 2:
[0611] The server performs face recognition processing on the received video data using OpenCV. The input for this step is the video data sent from the terminal, and the output is the facial feature data of the participants. Specifically, the server detects faces in the video data and quantifies their features.
[0612] Step 3:
[0613] The server converts the audio data into text using the Google Cloud Speech-to-Text API. The input for this step is the collected audio data, and the output is the text data converted from the audio. Specifically, the server sends the audio data to the cloud service and receives it back as text.
[0614] Step 4:
[0615] The server integrates facial feature data and text data and uses a generative AI model to infer the emotional state of the participants. The input for this step is facial feature data and text data converted from speech, and the output is evaluation data based on emotion recognition. The server feeds this data into the AI model to evaluate the participants' emotions and mental stress levels.
[0616] Step 5:
[0617] The server detects anomalies based on emotion recognition results and sends notifications to the user as needed. The input for this step is evaluation data, and the output is alert information for the educator. Specifically, the server searches for anomalies and sends a notification to the user's device if one is found.
[0618] Step 6:
[0619] The user provides support tailored to the participant based on the alert information received. The input for this step is the alert information sent from the server, and the output is the actual intervention activity for the participant. Specifically, the user takes actions such as interacting with the participant or providing counseling.
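Steps 2 to 5 above can be sketched as a single orchestration function. This is a minimal illustration, not the disclosed implementation: the stage functions are passed in as parameters so that real components (for example, an OpenCV-based face analyzer or a Speech-to-Text client) could be substituted, and the names `Evaluation`, `Alert`, `run_pipeline`, and the 0.0 to 1.0 stress scale with a 0.7 threshold are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Evaluation:
    emotion: str
    stress_level: float  # assumed scale: 0.0 (calm) to 1.0 (high stress)


@dataclass
class Alert:
    participant_id: str
    message: str


def run_pipeline(
    participant_id: str,
    video: bytes,
    audio: bytes,
    extract_face_features: Callable[[bytes], Sequence[float]],
    transcribe: Callable[[bytes], str],
    infer: Callable[[Sequence[float], str], Evaluation],
    stress_threshold: float = 0.7,  # hypothetical cut-off, not from the source
) -> Optional[Alert]:
    """Run Steps 2-5 for one participant and return an alert if needed."""
    face_features = extract_face_features(video)  # Step 2: face recognition
    text = transcribe(audio)                      # Step 3: speech to text
    evaluation = infer(face_features, text)       # Step 4: emotion and stress inference
    if evaluation.stress_level >= stress_threshold:  # Step 5: anomaly check
        message = (
            f"Possible elevated stress (emotion={evaluation.emotion}, "
            f"level={evaluation.stress_level:.2f})"
        )
        return Alert(participant_id, message)
    return None
```

Passing the stages as callables keeps the control flow testable without cameras, microphones, or network access.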
[0620] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0621] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. It performs inference on the input inference data according to the instructions indicated by the prompt and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
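As an illustration of how a prompt and inference data might be bundled before being passed to such a model, the following sketch covers only the text side. The class name `InferenceRequest`, its fields, and the rendering layout are hypothetical; the source does not specify a request format, and no particular model API is assumed.

```python
import json
from dataclasses import dataclass, field


@dataclass
class InferenceRequest:
    """Hypothetical container pairing an instruction with inference data."""
    instruction: str
    text_data: list[str] = field(default_factory=list)
    features: dict[str, float] = field(default_factory=dict)

    def to_prompt(self) -> str:
        """Render the instruction and inference data as one text prompt."""
        parts = [self.instruction, ""]
        if self.features:
            parts.append("Observed features (JSON):")
            parts.append(json.dumps(self.features, sort_keys=True))
        if self.text_data:
            parts.append("Recognized text:")
            parts.extend(f"- {line}" for line in self.text_data)
        return "\n".join(parts)


request = InferenceRequest(
    instruction=(
        "Evaluate changes in emotional state and estimate the level of mental "
        "stress based on the participant's behavioral and vocal information."
    ),
    text_data=["I did not sleep well."],
    features={"speaking_rate_wpm": 92.0},
)
prompt = request.to_prompt()
```

Audio and image data would need a model-specific encoding and are left out of this sketch.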
[0622] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.
[0623] [Fourth Embodiment]
[0624] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.
[0625] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.
[0626] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0627] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.
[0628] The microphone 238 receives instructions and other input from the user 20 by picking up the user's voice. The microphone 238 captures the voice of the user 20, converts it into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.
[0629] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0630] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0631] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
[0632] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0633] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0634] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0635] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0636] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0637] This invention provides a system for early detection of mental stress in students within a school environment and for notifying educators. The system consists of several key components for collecting and analyzing data on students' daily behavior and learning.
[0638] Data collection
[0639] The terminal uses cameras and sensors installed in classrooms and throughout the school to monitor students' movements and activities in real time and collect behavioral data. It also acquires learning data from digital notebooks used during lessons and from paper documents digitized by scanning.
[0640] Data preprocessing and feature extraction
[0641] The server uses OCR technology to convert handwritten characters in the collected data into text data. For behavioral data, noise is removed and the data is standardized to improve data integrity. Features such as handwriting consistency, essay content, and student movement patterns are then extracted.
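The noise removal and standardization of behavioral data can be sketched as follows. The source does not name specific algorithms, so a sliding median filter and z-score standardization are used here purely as assumed examples; the function names are hypothetical.

```python
from statistics import mean, median, pstdev


def median_filter(values, window=3):
    """Suppress isolated spikes by replacing each sample with a local median."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(median(values[lo:hi]))
    return smoothed


def standardize(values):
    """Rescale samples to zero mean and unit variance (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All samples are identical, so there is no spread to scale by.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Example: steps walked per minute with one sensor glitch (50).
raw = [1, 1, 50, 1, 1]
clean = median_filter(raw)
scaled = standardize([2, 4, 6])
```

A median filter is chosen over a moving average in this sketch because a single outlier does not leak into neighboring samples.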
[0642] Analysis and Anomaly Detection
[0643] The server uses generative AI to analyze the extracted features. For example, if patterns such as messy handwriting, frequent use of negative words, or decreased interaction with friends are detected, an increase in mental stress is inferred. The system compares the result against a baseline and, if an abnormality is detected, notifies the educator of the result.
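The comparison against a baseline could, under one set of assumptions, look like the sketch below. It treats a student's own past values of a feature as the baseline and flags the current value when it deviates by more than a chosen number of standard deviations. The function name `is_anomalous` and the threshold of 2.0 are illustrative choices, not values given in the source.

```python
from statistics import mean, pstdev


def is_anomalous(history, current, z_threshold=2.0):
    """Return True if `current` deviates strongly from the student's own history."""
    if len(history) < 2:
        return False  # not enough past data to form a baseline
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return current != mu
    return abs(current - mu) / sigma > z_threshold


# Minutes per day spent interacting with classmates during recess.
past_interaction = [10, 12, 11, 9, 10, 11, 10]
flag_today = is_anomalous(past_interaction, 3)
flag_normal = is_anomalous(past_interaction, 11)
```

Using a per-student baseline rather than a class-wide one avoids flagging students who are simply quieter than their peers.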
[0644] Notifications and feedback
[0645] When an educator receives a notification, they interact with the specific student to confirm the details of their situation. This feedback information is later fed back into the system's learning process to help further improve its accuracy.
[0646] Specific example
[0647] In one case at a school, the system detected that student A's handwriting had recently become messy when taking notes during class. It also observed a change in his behavioral patterns, such as him spending more time alone during recess. The server analyzed this and determined it to be abnormal, sending an alert to the homeroom teacher (the user of the system). The teacher met with student A and took early action, which resolved the problem. In this way, the system can detect changes in students' mental states early and enable appropriate intervention.
[0648] The following describes the processing flow.
[0649] Step 1:
[0650] The terminal uses cameras and sensors in the classroom to continuously monitor students' movement patterns and interactions with others, collecting behavioral data. It also acquires learning data from digital notebooks and, using scanning devices, from paper documents.
[0651] Step 2:
[0652] The server converts the acquired learning data into text data using OCR technology. At the same time, it applies a noise reduction algorithm to the behavioral data and standardizes it to improve data accuracy.
[0653] Step 3:
[0654] The server extracts features from the organized data. Specifically, it scores handwriting sloppiness, rates the words used in essays as positive or negative, and quantifies movement frequency and the presence or absence of group activities from the behavioral data.
[0655] Step 4:
[0656] The server inputs the extracted feature data into a generative AI model to estimate the emotional state and mental stress levels of the students. During this process, it compares past and present data to identify any pattern anomalies.
[0657] Step 5:
[0658] If the server determines that an anomaly has been detected based on the analysis results, it immediately compiles information on the relevant students and sends a notification to the user, who is an educator (teacher or counselor).
[0659] Step 6:
[0660] Based on notifications received from the server, users conduct interviews with the relevant students to confirm their statements and take appropriate action. The information obtained and the status of the response are later sent back to the server as feedback and used to improve the accuracy of the learning model.
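The feature extraction in Step 3 of the flow above can be illustrated with a small sketch covering the word-level and behavioral features. The word lists, the `Observation` record, and the feature names are hypothetical; handwriting sloppiness is omitted because it would require image analysis that the source does not detail.

```python
import re
from dataclasses import dataclass

# Hypothetical lexicons; a real system would need a validated word list.
NEGATIVE_WORDS = frozenset({"sad", "tired", "alone", "hate", "scared"})
POSITIVE_WORDS = frozenset({"happy", "fun", "friend", "like", "enjoy"})


@dataclass
class Observation:
    moved: bool          # whether the student changed location in this interval
    people_nearby: int   # number of other students detected close by


def text_features(essay: str) -> dict:
    """Share of positive and negative words in OCR-derived essay text."""
    words = re.findall(r"[a-z']+", essay.lower())
    total = len(words)
    if total == 0:
        return {"negative_ratio": 0.0, "positive_ratio": 0.0}
    negative = sum(1 for w in words if w in NEGATIVE_WORDS)
    positive = sum(1 for w in words if w in POSITIVE_WORDS)
    return {"negative_ratio": negative / total, "positive_ratio": positive / total}


def behavior_features(observations: list) -> dict:
    """Movement frequency and presence of group activity over a period."""
    if not observations:
        return {"movement_frequency": 0.0, "group_activity": False}
    moves = sum(1 for o in observations if o.moved)
    return {
        "movement_frequency": moves / len(observations),
        "group_activity": any(o.people_nearby > 0 for o in observations),
    }


essay_features = text_features("I feel sad and tired today.")
recess_features = behavior_features([Observation(True, 0), Observation(False, 0)])
```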
[0661] (Example 1)
[0662] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0663] In educational settings, there is a challenge in detecting mental stress and psychological burden in students at an early stage. This increases the likelihood that educators may miss opportunities to intervene at the appropriate time, leading to students' mental health being overlooked. In particular, there is a need to continuously observe subtle changes in behavior and learning activities and respond quickly and accurately.
[0664] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0665] In this invention, the server includes means for collecting user behavioral information and learning information, means for extracting features from the collected information and inferring the user's emotional state and psychological stress level, and means for analyzing the extracted features using a generative artificial intelligence model. This makes it possible to detect the user's mental stress early and immediately send notifications to educators.
[0666] "Educational facilities" refer to places where users engage in learning activities, such as schools and cram schools.
[0667] "Users" refers to learners such as children and students who use educational facilities.
[0668] "Behavioral information" refers to data that shows the user's travel routes, physical movements, and activity patterns.
[0669] "Learning information" refers to data related to users' writing and learning activities, specifically handwritten records and essay content.
[0670] "Characteristics" refer to information extracted from collected behavioral and learning data to infer the user's emotional state and psychological stress level.
[0671] A "generative artificial intelligence model" refers to a model that uses machine learning algorithms to analyze data and evaluate the mental state of users.
[0672] An "educational staff member" refers to an individual who is responsible for guiding and supporting the learning activities of users within an educational facility, specifically a teacher or counselor.
[0673] "Notifications" refer to alerts and informational messages generated based on anomaly detection and sent to educational staff.
[0674] This invention is a system for detecting mental stress in users at educational facilities at an early stage and notifying educators. A specific embodiment of this system is shown below.
[0675] The terminal is a device that uses cameras and sensors installed in classrooms and educational facilities to monitor users' movements and activities in real time. The terminal collects behavioral information through these devices, and also collects learning information from digital notebooks used during lessons and from paper documents digitized by scanning.
[0676] The server uses advanced analytical techniques to process the collected information. Specifically, it uses OCR (Optical Character Recognition) technology to recognize handwritten characters as text data, then removes noise from behavioral information and standardizes the data. The server utilizes a generative AI model to extract user characteristics from the collected information and estimates psychological stress levels based on these characteristics. The estimated results are compared to a standard, and if an anomaly is detected, the educator is immediately notified.
[0677] The educator receives the notification from the server, talks with the user concerned, and checks the situation in detail. This allows the educator to take appropriate action before the user's mental state deteriorates.
[0678] For example, if a particular user's handwritten records become messier than before at an educational facility, or if they start spending more time alone during breaks, the server will detect this as an anomaly and send a notification to the educator. Based on this information, the educator can then interview the user to confirm whether they are experiencing stress or have any problems.
[0679] Example of a prompt:
[0680] "Design a system that analyzes the possibility of mental stress in cases such as messy handwriting in class or an increase in the frequency of children spending time alone during recess, and notifies educators accordingly."
[0681] By using this system, educational institutions can protect the mental health of their users and support smooth educational activities.
[0682] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0683] Step 1:
[0684] The terminal uses cameras and sensors placed within classrooms and educational facilities to monitor users' movements and activities in real time. It takes camera footage and sensor data as input and outputs this as behavioral information. This operation includes analyzing camera footage and collecting movement data from sensors.
[0685] Step 2:
[0686] The terminal collects learning information from digital notebooks and from paper documents digitized by scanning. It takes digital notebook data and scanned images as input and outputs them as learning information. Specific operations include importing digital notebook data and electronically recording paper documents using a scanner.
[0687] Step 3:
[0688] The server receives behavioral and learning information sent from the terminal. The input consists of behavioral and learning data sent from the terminal. The output is a single, consistent dataset composed of this data. Specifically, the server stores the data in a database and prepares for the next processing step.
[0689] Step 4:
[0690] The server uses OCR technology to convert handwritten characters in the learning information into text data. The input is scanned image data of handwritten characters, and the output is text data. The server applies an image processing algorithm to perform this text conversion.
[0691] Step 5:
[0692] The server performs noise reduction on behavioral data and standardizes the data. It takes real-time monitoring data as input and outputs clean, standardized behavioral data. Specifically, it applies a filtering algorithm to remove noise and converts the data to a standardized scale.
[0693] Step 6:
[0694] The server uses a generative AI model to extract features and evaluate the user's emotional state and mental stress. Inputs include standardized behavioral data and text data, and output is an estimated result of the emotional state and stress level. The server runs the AI model to perform pattern analysis and feature extraction.
[0695] Step 7:
[0696] The server sends a notification to the educator if an anomaly is detected. The input is the estimation result, and the output is a warning notification delivered to the educator. Specific actions include sending emails or messages using the notification system.
[0697] Step 8:
[0698] The educator receives the notification and confirms the details by interviewing the user concerned. The input is the notification from the server, and the output is confirmation information regarding the user's status. Specific actions include arranging a conversation and conducting the interview.
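For the email notification mentioned in Step 7, one possible way to compose the message is shown below using Python's standard `email.message.EmailMessage`. The sender address, subject wording, and function name `build_alert_email` are hypothetical, and actually sending the message (for example via an SMTP server) is deliberately left out.

```python
from email.message import EmailMessage


def build_alert_email(educator_address, user_label, stress_level, reasons):
    """Compose (but do not send) an alert email for an educator."""
    message = EmailMessage()
    message["To"] = educator_address
    message["From"] = "stress-monitor@example.invalid"  # placeholder sender
    message["Subject"] = f"[Alert] Possible elevated stress: {user_label}"
    lines = [
        f"Estimated stress level: {stress_level:.2f}",
        "Observed changes:",
    ]
    lines.extend(f"- {reason}" for reason in reasons)
    lines.append("Please check in with the user and record the outcome.")
    message.set_content("\n".join(lines))
    return message


alert_email = build_alert_email(
    "teacher@example.invalid",
    "User A",
    0.82,
    ["Handwriting became less consistent", "More time spent alone during breaks"],
)
```

Identifying the user by a label rather than a full name in the subject line is a design choice made here to limit exposure of personal data in mail headers.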
[0699] (Application Example 1)
[0700] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0701] It is necessary to provide an environment that enables rapid response by continuously monitoring the mental state of the elderly and detecting stress and psychological changes early. However, in current care settings, early detection of mental stress is difficult because it depends on caregivers and support staff happening to notice a change. This poses a risk to the psychological health of the elderly.
[0702] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0703] In this invention, the server includes means for collecting individual care recipient activity data and daily life information, means for extracting behavioral characteristics from the collected data and estimating the care recipient's mental state and psychological stress level, and means for detecting abnormalities based on the estimation results and sending notifications to caregivers. This makes it possible to detect the care recipient's mental stress early and take prompt countermeasures.
[0704] "Activity data" refers to information about the movements and behavioral patterns of the person receiving care in their daily life.
[0705] "Daily life information" refers to information including the manual tasks that the person receiving care performs on a daily basis and related records.
[0706] "Mental state" is a concept that describes the emotional and psychological health of the person receiving care.
[0707] "Psychological stress level" is an indicator used to assess the degree of mental burden on a person receiving care.
[0708] "Means of collection" refers to methods and devices used to acquire activity data and information about daily life.
[0709] "Means of extracting features" refer to methods and techniques for analyzing significant patterns and trends from collected data.
[0710] "Methods of prediction" refer to techniques for predicting the mental state and psychological stress levels of care recipients based on data.
[0711] "Means for detecting anomalies" are systems for identifying behaviors or patterns that differ from normal conditions.
[0712] "Means of sending notifications" refers to communication methods used to inform support personnel of detected anomalies.
[0713] The system implementing the present invention provides a method for early detection of mental stress in the daily life of a person receiving care and enabling a rapid response.
[0714] The system primarily consists of a server, terminals, and users. The server collects activity data and daily life information of the person receiving care, and extracts behavioral characteristics based on this data. Activity data is acquired through the camera and sensors of smart glasses worn by the person receiving care. The collected data is transmitted wirelessly to a cloud server (hosted on, for example, Amazon Web Services or Microsoft Azure).
[0715] The server uses OCR technology to analyze digitized handwritten characters and employs generative AI models (such as GPT-4 and BERT) to estimate the care recipient's mental state and psychological stress level. During this process, the data undergoes pre-processing such as noise reduction and standardization. If the estimation reveals any abnormal patterns, the server quickly notifies the user. This notification is sent to the mobile devices of caregivers, family members, and other support personnel to prompt appropriate action.
[0716] As a concrete example, consider a case where an elderly person has recently been enjoying knitting less frequently. In this case, the system can compare hand movement data with daily habits to detect signs of mental stress and quickly notify the caregiver.
[0717] An example of a prompt when using a generative AI model is: "Analyze in detail how the behavioral patterns of a 65-year-old senior have changed over the past week, and infer signs of mental stress or poor health. In particular, consider the frequency and quality of manual tasks."
[0718] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0719] Step 1:
[0720] The terminal collects activity data from the care recipient's daily life through smart glasses worn by the care recipient. Inputs include image and audio data within the care recipient's field of vision, acquired in real time by sensors. The raw data is then transmitted wirelessly to a cloud server as output.
[0721] Step 2:
[0722] The server preprocesses the received activity data. It receives raw data sent from the terminal as input and performs noise reduction and data standardization. This improves data integrity and outputs clean data converted into a format suitable for analysis.
[0723] Step 3:
[0724] The server extracts features from clean data. Specifically, it analyzes the behavioral patterns and manual movements of the care recipients, and processes data to extract statistically significant features from the input data. As output, it generates a dataset summarizing the behavioral features.
[0725] Step 4:
[0726] The server uses a generative AI model to infer the mental state and psychological stress levels of the person being cared for from the extracted features. In this step, the model receives feature data as input and analyzes it. As output, it generates inferred results of the mental state and stress levels.
[0727] Step 5:
[0728] The server detects anomalies from the prediction results. It determines whether there are any abnormal patterns by comparing them with baseline values and past data. The prediction results are used as input, and if an anomaly is detected, it outputs anomaly information.
[0729] Step 6:
[0730] The server notifies the user of any detected anomalies. It sends alerts to the user (caregiver or family member) on a smartphone or other device. The input is anomaly information, and the output is a detailed alert message.
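The comparison with past data in Step 5 can be illustrated with the knitting example: a drop in how often an activity occurs in the most recent week relative to earlier days. The function name `frequency_drop`, the seven-day window, and the 0.5 alert cut-off are assumptions made for this sketch.

```python
def frequency_drop(daily_counts, recent_days=7):
    """Relative drop in activity frequency: 0.0 means no drop, 1.0 means it stopped.

    `daily_counts` is ordered oldest to newest. Returns None when there is no
    data older than the recent window to serve as a baseline.
    """
    if len(daily_counts) <= recent_days:
        return None
    baseline = daily_counts[:-recent_days]
    recent = daily_counts[-recent_days:]
    baseline_rate = sum(baseline) / len(baseline)
    if baseline_rate == 0:
        return 0.0  # the activity was never habitual, so nothing has dropped
    recent_rate = sum(recent) / len(recent)
    return max(0.0, 1.0 - recent_rate / baseline_rate)


# Knitting sessions per day: two weeks of habit, then a quiet week.
knitting = [2] * 14 + [1, 0, 1, 0, 0, 1, 0]
drop = frequency_drop(knitting)
should_alert = drop is not None and drop > 0.5  # hypothetical cut-off
```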
[0731] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0732] This invention provides a system for comprehensively evaluating the mental state of students in a school environment. This system combines behavioral and learning data collected from students with an emotion recognition engine for further accuracy. This allows for a more accurate estimation of students' mental stress and enables educators to provide appropriate interventions.
[0733] Data collection
[0734] The terminal uses cameras and microphones installed in classrooms and on campus to observe students' movement patterns and acquire audio and visual information in real time. In this way, behavioral and emotional data on students is collected comprehensively. In parallel, learning data is also collected using digital notebooks and scanning devices.
[0735] Data preprocessing and feature extraction
[0736] The server analyzes the acquired audio and video data and uses speech recognition and facial detection technologies to recognize the user's emotions. This goes beyond simply analyzing behavioral data, extracting emotional characteristics that include contextual background. In addition, OCR technology is used to digitize handwritten text and quantify the content of the text and the writer's tendencies.
[0737] Analysis and Anomaly Detection
[0738] The server inputs this diverse data into a generative AI model, supplementing it with data obtained from the emotion engine to analyze the mental state of the students. By integrating real-time recognition results of emotion data with historical learning and behavioral data, it predicts stress levels and psychological changes with greater accuracy.
[0739] Notifications and feedback
[0740] Educators, as users, receive detailed reports and alerts from the server, enabling them to take prompt action regarding specific students. Results obtained through interviews and observations are fed back to the server, contributing to the improvement of future data analysis and anomaly detection algorithms.
[0741] Specific example
[0742] For example, even if student B appears focused during class, the emotion engine might detect signs of anxiety from changes in tone of voice or subtle facial expressions. If this, along with changes in behavioral and learning data, is deemed abnormal, the server immediately sends a notification to the educator. The teacher can then receive this information, schedule a meeting with student B, and provide support tailored to their individual situation. This system enables a holistic approach that considers not only digital information but also emotional aspects.
[0743] The following describes the processing flow.
[0744] Step 1:
[0745] The terminal uses cameras and microphones installed in classrooms and throughout the school to collect video and audio data of students in real time. It also acquires learning data from digital notebooks and from dedicated scanning devices.
[0746] Step 2:
[0747] The server analyzes the collected video data and uses face detection technology to read changes in facial expressions. Simultaneously, it uses speech recognition technology to extract features such as voice tone and speaking speed, and inputs this information into the emotion engine.
[0748] Step 3:
[0749] The server uses OCR to analyze digitized handwritten text, converting the content of essays and notes into text data. This allows for the quantification of linguistic features and handwriting consistency, and helps to understand learning trends.
[0750] Step 4:
[0751] The server integrates emotional data recognized by the emotion engine with behavioral and learning data, and analyzes the emotional state and mental stress levels of students based on a model that combines the features of each.
[0752] Step 5:
[0753] Based on the analysis results, the server generates a detailed report and sends an alert to the user, an educator (e.g., a teacher or counselor), if it detects abnormal stress levels or psychological states.
[0754] Step 6:
[0755] Based on the provided reports, users conduct interviews with the relevant students and provide counseling and support as needed. The interview results are then fed back to the server and used to improve the accuracy of the system.
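The integration in Step 4 is not specified in detail. One simple way to picture it, offered only as an assumed stand-in for the combined model, is a weighted average of per-modality stress scores that tolerates a missing modality. The modality names, weights, and function name `fuse_scores` are hypothetical.

```python
def fuse_scores(scores, weights):
    """Weighted average of per-modality scores, renormalized over what is present.

    `scores` maps a modality name to a stress score in [0, 1]; modalities that
    have no configured weight are ignored.
    """
    used = [name for name in scores if name in weights]
    total_weight = sum(weights[name] for name in used)
    if total_weight == 0:
        raise ValueError("no weighted modality available")
    return sum(scores[name] * weights[name] for name in used) / total_weight


# Hypothetical weights giving the emotion engine the largest say.
WEIGHTS = {"emotion": 0.5, "behavior": 0.25, "learning": 0.25}

combined = fuse_scores({"emotion": 0.8, "behavior": 0.4, "learning": 0.6}, WEIGHTS)
audio_only = fuse_scores({"emotion": 0.8}, WEIGHTS)
```

Renormalizing over the modalities that are present means a student whose notebook data is missing on a given day is still scored on a comparable scale.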
[0756] (Example 2)
[0757] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0758] Traditional systems for assessing the mental health of students in school settings are limited to the analysis of behavioral and learning data, lacking detailed insights into emotional states and the ability to provide appropriate individual support. This makes it difficult for educators to accurately understand students' stress levels, limiting opportunities for appropriate intervention.
[0759] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0760] In this invention, the server includes means for collecting behavioral information and learning outcomes of students in a school environment, means for analyzing emotions and facial expressions from the collected audio and video information, and means for digitizing handwritten character information using OCR technology and quantifying trends. This makes it possible to accurately estimate the mental stress and emotional state of students and to provide real-time notifications to educators.
[0761] "Behavioral information" refers to data on the actions, movements, and interactions that students exhibit within the school environment.
[0762] "Learning outcomes" refer to data that shows the knowledge, skills, and level of achievement of students through school education.
[0763] "Audio information" refers to data that includes acoustic characteristics such as the tone, content, and volume of a student's voice.
[0764] "Video information" refers to data that records the actions, facial expressions, and visual characteristics of children and students.
[0765] "Emotion" or "emotional state" refers to the psychological state of a child or student as inferred from their voice and facial expressions.
[0766] "Facial expression" refers to the visual expression of emotions derived from the movement and structure of the facial muscles of children and students.
[0767] "OCR technology" refers to the technology that converts handwritten or printed characters into digital text.
[0768] A "generative AI model" refers to an artificial intelligence model that performs inference and prediction based on diverse data.
[0769] "Sending notifications to educators" means informing educators in real time about any abnormalities or situations requiring attention regarding students.
[0770] The present invention provides a multifaceted evaluation of the mental state and stress levels of students in a school environment. This system collects behavioral information and learning outcomes in real time and analyzes emotional states using a generative AI model, thereby enabling appropriate intervention.
[0771] The terminal uses cameras and microphones installed in classrooms and on campus to acquire audio and video information of students. This allows for the real-time collection of behavioral information such as students' movements, facial expressions, and tone of voice. In addition, digital notebooks and scanning devices are used to save handwritten learning content as digital data, thereby capturing learning outcomes.
[0772] The server receives data transmitted from the terminal, converts the speech to text using speech recognition technology, and analyzes the characteristics of the voice. Furthermore, it analyzes facial expressions from video data using a face detection algorithm and quantifies the characteristics of emotions. It also digitizes handwritten information using OCR technology and quantifies the content and tendencies of the writing.
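The quantification described above can be illustrated with a minimal sketch. It assumes the speech has already been transcribed and the handwriting already digitized by OCR; the function names `speech_features` and `writing_features` and the chosen features (speech rate, word count, mean word length) are hypothetical and are not specified in this disclosure.

```python
from dataclasses import dataclass


@dataclass
class SpeechFeatures:
    word_count: int
    words_per_second: float


def speech_features(transcript: str, duration_s: float) -> SpeechFeatures:
    """Quantify a transcribed utterance (hypothetical feature set)."""
    words = transcript.split()
    # Guard against a zero or negative duration to avoid division errors.
    rate = len(words) / duration_s if duration_s > 0 else 0.0
    return SpeechFeatures(word_count=len(words), words_per_second=rate)


def writing_features(ocr_text: str) -> dict[str, float]:
    """Quantify OCR-digitized handwriting (hypothetical feature set)."""
    words = ocr_text.split()
    if not words:
        return {"word_count": 0.0, "mean_word_length": 0.0}
    mean_len = sum(len(w) for w in words) / len(words)
    return {"word_count": float(len(words)), "mean_word_length": mean_len}
```

Features of this kind could be tracked per student over time so that later stages can compare a new observation against past values.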
[0773] This data is fed into a generative AI model to infer emotional states and mental stress levels. The model integrates real-time and historical data for highly accurate analysis. If an anomaly is detected, the server sends a notification to the educator to prompt immediate intervention.
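One way to combine real-time and historical data when checking for an anomaly is sketched below. The disclosure does not specify the detection rule; this example assumes, purely for illustration, that the model outputs a numeric stress score and that a score is anomalous when it rises well above the student's own historical mean (a one-sided z-score test). The name `is_anomalous` and the threshold of 2.0 are assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], current: float,
                 z_threshold: float = 2.0) -> bool:
    """Flag a stress score that is unusually high relative to past scores.

    Assumption: higher scores mean more stress, so only upward
    deviations are flagged.
    """
    if len(history) < 2:
        return False  # not enough history to estimate a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current > mu
    return (current - mu) / sigma > z_threshold
```

A per-student baseline is used here because the text emphasizes changes in an individual's state rather than comparison across students.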
[0774] As a concrete example, a student may appear calm during class while the emotion engine detects anxiety through changes in tone of voice or subtle facial expressions. In such cases, the system immediately notifies the educator and suggests additional meetings or observations. This allows the educator to provide support tailored to the student's situation.
[0775] An example of a prompt is: "Based on Person A's recent behavior and emotional data, analyze whether there are any psychological changes or stressors, and infer the reasons why." This example serves as a guideline for generative AI models when making inferences based on diverse data.
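The prompt above could be assembled programmatically along the following lines. This is only a sketch: the instruction sentence is taken from the example prompt, while the function name `build_prompt` and the way observations are appended as a bulleted list are assumptions for illustration.

```python
def build_prompt(person_label: str, observations: list[str]) -> str:
    """Combine the instruction sentence with quantified observations."""
    instruction = (
        f"Based on {person_label}'s recent behavior and emotional data, "
        "analyze whether there are any psychological changes or stressors, "
        "and infer the reasons why."
    )
    # Hypothetical layout: instruction first, then one line per observation.
    lines = [instruction, "", "Data:"]
    lines += [f"- {obs}" for obs in observations]
    return "\n".join(lines)
```

The resulting string would then be passed to whichever generative AI model the server uses.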
[0776] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0777] Step 1:
[0778] The terminal uses cameras and microphones installed in classrooms and on campus to acquire audio and video information of students in real time. Specifically, it records facial expressions and movements through the cameras and collects voice tone and spoken content with the microphones. This input information is sent to the server as behavioral data and audio data.
[0779] Step 2:
[0780] The server processes the audio data received from the terminal through a speech recognition system. Here, the audio is converted to text, and characteristics such as tone and speed are analyzed. The input is audio data, and the output is the transcribed audio information and speech analysis data. This prepares the system for evaluating the emotions of the students.
[0781] Step 3:
[0782] The server analyzes video data by passing it through a face detection algorithm. This detects changes in the facial expressions of students and quantifies their emotional characteristics. The input is video data, and the output is quantified facial expression data. This data is useful for gaining a detailed understanding of emotional states.
[0783] Step 4:
[0784] The server uses OCR technology to convert handwritten text into digital text. The input is scanned handwritten learning output, and the output is the digitized text and its analysis results. This allows for evaluation of learning progress and trends.
[0785] Step 5:
[0786] The server inputs this data into a generative AI model. This model integrates speech analysis data, quantified facial expression data, and learning outcome data to estimate emotional state and mental stress levels. The output is the result of the mental state analysis.
[0787] Step 6:
[0788] The server sends a notification to the educator (user) if an anomaly is detected based on the analysis results. The input is the analysis results of the mental state, and the output is an alert message to the educator. This allows the educator to take necessary interventions.
[0789] Step 7:
[0790] Users receive notifications from the server and consider appropriate responses to students. Specifically, they take actions to improve students' situations, such as scheduling meetings. The input is alerts from the server, and the output is intervention actions taken by educators.
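Steps 5 and 6 above can be sketched as follows. The generative AI model and the notification channel are passed in as plain callables so the example stays self-contained; the type names `Observation` and `Analysis`, the 0-to-1 stress scale, and the threshold of 0.7 are all assumptions rather than details given in this disclosure.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Observation:
    student_id: str
    speech_features: dict      # output of Step 2
    expression_features: dict  # output of Step 3
    writing_features: dict     # output of Step 4


@dataclass
class Analysis:
    emotion: str
    stress_level: float  # assumed scale: 0.0 (none) to 1.0 (severe)


def handle_observation(obs: Observation,
                       infer: Callable[[Observation], Analysis],
                       notify: Callable[[str], None],
                       threshold: float = 0.7) -> Analysis:
    """Step 5: run the model. Step 6: alert the educator if needed."""
    analysis = infer(obs)
    if analysis.stress_level >= threshold:
        notify(
            f"Student {obs.student_id}: estimated emotion "
            f"'{analysis.emotion}', stress level {analysis.stress_level:.2f}"
        )
    return analysis
```

Injecting `infer` and `notify` keeps the control flow testable without committing to a particular model or messaging service.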
[0791] (Application Example 2)
[0792] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0793] Traditional school environments present a challenge in accurately assessing students' mental stress and emotional states. This can prevent timely intervention by educators, potentially negatively impacting students' psychological health. To address this issue, it is necessary to assess participants' emotional states in real time and provide the necessary support.
[0794] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0795] In this invention, the server includes a device for collecting behavioral and learning information of participants in an educational facility, a device for extracting features from the collected information and inferring the emotional state and mental stress level of the participants, and an automated device for providing real-time feedback based on the collected information as support to the participants. This makes it possible to quickly grasp the mental state of participants and provide necessary support in real time.
[0796] "Participants in educational facilities" refers to students who participate in activities within the school environment.
[0797] "Behavioral information" refers to information that includes data on participants' travel routes and their interactions with others.
[0798] "Learning information" refers to information that includes data on participants' handwritten text and the content of their written materials.
[0799] "Device" refers to hardware or software used to collect specific information, extract features, or make inferences.
[0800] "Mental stress level" is an evaluation index that indicates the degree of stress and emotional pressure experienced by participants.
[0801] An "automated device that provides real-time feedback" is a device that instantly analyzes collected information and provides situation-based support and alerts to participants and educators.
[0802] This invention is a system for evaluating the mental state of participants in educational facilities and providing necessary support. The system mainly consists of terminals installed within the educational facility, a central control server, and users who are educational providers.
[0803] The terminal uses cameras and microphones installed in designated locations within the facility to collect participant behavioral and learning information. Behavioral information includes participants' movement routes and interaction patterns with others, while learning information includes handwritten text and written content. This data is collected in real time, and all data is transmitted to a server.
[0804] The server analyzes the collected video and audio data using software such as OpenCV and Google Cloud Speech-to-Text. This analysis includes facial recognition and voice analysis, which are used to infer the participants' emotional states. Furthermore, a generative AI model, for example one built with a machine learning framework such as TensorFlow, is used to assess the participants' mental stress levels. If any anomalies are detected, the result is immediately communicated to the user (the educator).
[0805] Based on notifications sent from the server, users can provide appropriate intervention and support to individual participants. For example, if a participant shows signs of distress, the user can detect this and provide individual consultations or psychological support.
[0806] For example, if the server detects that a participant's facial expression is stiff during a lesson, the server will notify the educator of this data in real time. Based on this information, the educator will communicate with the participant and provide the necessary support.
[0807] An example of a prompt for a generative AI model is: "Evaluate changes in emotional state and estimate the level of mental stress based on the participant's behavioral and vocal information."
[0808] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0809] Step 1:
[0810] The terminal uses cameras within the facility to acquire video data of participants and microphones to collect audio data. The input for this step is real-time visual and auditory data obtained from participants, and the output is raw video and audio data to be sent to the server.
[0811] Step 2:
[0812] The server performs face recognition processing on the received video data using OpenCV. The input for this step is the video data sent from the terminal, and the output is the facial feature data of the participants. Specifically, the server detects faces in the video data and quantifies their features.
[0813] Step 3:
[0814] The server converts the audio data into text using the Google Cloud Speech-to-Text API. The input for this step is the collected audio data, and the output is the text data converted from the audio. Specifically, the server sends the audio data to the cloud service and receives it back as text.
[0815] Step 4:
[0816] The server integrates facial feature data and text data and uses a generative AI model to infer the emotional state of the participants. The input for this step is facial feature data and text data converted from speech, and the output is evaluation data based on emotion recognition. The server feeds this data into the AI model to evaluate the participants' emotions and mental stress levels.
[0817] Step 5:
[0818] The server detects anomalies based on emotion recognition results and sends notifications to the user as needed. The input for this step is evaluation data, and the output is alert information for the educator. Specifically, the server searches for anomalies and sends a notification to the user's device if one is found.
[0819] Step 6:
[0820] The user provides support tailored to the participant based on the alert information received. The input for this step is the alert information sent from the server, and the output is the actual intervention activity for the participant. Specifically, the user takes actions such as interacting with the participant or providing counseling.
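The server-side portion of this flow (Steps 2 through 5) can be sketched as a single function. The face analysis, transcription, and evaluation stages are represented by injected callables standing in for OpenCV, the Speech-to-Text service, and the generative AI model respectively; no real library call is made here. The function name `process_capture`, the numeric stress scale, and the threshold are assumptions for illustration.

```python
from typing import Callable, Optional


def process_capture(
    video: bytes,
    audio: bytes,
    extract_face_features: Callable[[bytes], list[float]],
    transcribe: Callable[[bytes], str],
    evaluate: Callable[[list[float], str], float],
    stress_threshold: float = 0.7,
) -> Optional[str]:
    """Return an alert message for the educator, or None if no anomaly."""
    face = extract_face_features(video)  # Step 2: facial feature data
    text = transcribe(audio)             # Step 3: speech converted to text
    stress = evaluate(face, text)        # Step 4: model-based evaluation
    if stress >= stress_threshold:       # Step 5: anomaly check
        return f"Alert: estimated mental stress level {stress:.2f}"
    return None
```

For example, calling it with stub stages that report a stress level of 0.9 yields an alert string, while a level of 0.1 yields `None`.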
[0821] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0822] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions in the prompt and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0823] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.
[0824] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.
[0825] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
[0826] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.
[0827] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the emotion map 400 an emotion lies, the more visible (expressed in actions) it becomes.
[0828] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, the result is discomfort, and when they approach the ideal, the result is pleasure. Similarly, in robots, cars, motorcycles, and the like, emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, the result is discomfort, and when they approach the ideal, the result is pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "reaction," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
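The layout of the emotion map 400 described above (pleasure at the top, displeasure at the bottom, reaction on the left, situation on the right, inner states near the center) can be modeled with a clock position and a radius. This is an illustrative sketch only; the class `MapPosition`, the normalized radius, and the label strings are assumptions and do not appear in the disclosure.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MapPosition:
    clock_hour: float  # 12 = top, 3 = right, 6 = bottom, 9 = left
    radius: float      # assumed scale: 0 = center (inner), 1 = outer (actions)

    def xy(self) -> tuple[float, float]:
        # Each clock hour spans 30 degrees, measured clockwise from the top.
        angle = math.radians(self.clock_hour * 30.0)
        return self.radius * math.sin(angle), self.radius * math.cos(angle)


def describe(pos: MapPosition, eps: float = 1e-9) -> tuple[str, str]:
    """Return (side, valence) for a position on the map."""
    x, y = pos.xy()
    side = "situation" if x > eps else "reaction" if x < -eps else "both"
    valence = "pleasure" if y > eps else "displeasure" if y < -eps else "neutral"
    return side, valence
```

Under this model, a position at 3 o'clock falls on the situation side, and positions directly above or below the center are labeled as belonging to both sides, matching the description of the map.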
[0829] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."
[0830] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
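As a minimal sketch of the final determination step, the emotion values output by the neural network can be reduced to a single emotion. The disclosure does not state the selection rule; choosing the emotion with the highest value is an assumption made here, and the function name `determine_emotion` is hypothetical. The network itself is not modeled; its output is represented by a plain dictionary.

```python
def determine_emotion(emotion_values: dict[str, float]) -> str:
    """Pick the emotion with the highest value (assumed selection rule)."""
    if not emotion_values:
        raise ValueError("emotion_values must not be empty")
    return max(emotion_values, key=emotion_values.get)


# Hypothetical network output: neighboring emotions on the map
# have similar values, as the training objective intends.
example = {"reassured": 0.80, "calm": 0.75, "confident": 0.70, "anxious": 0.10}
print(determine_emotion(example))
```

Because nearby emotions are trained to have similar values, a downstream consumer might also look at the top few emotions rather than only the maximum.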
[0831] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.
[0832] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.
[0833] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing according to the specific processing program 56.
[0834] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.
[0835] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.
[0836] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.
[0837] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.
[0838] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.
[0839] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.
[0840] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of those aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements added, or elements replaced in the descriptions and illustrations presented above, as long as this does not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable implementation have been omitted from the descriptions and illustrations presented above.
[0841] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.
[0842] The following is further disclosed regarding the embodiments described above.
[0843] (Claim 1)
[0844] Means for collecting behavioral data and learning data of students in a school environment,
[0845] Means for extracting features from the collected data and inferring the emotional state and mental stress level of the students,
[0846] Means for detecting anomalies based on the prediction results and sending notifications to educators,
[0847] A system comprising the above means.
[0848] (Claim 2)
[0849] The system according to claim 1, wherein the behavioral data includes the movement trajectory of the student and the patterns of interaction with others.
[0850] (Claim 3)
[0851] The system according to claim 1, wherein the learning data includes handwritten information and essay content of students.
[0852] "Example 1"
[0853] (Claim 1)
[0854] In an educational facility, a device for collecting user behavior information and learning information,
[0855] A device that extracts features from collected information and estimates the user's emotional state and psychological stress level,
[0856] A device that detects anomalies based on prediction results and sends notifications to the training staff,
[0857] A device that analyzes data using an artificial intelligence model generated with extracted features,
[0858] A device that incorporates user feedback into the system's learning process to improve analysis accuracy,
[0859] A system comprising the above devices.
[0860] (Claim 2)
[0861] The system according to claim 1, wherein the behavioral information includes the user's travel route and interaction patterns with others.
[0862] (Claim 3)
[0863] The system according to claim 1, wherein the learning information includes the user's handwritten records and the content of their compositions.
[0864] "Application Example 1"
[0865] (Claim 1)
[0866] Means for collecting activity data and daily life information of an individual care recipient,
[0867] Means for extracting behavioral characteristics from the collected data and inferring the mental state and psychological stress level of the care recipient,
[0868] Means for detecting anomalies based on the prediction results and sending notifications to supporters,
[0869] A system comprising the above means.
[0870] (Claim 2)
[0871] The system according to claim 1, wherein the behavioral data includes the care recipient's movement patterns in their living space and the frequency of conversations with others.
[0872] (Claim 3)
[0873] The system according to claim 1, wherein the daily life information includes the actions and work records of the person receiving care.
[0874] "Example 2 of combining an emotion engine"
[0875] (Claim 1)
[0876] Means for collecting information on student behavior and learning outcomes in a school environment,
[0877] Means for analyzing emotions and facial expressions from the collected audio and video information,
[0878] Means for digitizing handwritten text information using OCR technology and quantifying trends,
[0879] Means for inferring emotional states and mental stress levels using a generative AI model,
[0880] Means for detecting anomalies based on the prediction results and sending notifications to educators,
[0881] A system comprising the above means.
[0882] (Claim 2)
[0883] The system according to claim 1, wherein the behavioral information includes the movement trajectory of the student and the patterns of interaction with others.
[0884] (Claim 3)
[0885] The system according to claim 1, wherein the learning outcomes include handwritten information and essay content of students.
[0886] "Application example 2 when combining with an emotional engine"
[0887] (Claim 1)
[0888] A device for collecting behavioral and learning information of participants in educational facilities within a school environment,
[0889] A device that extracts features from collected information and estimates the emotional state and mental stress levels of participants,
[0890] A device that detects anomalies based on prediction results and sends a warning to the education provider,
[0891] An automated device that provides real-time feedback of collected information to support participants,
[0892] A system comprising the above devices.
[0893] (Claim 2)
[0894] The system according to claim 1, wherein the behavioral information includes the participant's travel route and the manner in which they interact with others.
[0895] (Claim 3)
[0896] The system according to claim 1, wherein the learning information includes the participant's handwritten text and written content.
[Explanation of symbols]
[0897] 10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot
Claims
1. A system comprising: means for collecting activity data and daily life information of an individual care recipient; means for extracting behavioral characteristics from the collected data and inferring the mental state and psychological stress level of the care recipient; and means for detecting anomalies based on the prediction results and sending notifications to supporters.
2. The system according to claim 1, wherein the behavioral data includes the care recipient's movement patterns in their living space and the frequency of conversations with others.
3. The system according to claim 1, wherein the daily life information includes the actions and work records of the person receiving care.