system

The system addresses the challenge of detecting emotional changes and stress in the workplace by preprocessing text and voice data with machine learning, allowing for early intervention and improved workplace mental health through real-time alerts.

JP2026101404A · Pending · Publication Date: 2026-06-22 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-12-10
Publication Date
2026-06-22

Abstract

[Problem] To provide a system. [Solution] A system including: means for receiving message data and recorded data collected from users; means for preprocessing the message data and recorded data and converting them into a unified format; means for analyzing the data in the unified format and identifying and scoring emotions; means for tracking the identified emotions over time and detecting abnormal fluctuations; means for sending a warning to an administrator via a communication means when abnormal emotional fluctuations are detected; and means for visualizing the users' mental health information on a dashboard.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, the method including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a character of the chatbot, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] In the workplace, it is difficult to quickly detect changes in emotions from communication among team members and to discover signs of stress and burnout syndrome early. An object of the present disclosure is to solve this problem. Conventionally, problems of human relations and mental health are often addressed only after they surface, which makes it difficult to prevent them from worsening.

Means for Solving the Problems

[0005] This invention solves the above problem by providing a system that receives text and voice data collected from users and identifies emotions based on this data. Specifically, it preprocesses the text and voice data and analyzes it using a machine learning model to score emotions. Furthermore, it includes means for tracking the identified emotions over time and sending alerts to administrators when abnormal patterns are detected. This creates a system that can detect mental health problems at an early stage.

[0006] A "user" refers to an individual or organization that uses the system and generates communication data.

[0007] "Text data" refers to digital data that includes text information, such as emails and chat messages.

[0008] "Audio data" refers to digital data that includes sound, such as recordings of meetings or phone calls.

[0009] "Preprocessing" refers to the process of cleaning and formatting raw data to convert it into an analyzable format.

[0010] A "unified format" refers to a data format that has been standardized for analysis purposes.

[0011] "Analysis" refers to the process of analyzing data using machine learning and natural language processing to extract necessary information.

[0012] "Emotions" refer to the psychological state or feelings inferred from the user's communication.

[0013] An "abnormal pattern" refers to emotional changes or the frequent occurrence of specific feelings that go beyond the normal range.

[0014] A "machine learning model" refers to an algorithm or model that has been trained to perform a specific task based on data.

[0015] "Alert" refers to the function of a system that notifies the administrator when an abnormality is detected.

[0016] "Administrator" refers to an individual or organization that monitors the system, receives alerts, and takes corresponding actions.

Brief Description of Drawings

[0017]
[Figure 1] It is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] It is a conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] It is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] It is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] It is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] It is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] It is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] It is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] It shows an emotion map to which multiple emotions are mapped.
[Figure 10] It shows an emotion map to which multiple emotions are mapped.
[Figure 11] It is a sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] It is a sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] It is a sequence diagram showing the processing flow of the data processing system in Embodiment 2 when the emotion engine is combined.
[Figure 14] It is a sequence diagram showing the processing flow of the data processing system in Application Example 2 when the emotion engine is combined.

Mode for Carrying Out the Invention

[0018] Hereinafter, an example of an embodiment of the system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0019] First, the terms used in the following description will be explained.

[0020] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be one arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be one type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.

[0021] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which is used as a work memory by the processor.

[0022] In the following embodiments, storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0023] In the following embodiments, a communication interface (I/F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0024] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when expressing three or more things linked by "and/or."

[0025] [First Embodiment]

[0026] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0027] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0028] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0029] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0030] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0031] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0032] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0033] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0034] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0035] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0036] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0037] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0038] This invention provides a system that analyzes users' emotions based on communication among team members and detects early signs of stress and burnout. This allows managers to take appropriate measures and improve the workplace environment.

[0039] Overview of the Embodiment

[0040] First, the terminal automatically collects the user's email, chat, and meeting recording data. The collected data is sent to the server via encrypted communication. After receiving this data, the server preprocesses the text and audio data. The preprocessed data is then analyzed to identify emotions.

[0041] The server performs sentiment analysis using machine learning models. This scores and classifies the user's emotions from text and audio data. For example, categories such as "stress," "dissatisfaction," and "joy" are assigned. The analysis results are stored in a database and managed as time-series data for each user.

[0042] Furthermore, the server tracks emotional changes in real time. If an abnormal pattern is detected, it sends an alert to the administrator based on a pre-configured threshold. This alert is delivered via dashboard, email, and notification system, allowing administrators to quickly formulate a response plan.

[0043] Specific example

[0044] As a concrete example, suppose a member of a project team starts using phrases like "I'm tired" or "I've lost motivation" frequently in chat messages after working for long hours. This information is collected on the terminal and processed and analyzed on the server. If the analysis determines that stress levels are high, the server sends an alert to HR personnel or the project leader. This allows for early follow-up conversations and the implementation of appropriate support measures.

[0045] Thus, the present invention provides an embodiment of a system that enables efficient detection of changes in a user's emotions and allows appropriate countermeasures to be taken before problems become apparent.

[0046] The following describes the processing flow.

[0047] Step 1:

[0048] The terminal automatically collects the user's email, chat, and meeting recording data through its interface. This can be done using a dedicated application or browser extension. The collected data is encrypted and sent to the server using a secure protocol.

[0049] Step 2:

[0050] The server receives the incoming data and temporarily stores it in the database. Then, it performs data preprocessing. Specifically, in the case of audio data, it is converted to text using speech recognition software, and all data formats are standardized. Noise reduction and spell checking are also performed at this stage.
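
The standardization described in Step 2 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the `UnifiedRecord` structure, its field names, and the helper functions are hypothetical and do not appear in the specification, and speech recognition is assumed to have already produced a transcript string.

```python
import re
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class UnifiedRecord:
    """Hypothetical unified format: one cleaned utterance per record."""
    user_id: str
    source: str      # e.g. "email", "chat", "meeting"
    timestamp: str   # ISO 8601 string, kept as provided
    text: str


def clean_text(raw: str) -> str:
    """Normalize Unicode, drop control characters, collapse whitespace."""
    text = unicodedata.normalize("NFKC", raw)
    text = "".join(ch for ch in text if ch.isprintable() or ch.isspace())
    return re.sub(r"\s+", " ", text).strip()


def to_unified(user_id: str, source: str, timestamp: str, raw: str) -> UnifiedRecord:
    """Convert a raw message or transcript into the unified format."""
    return UnifiedRecord(user_id, source, timestamp, clean_text(raw))


# A chat message and a meeting transcript end up in the same shape.
chat = to_unified("u1", "chat", "2024-12-10T09:00:00", "  I'm   tired\x07 today\n")
meeting = to_unified("u1", "meeting", "2024-12-10T10:00:00", "Let's\tsync  later")
```

Spell checking and audio noise reduction, which the step also mentions, would need dedicated tools and are left out of this sketch.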

[0051] Step 3:

[0052] The server performs sentiment analysis on pre-processed text data. Machine learning models and natural language processing algorithms are used to identify and score sentiments from words and phrases. Sentiment categories include positive, neutral, and negative, and the results are stored in a database.
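
As a rough illustration of the scoring in Step 3, the sketch below classifies text as positive, neutral, or negative. It substitutes a simple word-list lookup for the machine learning model described above. The word lists and the function name `score_sentiment` are assumptions made only for this example.

```python
import re

# Illustrative word lists only; a trained model would replace this lookup.
NEGATIVE = {"tired", "exhausted", "pressured", "unmotivated", "stressed"}
POSITIVE = {"happy", "glad", "excited", "great", "thanks"}


def score_sentiment(text: str) -> tuple[str, float]:
    """Return (category, score) with score in [-1.0, 1.0]."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return "neutral", 0.0
    score = (pos - neg) / (pos + neg)
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score
```

The score is the balance of positive and negative hits, so a message with only negative words scores -1.0 and a message with no listed words is neutral.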

[0053] Step 4:

[0054] The server uses the results of sentiment analysis to monitor user emotional fluctuations over time. Statistical methods and anomaly detection are employed to detect abnormal changes and specific patterns. This detection is performed in real time, providing foundational data for rapid response.
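
One statistical method that fits Step 4 is a rolling z-score, sketched below. The specification does not name a particular method, so the choice of z-score, the window size, and the threshold value are all assumptions for illustration.

```python
from statistics import mean, stdev


def detect_anomalies(scores: list[float], window: int = 5, z_threshold: float = 2.0) -> list[int]:
    """Return indices whose score deviates strongly from the preceding window."""
    flagged = []
    for i in range(window, len(scores)):
        history = scores[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # A flat baseline gives no scale for judging deviation; skip it.
            continue
        if abs(scores[i] - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged


# Hypothetical daily sentiment scores for one user, with a sharp drop at index 6.
daily_scores = [0.2, 0.3, 0.25, 0.2, 0.3, 0.25, -0.8, 0.2]
anomalies = detect_anomalies(daily_scores)
```

Because each score is compared only with the scores before it, the check can run as new data arrives, which matches the real-time monitoring described in this step.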

[0055] Step 5:

[0056] The server generates an alert based on a pre-configured threshold when an abnormal emotional pattern is detected. This alert is sent to HR personnel and managers via email or push notification. Receiving the alert allows administrators to quickly understand the problem and take appropriate action.

[0057] Step 6:

[0058] Users (administrators and HR personnel) can view detailed sentiment analysis on a dashboard based on received alerts. They can then contact the affected team members early on and provide stress care support and resources. This approach allows for resolution before problems become entrenched.

[0059] (Example 1)

[0060] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0061] It is difficult to detect early signs of emotional changes or stress in employees in the workplace and to take appropriate action quickly. To improve productivity while maintaining employees' mental health, it is necessary to monitor their emotional state efficiently and accurately.

[0062] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0063] In this invention, the server includes means for receiving information acquired from a user, means for preprocessing the information and converting it into a standardized format, and means for interpreting the information in the standardized format and detecting emotions. This makes it possible to quickly grasp changes in employees' emotions in the workplace and take appropriate action, thereby improving productivity while maintaining mental health.

[0064] A "user" refers to any individual or organization that provides information to the system.

[0065] "Information" refers to data collected from users, including text data and audio data.

[0066] "Preprocessing" refers to a series of processes that convert collected information into a format that is easy to analyze.

[0067] "Standardized format" refers to a state in which pre-processed information has been converted into a common format that can be analyzed.

[0068] "Interpretation" refers to the process of detecting emotions based on standardized information.

[0069] "Emotions" refer to internal experiences and situations that represent the user's psychological state.

[0070] An "abnormal situation" refers to an unusual pattern or tendency of emotions, and is an event that requires attention from the manager.

[0071] An "administrator" is a person within an organization who is responsible for receiving notifications sent by the system.

[0072] "Notification" refers to a means of communication sent to an administrator when an abnormal situation occurs.

[0073] "Electronic communication" refers to means of transmitting information via digital media, including email and instant messaging.

[0074] "Notification services" refer to additional means of providing information about abnormal situations, complementing electronic communications.

[0075] This invention provides a system that utilizes communication data between users to analyze emotions in real time. Specific embodiments are described below.

[0076] The terminal will collect user communication data, specifically text and voice data, locally. It is expected that dedicated applications and monitoring software for data collection will be installed on this terminal. Users will not need to perform any special operations while continuing their normal work.

[0077] The collected data is securely transmitted to the server using common encryption technologies such as AES and HTTPS. Upon arrival at the server, preprocessing is performed immediately. In this preprocessing, natural language processing tools (e.g., NLTK and spaCy) clean the text data, and speech data is converted to text using a speech recognition API (e.g., Google® Speech-to-Text).

[0078] The server uses pre-processed data to request sentiment analysis from a generative AI model. Models such as BERT and GPT are expected to be applied. An example of a prompt is "Classify the sentiment of this text." This results in the model scoring and classifying emotions such as "stress," "dissatisfaction," and "joy."

[0079] The analysis results are stored as time-series data in a database on the server, and administrators can monitor this in real time through dashboards and other means. If an abnormal emotional pattern is detected, an immediate alert is sent to the administrator via email or a dedicated notification service.

[0080] For example, if a member of a project team frequently uses phrases like "tired" in chat messages after working long hours, the emotion analysis may interpret this as a sign of stress and send an alert to the HR department. Based on this information, managers can then take necessary follow-up steps and implement appropriate support measures.

[0081] This invention enables companies and organizations to efficiently understand the emotional state of their employees and contribute to maintaining a healthy work environment.

[0082] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0083] Step 1:

[0084] The terminal collects text and audio data from the user's daily communication activities. Inputs include emails, chat history, and meeting recordings, and the output is the collected data. This collection process is automated by a dedicated application installed on the terminal.

[0085] Step 2:

[0086] The terminal encrypts the collected data using AES and securely transmits it to the server using the HTTPS protocol. The input is the unencrypted data obtained in step 1, and the output is the securely encrypted and transmitted data. This operation ensures the confidentiality of the data.

[0087] Step 3:

[0088] The server performs preprocessing on the received data. The input is the encrypted data, which the server first decrypts to recover the raw data. The server then uses natural language processing tools (NLTK and spaCy) to clean the text and converts the audio data to text using the Google Speech-to-Text API. The output is standardized data that can be analyzed.

[0089] Step 4:

[0090] The server inputs pre-processed data into a generative AI model and performs sentiment analysis. The input is standardized data, and the output is a sentiment score as the analysis result. During this process, the model is instructed using prompts such as "Classify the sentiment of this text."
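
The prompt-based analysis in Step 4 might be wired up as in the sketch below. Only the sentence "Classify the sentiment of this text." comes from the specification. The JSON reply format, the function names, and `fake_model` (a stand-in for a real generative AI call) are assumptions for illustration.

```python
import json
from typing import Callable

# The first sentence is the prompt from the text; the JSON instruction is an assumption.
PROMPT_TEMPLATE = (
    "Classify the sentiment of this text. "
    'Answer with JSON such as {{"stress": 0.0, "dissatisfaction": 0.0, "joy": 0.0}}.\n'
    "Text: {text}"
)


def analyze(text: str, model: Callable[[str], str]) -> dict[str, float]:
    """Send the prompt to `model` and parse its JSON reply into scores."""
    reply = model(PROMPT_TEMPLATE.format(text=text))
    scores = json.loads(reply)
    return {label: float(value) for label, value in scores.items()}


def fake_model(prompt: str) -> str:
    """Stand-in for a real generative AI call; returns a canned reply."""
    return '{"stress": 0.8, "dissatisfaction": 0.3, "joy": 0.05}'


result = analyze("I'm tired", fake_model)
```

Passing the model in as a callable keeps the sketch independent of any particular model provider. A real deployment would also need to handle replies that are not valid JSON.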

[0091] Step 5:

[0092] The server stores the results of the sentiment analysis in a database. The input is the sentiment scores produced in step 4, and the output is time-series data for each user in the database. This establishes a system that allows for continuous tracking of changes in the user's sentiment.
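
The per-user time series of Step 5 could be stored as sketched below. The example uses Python's built-in `sqlite3` with an in-memory database purely as a stand-in for the server database. The table name, column names, and helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the server database
conn.execute(
    """CREATE TABLE emotion_scores (
           user_id TEXT NOT NULL,
           recorded_at TEXT NOT NULL,  -- ISO 8601 timestamp
           category TEXT NOT NULL,
           score REAL NOT NULL
       )"""
)


def store_score(user_id: str, recorded_at: str, category: str, score: float) -> None:
    """Append one scored observation to the user's time series."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO emotion_scores VALUES (?, ?, ?, ?)",
            (user_id, recorded_at, category, score),
        )


def load_series(user_id: str, category: str) -> list[tuple[str, float]]:
    """Return (timestamp, score) pairs for one user, oldest first."""
    rows = conn.execute(
        "SELECT recorded_at, score FROM emotion_scores "
        "WHERE user_id = ? AND category = ? ORDER BY recorded_at",
        (user_id, category),
    )
    return rows.fetchall()


# Rows may arrive out of order; the query returns them chronologically.
store_score("u1", "2024-12-11T09:00:00", "stress", 0.7)
store_score("u1", "2024-12-10T09:00:00", "stress", 0.4)
series = load_series("u1", "stress")
```

Storing timestamps as ISO 8601 strings means that ordering by the text column also orders the rows by time.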

[0093] Step 6:

[0094] The server analyzes accumulated time-series data and immediately sends an alert to the administrator if an abnormal sentiment pattern occurs. The input is time-series data, and the output is notification information based on anomaly detection. Notifications are provided to the administrator via email and a dashboard.

[0095] (Application Example 1)

[0096] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0097] Mental health issues are becoming increasingly serious in modern workplaces, and especially in environments with long working hours and high stress levels, it is crucial to detect emotional fluctuations early and take appropriate measures. However, existing systems have made it difficult to quickly analyze individual feedback and provide necessary information to managers, sometimes resulting in delays in improving the workplace environment. Therefore, there is a need for a more effective way to analyze emotional fluctuations and quickly improve the workplace environment.

[0098] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0099] In this invention, the server includes means for receiving message data and recorded data collected from users, means for preprocessing the message data and recorded data and converting it into a unified format, and means for analyzing the data in the unified format and identifying and scoring emotions. This enables effective analysis of fluctuations in the emotions of individual users in the workplace, and when abnormal fluctuations are detected, it quickly notifies administrators of the warning, making it possible to improve the workplace environment.

[0100] "Message data collected from users" refers to the general term for text or audio information obtained from communication methods that users use on a daily basis.

[0101] "Recorded data" refers to recordings of information, such as audio, text, or a combination thereof, generated or acquired by a computer system.

[0102] "Preprocessing" refers to a series of data manipulation operations performed to convert collected data into an analyzable format.

[0103] A "unified format" refers to a state in which data of different formats and content has been converted into a form that can be analyzed uniformly.

[0104] "Analyzing data to identify and score emotions" is the act of identifying emotions based on the content of the data and quantifying their intensity and characteristics.

[0105] "Emotional fluctuations" refer to how emotional states change over time.

[0106] "Abnormal fluctuations" refer to emotional changes that exhibit unusual patterns or frequencies.

[0107] "Sending a warning to the administrator via communication means" refers to alerting administrators via email or a notification system when an anomaly is detected.

[0108] "Visualizing users' mental health information on a dashboard" refers to a screen display that makes it easier to visually show individual users' emotional states and how they change.

[0109] To implement this invention, the server receives message data and recorded data transmitted by the user. This data is obtained from the user's everyday means of communication, such as email or voice chat. After receiving the data, the server uses dedicated preprocessing software to convert the data into a unified format that can be analyzed. Preprocessing includes tokenization and speech-to-text conversion.

[0110] Next, the server uses a generative AI model to identify and score emotions from the pre-processed data. This process quantifies and classifies the emotional information contained in the data, making it possible to identify abnormal emotional fluctuations. If the server detects abnormal emotional fluctuations, it immediately sends a warning to the administrator via communication channels. These communication channels include email and notifications on the dashboard. Furthermore, the analyzed user mental health information is visualized on the dashboard in a format that is easily understandable to the administrator.

[0111] The hardware used for implementation includes cloud servers for preprocessing and analysis. Machine learning frameworks such as TensorFlow® and PyTorch are used to run the generative AI model. MySQL® or PostgreSQL is used as the database for storing the analysis results.

[0112] As a concrete example, suppose an employee starts frequently using the phrase "I feel pressured" due to project stress. This information is analyzed and notified by the server, allowing administrators to intervene quickly and provide support to alleviate the employee's stress.

[0113] Examples of specific prompts for a generative AI model are as follows:

[0114] "Please analyze this email data to detect signs of stress and anxiety."

[0115] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0116] Step 1:

[0117] The terminal collects message data and recorded data from the user's everyday communication methods. In this case, the input is email or voice chat, and the output is the collected raw data. The terminal encrypts this data and prepares it for secure transmission to the server.

[0118] Step 2:

[0119] The server receives encrypted data sent from the terminal and decrypts it. The input is encrypted raw data, and the output is the decrypted raw message and recorded data. The server performs preprocessing, such as tokenization and speech-to-text conversion, to convert the data into a unified, analyzable format. This process prepares the data for analysis.
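
The tokenization mentioned in Step 2 can be illustrated with a small regular-expression tokenizer. This is only a sketch: the pattern and the function name `tokenize` are assumptions, and a production system would more likely rely on an established natural language processing library.

```python
import re

# Words and numbers, optionally followed by an apostrophe suffix (e.g. "don't").
TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, keeping contractions."""
    return TOKEN_PATTERN.findall(text.lower())


tokens = tokenize("I feel pressured, don't I?")
```

Text obtained from speech-to-text conversion can be passed through the same function, so that email text and voice chat transcripts share one token representation.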

[0120] Step 3:

[0121] The server uses a generative AI model to identify and score emotions based on pre-processed data. The output is the score of emotions contained in the text and audio. The server then quantifies emotion categories such as "joy," "dissatisfaction," and "stress," and determines the intensity of those emotions.

[0122] Step 4:

[0123] The server analyzes emotional fluctuations over time based on the scoring results and detects abnormal patterns. The input is the scored emotion data, and the output is warning flags and information indicating abnormal fluctuations. When an anomaly is detected, the server prepares to take appropriate action.

[0124] Step 5:

[0125] If the server detects an abnormal emotional fluctuation, it sends a warning to the administrator via communication channels. The input is the anomaly detection information, and the output is a warning message directed to the administrator. Through this process, the server communicates the situation to the administrator in real time.

[0126] Step 6:

[0127] The server visualizes the analyzed user's mental health information on a dashboard, allowing administrators to easily assess the situation. Input is sentiment analysis and scored data, while output is visually displayed mental health information. The system organizes and presents information in a way that is easy for administrators to understand and quickly grasp the situation.
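
The data preparation behind such a dashboard might look like the sketch below, which averages scores per user per day and renders each value as a text bar. The row layout, the function names, and the text rendering are assumptions for illustration. An actual dashboard would typically use a charting front end.

```python
from collections import defaultdict
from statistics import mean


def daily_summary(rows: list[tuple[str, str, float]]) -> dict[str, dict[str, float]]:
    """Average scores per user per day from (user_id, ISO timestamp, score) rows."""
    buckets: dict[tuple[str, str], list[float]] = defaultdict(list)
    for user_id, timestamp, score in rows:
        buckets[(user_id, timestamp[:10])].append(score)  # first 10 chars = YYYY-MM-DD
    summary: dict[str, dict[str, float]] = defaultdict(dict)
    for (user_id, day), values in sorted(buckets.items()):
        summary[user_id][day] = round(mean(values), 3)
    return dict(summary)


def render_bar(score: float, width: int = 10) -> str:
    """Render a score in [0, 1] as a fixed-width text bar."""
    filled = round(max(0.0, min(1.0, score)) * width)
    return "#" * filled + "." * (width - filled)


# Hypothetical stress scores for one user across two days.
rows = [
    ("u1", "2024-12-10T09:00:00", 0.4),
    ("u1", "2024-12-10T15:00:00", 0.6),
    ("u1", "2024-12-11T09:00:00", 0.9),
]
summary = daily_summary(rows)
```

Aggregating to one value per day keeps the display readable when a user produces many messages.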

[0128] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0129] This invention provides a system that analyzes emotions in real time using text and voice data collected from users. This system incorporates an emotion engine for emotion recognition and supports the management of the user's mental health.

[0130] Overview of the Embodiment

[0131] First, the terminal collects the user's email, chat, and meeting audio data. The data is sent to the server via a secure protocol. The server stores the received data and performs preprocessing. Preprocessing includes converting the audio data to text, denoising, and standardizing the data format.

[0132] In the emotion analysis process, the server uses the emotion engine, leveraging machine learning models and natural language processing techniques to extract emotional insights from the data. The emotion engine can recognize new emotion categories in addition to traditional ones. Furthermore, user profile information is taken into account to improve the accuracy of emotion recognition for each individual.

[0133] The analysis results are stored in a database and used for tracking emotions over time and detecting anomaly patterns. The server also has the capability to predict future emotion trends by referring to past data history. If an anomaly is detected, the server generates an alert and sends a notification to HR personnel and managers. This notification is delivered via email or a dedicated notification system.
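
The prediction of future emotion trends from past data could, for instance, use a least-squares line fitted to a user's score history. The specification does not state which prediction method is used, so the linear fit and the function names below are assumptions for illustration.

```python
def fit_trend(scores: list[float]) -> tuple[float, float]:
    """Least-squares line through (0, s0), (1, s1), ...; returns (slope, intercept)."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two observations")
    x_mean = (n - 1) / 2
    y_mean = sum(scores) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(scores))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def forecast(scores: list[float], steps_ahead: int = 1) -> float:
    """Extrapolate the fitted line `steps_ahead` periods past the last observation."""
    slope, intercept = fit_trend(scores)
    return intercept + slope * (len(scores) - 1 + steps_ahead)


# Hypothetical stress scores rising steadily over four periods.
stress = [0.2, 0.3, 0.4, 0.5]
next_value = forecast(stress)
```

A positive slope indicates a worsening stress trend, which could be reported before any single score crosses an alert threshold.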

[0134] Specific example

[0135] For example, suppose an employee is experiencing high levels of stress due to an approaching project deadline. The terminal collects the employee's communication data and sends it to the server. The server's emotion engine analyzes this data and detects negative emotions such as stress and dissatisfaction at a high frequency. As a result, the emotion trend shows signs of worsening, and the server automatically sends an alert to the supervisor.

[0136] Supervisors receive notifications, check the dashboard, and view detailed analysis results. They can then take immediate action to resolve the problem. This strengthens the workplace environment and employee mental health care.

[0137] By implementing this invention, advanced emotional analysis using an emotion engine becomes possible, making early intervention in the workplace a reality.

[0138] The following describes the processing flow.

[0139] Step 1:

[0140] The device automatically collects the user's email, chat, and meeting audio data through a dedicated application. This data is encrypted and sent to the server using a secure protocol.

[0141] Step 2:

[0142] The server stores the received data in a database and performs preprocessing for analysis. Audio data is converted to text using speech recognition technology, and all data is formatted to a consistent style. Noise reduction and typographical error correction are also performed at this stage.

[0143] Step 3:

[0144] The server passes pre-processed text data to the sentiment engine. The sentiment engine uses machine learning models to identify and score the emotions in the data. In this process, data points are classified into multiple emotion categories.
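
As a minimal illustration of the scoring step above, the following Python sketch assigns per-category scores to a single message. It is not the sentiment engine itself: the keyword lexicon `EMOTION_LEXICON` and the function `score_emotions` are hypothetical stand-ins for a trained machine learning model, and are used only to show the shape of the input and output.

```python
import re
from collections import Counter

# Hypothetical keyword lexicon; a real engine would use a trained model instead.
EMOTION_LEXICON = {
    "stress": {"deadline", "overloaded", "pressure", "tired"},
    "dissatisfaction": {"annoyed", "frustrated", "unfair"},
    "joy": {"glad", "great", "happy"},
}


def score_emotions(text: str) -> dict[str, float]:
    """Return, per category, the share of matched keywords (0.0 if none match)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = Counter()
    for token in tokens:
        for category, keywords in EMOTION_LEXICON.items():
            if token in keywords:
                hits[category] += 1
    total = sum(hits.values())
    return {c: (hits[c] / total if total else 0.0) for c in EMOTION_LEXICON}


print(score_emotions("I'm tired and the deadline pressure is unfair."))
```

Each message yields one score per category, which is the form the later tracking steps would consume.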

[0145] Step 4:

[0146] The server tracks the sentiment scores produced by the sentiment engine over time. Anomaly detection algorithms are applied to this history to identify unusual sentiment patterns and sudden changes in sentiment.

[0147] Step 5:

[0148] The server generates alerts based on detected anomaly patterns. These alerts consider individual user profile information and historical data to set tiered warning levels.
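
One possible way to derive tiered warning levels is sketched below in Python. The disclosure does not specify how tiers are computed, so everything here is an assumption: the function name `warning_level`, the idea of a per-user `personal_threshold` taken from the user profile, and the tier boundaries of 25%, 50%, and 75%.

```python
def warning_level(recent_scores: list[float], personal_threshold: float) -> str:
    """Tier an alert by the share of recent negative-emotion scores above the user's threshold."""
    if not recent_scores:
        return "none"  # no history, nothing to report
    exceeded = sum(1 for score in recent_scores if score > personal_threshold)
    ratio = exceeded / len(recent_scores)
    # Tier boundaries are illustrative assumptions, not values from the disclosure.
    if ratio >= 0.75:
        return "critical"
    if ratio >= 0.5:
        return "warning"
    if ratio >= 0.25:
        return "notice"
    return "none"


print(warning_level([0.1, 0.2, 0.9, 0.8], personal_threshold=0.5))
```

Using a per-user threshold means the same score can produce different tiers for different users, which is one way to reflect profile information.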

[0149] Step 6:

[0150] Alerts generated by the server are sent to administrators via email or a dedicated notification system. The alert notification includes a description of the anomaly along with recommended actions.
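
The alert message described above could be assembled with Python's standard `email` package, as in the following sketch. The function `build_alert_email`, the sender address, and the subject format are hypothetical; only the content requirements (a description of the anomaly plus recommended actions) come from the text.

```python
from email.message import EmailMessage


def build_alert_email(user_id: str, level: str, description: str,
                      actions: list[str], to_addr: str) -> EmailMessage:
    """Compose an alert containing the anomaly description and recommended actions."""
    msg = EmailMessage()
    msg["Subject"] = f"[{level.upper()}] Emotional anomaly detected for {user_id}"
    msg["From"] = "alerts@example.com"  # placeholder sender address
    msg["To"] = to_addr
    lines = [description, "", "Recommended actions:"]
    lines += [f"- {action}" for action in actions]
    msg.set_content("\n".join(lines))
    return msg


alert = build_alert_email(
    "user-42", "warning",
    "Negative emotion scores rose sharply over the last five days.",
    ["Schedule a one-on-one meeting", "Review the current workload"],
    "manager@example.com",
)
print(alert)
```

The returned message can be handed to `smtplib.SMTP.send_message` for delivery; a dedicated notification system would need its own client.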

[0151] Step 7:

[0152] Users (administrators and HR personnel) can access the dashboard based on alerts to obtain detailed analysis results. This allows them to follow up with stakeholders before problems surface and determine appropriate mental health care measures.

[0153] (Example 2)

[0154] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0155] Conventional sentiment analysis systems have limited ability to accurately identify user emotions and quickly and automatically detect abnormal emotional patterns. Furthermore, analysis based on simple emotion categories struggles to capture the nuances of individual user emotions. In addition, the methods for notifying users when anomalies are detected are limited, posing a challenge in delivering timely and accurate alerts to relevant parties.

[0156] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0157] In this invention, the server includes means for receiving text data and audio data collected from users by an information processing device; means for performing speech recognition and noise reduction preprocessing on the text data and audio data and unifying the data format; and means for identifying emotions by applying machine learning models and natural language processing techniques to the unified data format. This enables the identification of diverse and precise emotion categories, rapid detection of abnormal emotion patterns, and immediate alert notifications to relevant parties.

[0158] An "information processing system" is a collective term for a set of hardware and software used to collect, store, process, and analyze data.

[0159] "User" refers to an individual or organization that provides data using this system.

[0160] "Text data" refers to document-format data generated or provided by the user, including email and chat content.

[0161] "Audio data" refers to data based on audio provided by the user, and includes recorded meeting audio and voice messages.

[0162] "Speech recognition" refers to the technology that analyzes audio data and converts it into text, making it possible to treat speech as textual information.

[0163] "Noise reduction" is the process of removing unwanted sounds and interfering background noise in audio data processing.

[0164] "Preprocessing" refers to a series of processes performed after data collection to prepare the data for analysis, and includes noise reduction and standardization of data formats.

[0165] "Unifying data formats" refers to the process of converting data in different formats into a common, parseable format.

[0166] A "machine learning model" is an algorithm or mathematical model used to learn from large amounts of data and automate specific tasks.

[0167] "Natural language processing technology" refers to a set of technologies that enable computers to understand, interpret, and generate human language.

[0168] "Identifying emotions" refers to the process of estimating and identifying a user's emotional state from text or audio data.

[0169] An "abnormal pattern" refers to a pattern that shows unnatural emotional changes or tendencies that differ from the user's normal emotional state.

[0170] "Electronic communications" refers to all technologies that send and receive information via digital devices, and includes email and dedicated messaging protocols.

[0171] This invention provides a system that analyzes a user's emotions in real time, detects emotional anomalies based on the results, and generates alerts. The invention can be implemented in the following forms.

[0172] The device collects text and audio data from the user. Common data collection devices such as email applications, chat software, and conference recording devices are used for this data collection. The collected data is securely transmitted to a server.

[0173] The server temporarily stores the received data. Audio data, in particular, is converted into text data using speech recognition software such as Google Cloud Speech-to-Text or Microsoft® Azure® Speech Service. This process applies noise filtering algorithms to improve data quality. The data is then standardized into a format suitable for analysis.

[0174] During the analysis process, the server runs an emotion engine. This engine uses machine learning models (e.g., BERT and GPT) and natural language processing techniques to identify emotions from the integrated data. The emotion engine also utilizes generative AI models to recognize diverse emotion categories. Based on user profiles, it is possible to individually improve the accuracy of emotion recognition.

[0175] The results of the sentiment analysis are stored in a dedicated data storage system, and abnormal patterns are detected based on this data. When an abnormality is detected, an alert is automatically sent electronically to administrators and relevant organizational personnel.

[0176] For example, if a user makes a statement like, "I'm worried because I'm not making progress on my exam preparations," during a project, this is recorded in the database, and the server scores it as "anxiety." If this occurs frequently, an anomaly is detected in the emotional data, and an alert is sent to management.

[0177] This invention enables the extraction of emotional insights, which can be used to improve the work environment and support mental health care. An example of a prompt message would be, "A method to analyze a user's emotions and detect anomalies in relation to the pressure of project deadlines."

[0178] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0179] Step 1:

[0180] The terminal collects text and audio data from the user. Specific examples include messages sent by the user in a chat application and audio recorded by a conference recording device. At this stage, the input is raw communication data generated by the user, and the output is a data file appropriately formatted for further processing of this data.

[0181] Step 2:

[0182] The terminal sends the collected data to the server via a secure protocol. The data transmission uses a protocol that encrypts the data for secure transfer (e.g., TLS). The input is the formatted data collected in step 1, and the output is the decrypted raw data received on the server side.
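
One way to obtain the encrypted channel described in this step is Python's standard `ssl` module. The sketch below only builds the client-side TLS settings and does not open a connection. The function name `make_client_tls_context` and the choice of TLS 1.2 as the minimum version are assumptions, not requirements stated in this disclosure.

```python
import ssl


def make_client_tls_context() -> ssl.SSLContext:
    """Build client-side TLS settings that verify the server certificate and hostname."""
    context = ssl.create_default_context()  # enables certificate and hostname checks
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # assumed minimum version
    return context


ctx = make_client_tls_context()
print(ctx.minimum_version)
```

The resulting context could be passed to `http.client.HTTPSConnection(host, context=ctx)` when the terminal uploads its data, where `host` is the server's hostname.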

[0183] Step 3:

[0184] The server converts the received audio data into text data. This process utilizes speech recognition software to convert spoken language into text. It also executes a noise filtering algorithm to remove background noise and improve quality. The input is an audio file, and the output is the text transcribed from the noise-filtered audio.

[0185] Step 4:

[0186] The server uses the converted text data to standardize the data format and prepare it for subsequent processing. JSON or XML formats are used for data format standardization. The input is the text data generated in step 3, and the output is standardized, consistent data.
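
A minimal sketch of JSON-based format standardization is shown below. The record layout (`user_id`, `source`, `timestamp`, `text`) and the names `UnifiedRecord` and `to_unified_json` are assumptions chosen for illustration; the disclosure states only that JSON or XML is used.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class UnifiedRecord:
    user_id: str
    source: str     # e.g. "chat", "email", or "meeting_audio"
    timestamp: str  # ISO 8601 string
    text: str


def to_unified_json(user_id: str, source: str, timestamp: str, text: str) -> str:
    """Serialize one message or transcript into a common JSON record."""
    cleaned = " ".join(text.split())  # collapse runs of whitespace and trim the ends
    record = UnifiedRecord(user_id, source, timestamp, cleaned)
    return json.dumps(asdict(record), ensure_ascii=False)


print(to_unified_json("u1", "chat", "2024-12-10T09:00:00Z", "  I'm   tired \n"))
```

Because chat text and transcribed audio end up in the same record shape, the analysis in the next step does not need to know where a record originated.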

[0187] Step 5:

[0188] The server runs an emotion engine and analyzes the data using machine learning models and natural language processing techniques. It uses a generative AI model to identify positive, negative, and other diverse emotion categories. The input is the integrated data prepared in step 4, and the output is the analyzed data with emotion categories assigned to it.

[0189] Step 6:

[0190] The server stores the analyzed emotion data in a database and performs time-series emotion tracking and anomaly pattern detection. Anomaly pattern detection uses an algorithm that identifies changes that deviate from normal emotional states. The input is the analyzed data generated in step 5, and the output is time-series emotion data and the results of anomaly detection.
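
The disclosure does not name a specific anomaly detection algorithm. One simple candidate is a rolling z-score, which flags a score when it deviates strongly from the user's own recent values. The Python sketch below shows that approach under assumed parameters: the window length, the threshold, and the `min_sigma` floor are illustrative choices, and `detect_anomalies` is a hypothetical name.

```python
from statistics import mean, pstdev


def detect_anomalies(scores: list[float], window: int = 5,
                     threshold: float = 2.0, min_sigma: float = 0.01) -> list[int]:
    """Return indices whose score is more than `threshold` std devs from the preceding window."""
    anomalies = []
    for i in range(window, len(scores)):
        past = scores[i - window:i]
        mu = mean(past)
        # Floor the spread so a perfectly flat window does not divide by zero.
        sigma = max(pstdev(past), min_sigma)
        if abs(scores[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


daily_negative_scores = [0.20, 0.25, 0.20, 0.22, 0.24, 0.23, 0.90, 0.25]
print(detect_anomalies(daily_negative_scores))  # -> [6]
```

Comparing each score with the user's own recent history, rather than with a global cut-off, matches the definition of an abnormal pattern as a departure from that user's normal emotional state.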

[0191] Step 7:

[0192] The server generates an alert when an anomaly is detected and notifies administrators and responsible personnel of the alert via electronic communication. Email or a dedicated messaging system is used for notification. The input is the emotion anomaly data identified in the processing up to step 6, and the output is the status of the alert notification's transmission completion.

[0193] (Application Example 2)

[0194] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0195] There is a challenge in appropriately monitoring the emotional state of the elderly, detecting abnormal emotional patterns early, and responding promptly. In particular, while effectively managing the mental health of the elderly is important in nursing homes and home care settings, current systems make it difficult to grasp emotional changes in real time and provide necessary care.

[0196] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0197] In this invention, the server includes means for receiving data collected from users, means for preprocessing the data and converting it into a unified format, means for analyzing the data in the unified format and identifying emotions, means for tracking the identified emotions in chronological order and detecting abnormal patterns, means for sending a warning to an administrator when an abnormal emotional pattern is detected, means for monitoring the emotional trends of elderly people in real time, and means for prompting caregivers of elderly people to respond based on emotional trends. This makes it possible to grasp the emotional state of elderly people in real time and respond quickly to abnormalities.

[0198] "Means for receiving data" refers to an interface for receiving information such as text and audio provided by the user into the server.

[0199] "Means for preprocessing and converting to a unified format" refers to processing that performs noise reduction, text conversion, and other modifications on received data to transform it into a consistent data format.

[0200] "Means for identifying emotions" refers to technical methods for analyzing preprocessed data and recognizing specific emotional states.

[0201] "Means for tracking over time and detecting abnormal patterns" refers to a function that observes changes in emotions over time and identifies unusual emotional movements.

[0202] "Means for sending warnings" refers to a system for communicating messages to administrators or caregivers about detected abnormal patterns to draw their attention.

[0203] "Means for monitoring the emotional trends of the elderly in real time" refers to a process for collecting and continuously observing the emotional state of the elderly in real time.

[0204] "Means for prompting caregivers to respond based on emotional trends" refers to a system that instructs relevant parties to carry out care activities that meet the needs of elderly individuals in response to changes in their emotional state.

[0205] The system implementing this invention mainly consists of a server and a terminal. The terminal is responsible for continuously recording conversations and monologues of elderly people and sending the audio data to the server via a secure protocol (e.g., HTTPS). The server uses the Google Cloud Speech-to-Text API to convert the audio data into text and performs preprocessing such as noise reduction and data format standardization.

[0206] Server-side sentiment analysis is implemented with tools such as Python and scikit-learn, using machine learning techniques to identify emotions from text data. Specifically, emotions are scored, and their changes are tracked over time. The identified emotion data is stored in a database, allowing emotional trends to be observed as time-series data.

[0207] When an abnormal emotional pattern is detected, the server immediately sends a warning message to the administrator or caregiver. Communication is conducted via email or a dedicated notification application, enabling a rapid response. This allows for real-time monitoring of trends in the emotional state of elderly individuals and prompting necessary interventions.

[0208] For example, if an elderly person begins talking to themselves more often at night, caregivers can consider appropriate responses in light of the data, such as the possibility that the person's daytime activity level has decreased. In this way, the objective of this invention is to effectively manage the mental health of the elderly and improve their quality of life.

[0209] Examples of prompts for a generative AI model are as follows:

[0210] "Analyze daily conversational audio data from elderly individuals to determine their daily emotional trends. In particular, detect signs of stress and anxiety and create a report."

[0211] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0212] Step 1:

[0213] The device records the daily conversations and monologues of elderly people. The audio data is the input and is temporarily stored as a digital audio file before being sent to the server. This data is collected in batches at specific time intervals.

[0214] Step 2:

[0215] The terminal sends the collected audio data to the server using a secure protocol. The input is the audio data from the terminal, and the output is the stream-formatted audio data received by the server. HTTPS is used as the communication protocol to ensure secure data transfer.

[0216] Step 3:

[0217] The server converts the received audio data into text data using the Google Cloud Speech-to-Text API. The input is audio data, and the output is text data. This text conversion prepares the data for natural language processing.

[0218] Step 4:

[0219] The server performs preprocessing on the text data, including noise reduction and data format standardization. The input is text data, and the output is de-noised and standardized text data. Specific operations include the removal of special characters and grammatical normalization.
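
The text clean-up in this step might look like the following Python sketch. It covers Unicode normalization, removal of special characters, whitespace collapsing, and lowercasing. It does not attempt grammatical normalization. The function name `normalize_transcript` and the set of characters that are kept are assumptions.

```python
import re
import unicodedata


def normalize_transcript(text: str) -> str:
    """Normalize a transcript: fold Unicode variants, drop special characters, tidy spacing."""
    text = unicodedata.normalize("NFKC", text)   # e.g. full-width letters become ASCII
    text = re.sub(r"[^\w\s.,!?'-]", " ", text)   # keep word characters and basic punctuation
    text = re.sub(r"\s+", " ", text).strip()     # collapse whitespace
    return text.lower()


print(normalize_transcript("  Ｈello,   WORLD!! #tag\n"))  # -> hello, world!! tag
```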

[0220] Step 5:

[0221] The server performs sentiment analysis using preprocessed text data. It runs machine learning models using Python and scikit-learn. Input is text data in a unified format, and output is sentiment scores. A generative AI model is used to classify the data into specific sentiment categories.

[0222] Step 6:

[0223] The server stores the identified sentiments as time-series data in a database. The input is the sentiment score, and the output is the record in the time-series database. This makes it possible to track and evaluate sentiment trends later.

[0224] Step 7:

[0225] The server analyzes time-series data from the database to detect abnormal emotional patterns. The input is time-series data from the database, and the output is the detection results of abnormal patterns. These abnormal patterns trigger the next process.

[0226] Step 8:

[0227] When an abnormal pattern is detected, the server sends a warning message to administrators or caregivers. The input is the abnormal pattern, and the output is a warning message via email or notification application. This enables a rapid response tailored to the condition of the elderly person.

[0228] An example of a prompt message might be: "Analyze the daily conversational audio data of elderly people and determine their daily emotional trends. In particular, detect signs of stress and anxiety and create a report."

[0229] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0230] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0231] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0232] [Second Embodiment]

[0233] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0234] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0235] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0236] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0237] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0238] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0239] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0240] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0241] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0242] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the identification processing unit 290.

[0243] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0244] Next, the identification processing performed by the identification processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0245] This invention provides a system that analyzes users' emotions based on communication among team members and detects early signs of stress and burnout. This allows managers to take appropriate measures and improve the workplace environment.

[0246] Overview of the Embodiment

[0247] First, the device automatically collects the user's email, chat, and meeting recording data. The collected data is sent to the server via encrypted communication. After receiving this data, the server begins pre-processing the text and audio data. The pre-processed data is then used for analysis to identify sentiment.

[0248] The server performs sentiment analysis using machine learning models. This scores and classifies the user's emotions from text and audio data. For example, categories such as "stress," "dissatisfaction," and "joy" are assigned. The analysis results are stored in a database and managed as time-series data for each user.

[0249] Furthermore, the server tracks emotional changes in real time. If an abnormal pattern is detected, it sends an alert to the administrator based on a pre-configured threshold. This alert is delivered via dashboard, email, and notification system, allowing administrators to quickly formulate a response plan.

[0250] Specific example

[0251] As a concrete example, suppose a member of a project team starts using phrases like "I'm tired" or "I've lost motivation" frequently in chat messages after working for long hours. This information is collected on the terminal and processed and analyzed on the server. If the analysis determines that stress levels are high, the server sends an alert to HR personnel or the project leader. This allows for early follow-up conversations and the implementation of appropriate support measures.

[0252] Thus, the present invention provides an embodiment of a system that enables efficient detection of changes in a user's emotions and allows appropriate countermeasures to be taken before problems become apparent.

[0253] The following describes the processing flow.

[0254] Step 1:

[0255] The device automatically collects user email, chat, and meeting recording data through its interface. This can be done using a dedicated application or browser extension. The collected data is encrypted and sent to the server using a secure protocol.

[0256] Step 2:

[0257] The server receives the incoming data and temporarily stores it in the database. Then, it performs data preprocessing. Specifically, in the case of audio data, it is converted to text using speech recognition software, and all data formats are standardized. Noise reduction and spell checking are also performed at this stage.

[0258] Step 3:

[0259] The server performs sentiment analysis on pre-processed text data. Machine learning models and natural language processing algorithms are used to identify and score sentiments from words and phrases. Sentiment categories include positive, neutral, and negative, and the results are stored in a database.

[0260] Step 4:

[0261] The server uses the results of sentiment analysis to monitor user emotional fluctuations over time. Statistical methods and anomaly detection are employed to detect abnormal changes and specific patterns. This detection is performed in real time, providing foundational data for rapid response.

[0262] Step 5:

[0263] The server generates an alert based on a pre-configured threshold when an abnormal emotional pattern is detected. This alert is sent to HR personnel and managers via email or push notification. Receiving the alert allows administrators to quickly understand the problem and take appropriate action.

[0264] Step 6:

[0265] Users (administrators and HR personnel) can view detailed sentiment analysis on a dashboard based on received alerts. They can then contact team members who show signs of difficulty early on and provide stress care support and resources. This approach allows for resolution before problems become entrenched.

[0266] (Example 1)

[0267] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0268] It is difficult to detect early signs of emotional changes or stress in employees in the workplace and to take appropriate action quickly. To improve productivity while maintaining employees' mental health, it is necessary to monitor their emotional state efficiently and accurately.

[0269] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0270] In this invention, the server includes means for receiving information acquired from a user, means for preprocessing the information and converting it into a standardized format, and means for interpreting the information in the standardized format and detecting emotions. This makes it possible to quickly grasp changes in employees' emotions in the workplace and take appropriate action, thereby improving productivity while maintaining mental health.

[0271] A "user" refers to any individual or organization that provides information to the system.

[0272] "Information" refers to data collected from users, including text data and audio data.

[0273] "Preprocessing" refers to a series of processes that convert collected information into a format that is easy to analyze.

[0274] "Standardized format" refers to a state in which pre-processed information has been converted into a common format that can be analyzed.

[0275] "Interpretation" refers to the process of detecting emotions based on standardized information.

[0276] "Emotions" refer to internal experiences and situations that represent the user's psychological state.

[0277] An "abnormal situation" refers to an unusual pattern or tendency of emotions, and is an event that requires attention from the manager.

[0278] An "administrator" is a person within an organization who is responsible for receiving notifications sent by the system.

[0279] "Notification" refers to a means of communication sent to an administrator when an abnormal situation occurs.

[0280] "Electronic communication" refers to means of transmitting information via digital media, including email and instant messaging.

[0281] "Notification services" refer to additional means of providing information about abnormal situations, complementing electronic communications.

[0282] This invention provides a system that utilizes communication data between users to analyze emotions in real time. Specific embodiments are described below.

[0283] The terminal collects user communication data, specifically text and voice data, locally. Dedicated applications and monitoring software for data collection are expected to be installed on the terminal. Users do not need to perform any special operations and can continue their normal work.

[0284] The collected data is securely sent to the server using common technologies such as AES encryption and the HTTPS protocol. When the data reaches the server, preprocessing is immediately performed. In this preprocessing, natural language processing tools (e.g., NLTK and spaCy) clean the text data, and audio data is converted to text using a speech recognition API (e.g., Google Speech-to-Text).

[0285] The server requests sentiment analysis from the generative AI model using the preprocessed data. Models such as BERT and GPT are assumed to be applied. As an example of the prompt text, an instruction such as "Please classify the sentiment of this text" is passed to the model. As a result, sentiments such as "stress", "dissatisfaction", and "joy" are scored and classified.

[0286] The analysis results are stored as time-series data in the database within the server, and the administrator can monitor this in real time through a dashboard or the like. When an abnormal sentiment pattern is detected, an immediate alert is sent to the administrator via email or a dedicated notification service.

[0287] As a specific example, if words such as "tired" frequently appear in the chat because members of a project team have been working continuously for a long time, sentiment analysis identifies this as a sign of stress, and an alert is sent to the personnel department. Based on this information, the administrator can perform the necessary follow-up and take appropriate support measures.

[0288] With this invention, companies and organizations can efficiently grasp the emotional state of employees and contribute to maintaining a healthy workplace environment.

[0289] The flow of the specific process in Example 1 will be described using FIG. 11.

[0290] Step 1:

[0291] The device collects text and audio data from the user's daily communication activities. The inputs include emails, chat history, and meeting recordings, and the output is the collected raw data, which is encrypted in Step 2. This collection process is automated by a dedicated application installed on the device.

[0292] Step 2:

[0293] The terminal encrypts the collected data using AES and securely transmits it to the server using the HTTPS protocol. The input is the unencrypted data obtained in step 1, and the output is the securely encrypted and transmitted data. This operation ensures the confidentiality of the data.

[0294] Step 3:

[0295] The server performs preprocessing on the received data. The input is the encrypted data, which the server first decrypts to recover the raw data. The server then uses natural language processing tools (NLTK and spaCy) to clean the text and converts the audio data to text using the Google Speech-to-Text API. The output is standardized data that can be analyzed.

[0296] Step 4:

[0297] The server inputs the preprocessed data into a generative AI model and performs sentiment analysis. The input is standardized data, and the output is a sentiment score as the analysis result. During this process, the model is instructed using prompts such as "Classify the sentiment of this text."

[0298] Step 5:

[0299] The server stores the results of the sentiment analysis in a database. The input is the sentiment scores, and the output is time-series data for each user in the database. This allows changes in each user's sentiment to be tracked continuously.

[0300] Step 6:

[0301] The server analyzes accumulated time-series data and immediately sends an alert to the administrator if an abnormal sentiment pattern occurs. The input is time-series data, and the output is notification information based on anomaly detection. Notifications are provided to the administrator via email and a dashboard.
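The disclosure does not specify how an "abnormal sentiment pattern" is determined in Step 6. Purely as one possible illustration, the sketch below flags a score that deviates strongly from the mean of the preceding observations. The trailing-window z-score technique, the function name `detect_anomalies`, and the parameters `window`, `threshold`, and `min_sigma` are all assumptions.

```python
from statistics import mean, pstdev


def detect_anomalies(scores, window=7, threshold=2.0, min_sigma=0.05):
    """Return indices whose score deviates from the trailing-window mean
    by more than `threshold` standard deviations.

    `min_sigma` is a floor on the standard deviation so that a very flat
    baseline does not turn tiny changes into alerts (assumed safeguard).
    """
    flagged = []
    for i in range(window, len(scores)):
        baseline = scores[i - window:i]
        mu = mean(baseline)
        sigma = max(pstdev(baseline), min_sigma)
        if abs(scores[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged
```

For a stress series that stays near 0.2 for seven observations and then jumps to 0.9, only the last index is flagged; a constant series produces no flags.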

[0302] (Application Example 1)

[0303] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0304] Mental health issues are becoming increasingly serious in modern workplaces, and especially in environments with long working hours and high stress levels, it is crucial to detect emotional fluctuations early and take appropriate measures. However, existing systems have made it difficult to quickly analyze individual feedback and provide necessary information to managers, sometimes resulting in delays in improving the workplace environment. Therefore, there is a need for a more effective way to analyze emotional fluctuations and quickly improve the workplace environment.

[0305] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0306] In this invention, the server includes means for receiving message data and recorded data collected from users, means for preprocessing the message data and recorded data and converting it into a unified format, and means for analyzing the data in the unified format and identifying and scoring emotions. This enables effective analysis of fluctuations in the emotions of individual users in the workplace, and when abnormal fluctuations are detected, it quickly notifies administrators of the warning, making it possible to improve the workplace environment.

[0307] "Message data collected from users" refers to the general term for information in the form of text or voice obtained from the communication means regularly used by users.

[0308] "Recorded data" refers to the recording of information in the form of voice, text, or a combination thereof, generated or obtained by a computer system.

[0309] "Preprocessing" refers to a series of data processing operations performed to convert the collected data into an analyzable form.

[0310] "Unified format" refers to the state in which data of different forms and contents has been converted into a form that can be uniformly analyzed.

[0311] "Analyze the data and identify and score emotions" refers to the act of specifying emotions based on the content of the data and quantifying their intensity and characteristics.

[0312] "Emotional fluctuations" refer to the way in which the emotional state changes over time.

[0313] "Abnormal fluctuations" refer to emotional fluctuations that indicate a manner or frequency of change that is not normally seen.

[0314] "Send a warning to the administrator through the communication means" refers to alerting the person with administrative authority via email or a notification system when an abnormality is detected.

[0315] "Visualize the mental health information of users on a dashboard" refers to displaying the emotional state of individual users and its changes on a screen in a form that is easy to grasp visually.

[0316] To implement this invention, the server receives message data and recorded data transmitted by the user. This data is obtained from the user's everyday means of communication, such as email or voice chat. After receiving the data, the server uses dedicated preprocessing software to convert the data into a unified format that can be analyzed. Preprocessing includes tokenization and speech-to-text conversion.
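As an illustrative sketch only, conversion into a unified format with tokenization might look like the following. The record layout `UnifiedRecord`, the function `unify`, and the simple regular-expression tokenizer are assumptions; speech-to-text conversion is assumed to have been performed beforehand by a separate service, so recorded data arrives here as a transcript string.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone

# Words, optionally with one apostrophe-joined suffix ("I'm", "don't").
_TOKEN = re.compile(r"\w+(?:'\w+)?")


@dataclass(frozen=True)
class UnifiedRecord:
    """Hypothetical unified format shared by message data and recorded data."""
    user_id: str
    source: str            # e.g. "message" or "recording"
    timestamp: datetime    # stored in UTC
    text: str              # whitespace-normalized text or transcript
    tokens: tuple[str, ...]


def unify(user_id: str, source: str, timestamp: datetime, text: str) -> UnifiedRecord:
    """Normalize whitespace, tokenize, and convert the timestamp to UTC."""
    stripped = " ".join(text.split())
    tokens = tuple(t.lower() for t in _TOKEN.findall(stripped))
    return UnifiedRecord(user_id, source, timestamp.astimezone(timezone.utc), stripped, tokens)
```

Because both emails and transcripts pass through the same `unify` function, later analysis steps only need to handle one record type.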

[0317] Next, the server uses a generative AI model to identify and score emotions from the pre-processed data. This process quantifies and classifies the emotional information contained in the data, making it possible to identify abnormal emotional fluctuations. If the server detects abnormal emotional fluctuations, it immediately sends a warning to the administrator via communication channels. These communication channels include email and notifications on the dashboard. Furthermore, the analyzed user mental health information is visualized on the dashboard in a format that is easily understandable to the administrator.

[0318] The hardware used for implementation includes cloud servers for preprocessing and analysis. Machine learning frameworks such as TensorFlow and PyTorch are used to run the generative AI model. MySQL or PostgreSQL is used as the database to store the analysis results.
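A minimal sketch of storing analysis results as per-user time-series data is given below. To keep the sketch self-contained, Python's built-in `sqlite3` module is substituted for the MySQL or PostgreSQL databases named above; the table name `emotion_scores`, its columns, and the helper functions are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS emotion_scores (
    user_id     TEXT NOT NULL,
    recorded_at TEXT NOT NULL,   -- ISO 8601 timestamp
    emotion     TEXT NOT NULL,
    score       REAL NOT NULL,
    PRIMARY KEY (user_id, recorded_at, emotion)
)
"""


def open_store(path=":memory:"):
    """Open (or create) the store; the default is an in-memory database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_scores(conn, user_id, recorded_at, scores):
    """Insert one row per emotion; re-saving the same key overwrites it."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO emotion_scores VALUES (?, ?, ?, ?)",
            [(user_id, recorded_at, emotion, score) for emotion, score in scores.items()],
        )


def series(conn, user_id, emotion):
    """Return [(recorded_at, score), ...] for one user and emotion, oldest first."""
    rows = conn.execute(
        "SELECT recorded_at, score FROM emotion_scores "
        "WHERE user_id = ? AND emotion = ? ORDER BY recorded_at",
        (user_id, emotion),
    )
    return rows.fetchall()
```

Storing timestamps as ISO 8601 strings lets a plain `ORDER BY` return the series in chronological order, provided all timestamps use the same format and time zone.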

[0319] As a concrete example, suppose an employee starts frequently using the phrase "I feel pressured" due to project stress. This information is analyzed and notified by the server, allowing administrators to intervene quickly and provide support to alleviate the employee's stress.

[0320] Examples of specific prompts for a generative AI model are as follows:

[0321] "Please analyze this email data to detect signs of stress and anxiety."

[0322] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0323] Step 1:

[0324] The terminal collects message data and recorded data from the user's everyday means of communication. In this case, the input is email or voice chat, and the output is the collected raw data. The terminal encrypts this data and prepares it for secure transmission to the server.

[0325] Step 2:

[0326] The server receives encrypted data sent from the terminal and decrypts it. The input is encrypted raw data, and the output is the decrypted raw message and recorded data. The server performs preprocessing, such as tokenization and speech-to-text conversion, to convert the data into a unified, analyzable format. This process prepares the data for analysis.

[0327] Step 3:

[0328] The server uses a generative AI model to identify and score emotions based on pre-processed data. The output is the score of emotions contained in the text and audio. The server then quantifies emotion categories such as "joy," "dissatisfaction," and "stress," and determines the intensity of those emotions.

[0329] Step 4:

[0330] The server analyzes emotional fluctuations over time based on the scoring results and detects abnormal patterns. The input is the scored emotion data, and the output is a warning flag and information indicating the abnormal fluctuation. When an anomaly is detected, the server prepares to take appropriate action.

[0331] Step 5:

[0332] If the server detects an abnormal emotional fluctuation, it sends a warning to the administrator via communication channels. The input is the anomaly detection alert information, and the output is a warning message directed to the administrator. Through this process, the server communicates the situation to the administrator in real time.

[0333] Step 6:

[0334] The server visualizes the analyzed user's mental health information on a dashboard, allowing administrators to easily assess the situation. Input is sentiment analysis and scored data, while output is visually displayed mental health information. The system organizes and presents information in a way that is easy for administrators to understand and quickly grasp the situation.
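The disclosure does not state what the dashboard displays or how it is computed. As one hedged illustration of the data preparation behind Step 6, the sketch below aggregates scored data into a per-user, per-day mean for each emotion, which a dashboard could then chart. The function name `daily_summary`, the row layout, and the choice of a daily mean are assumptions.

```python
from collections import defaultdict
from statistics import mean


def daily_summary(rows):
    """Aggregate scored data for display.

    rows: iterable of (user_id, iso_timestamp, emotion, score), where the
    timestamp starts with "YYYY-MM-DD".
    Returns {user_id: {date: {emotion: mean_score}}}.
    """
    buckets = defaultdict(list)
    for user_id, ts, emotion, score in rows:
        buckets[(user_id, ts[:10], emotion)].append(score)
    summary = {}
    for (user_id, day, emotion), values in sorted(buckets.items()):
        summary.setdefault(user_id, {}).setdefault(day, {})[emotion] = round(mean(values), 3)
    return summary
```

Two stress scores of 0.6 and 0.8 recorded for the same user on the same day would appear on the dashboard as a single daily value of 0.7.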

[0335] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0336] This invention provides a system that analyzes emotions in real time using text and voice data collected from users. This system incorporates an emotion engine for emotion recognition and supports the management of the user's mental health.

[0337] Overview of the Embodiment

[0338] First, the device collects email and chat text and audio data from recorded meetings. The data is sent to the server via a secure protocol. The server stores the received data and performs preprocessing. Preprocessing includes converting the audio data to text, denoising, and standardizing the data format.

[0339] In the sentiment analysis process, the server uses the emotion engine, leveraging machine learning models and natural language processing techniques to extract emotional insights from the data. The emotion engine can recognize new emotion categories in addition to traditional ones. Furthermore, user profile information is taken into account to improve the accuracy of emotion recognition for each individual.

[0340] The analysis results are stored in a database and used for tracking emotions over time and detecting anomaly patterns. The server also has the capability to predict future emotion trends by referring to past data history. If an anomaly is detected, the server generates an alert and sends a notification to HR personnel and managers. This notification is delivered via email or a dedicated notification system.
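The method for predicting future emotion trends from past data is not specified in the disclosure. As a simple stand-in, the sketch below fits a straight line to past scores by ordinary least squares and extrapolates it. The technique, the function name `linear_forecast`, and the assumption of equally spaced observations are illustrative only.

```python
def linear_forecast(scores, steps_ahead=1):
    """Fit y = a + b*t by least squares over t = 0..n-1 and extrapolate.

    Assumes equally spaced observations. The result is not clamped, so a
    steep trend can yield values outside the range of the original scores.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(scores) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(scores))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept + slope * (n - 1 + steps_ahead)
```

A series rising by 0.1 per observation from 0.1 to 0.4 extrapolates to approximately 0.5 one step ahead, while a flat series extrapolates to its constant value.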

[0341] Specific example

[0342] For example, suppose an employee is experiencing high levels of stress due to an approaching project deadline. The terminal collects the employee's communication data and sends it to the server. The server's emotion engine analyzes this data and detects negative emotions such as stress and dissatisfaction at a high frequency. As a result, the emotion trend shows signs of worsening, and the server automatically sends an alert to the supervisor.

[0343] Supervisors receive notifications, check the dashboard, and view detailed analysis results. They can then take immediate action to resolve the problem. This strengthens the workplace environment and employee mental health care.

[0344] By implementing this invention, advanced emotional analysis using an emotion engine becomes possible, making early intervention in the workplace a reality.

[0345] The following describes the processing flow.

[0346] Step 1:

[0347] The device automatically collects the user's email, chat, and meeting audio data through a dedicated application. This data is encrypted and sent to the server using a secure protocol.

[0348] Step 2:

[0349] The server stores the received data in a database and performs preprocessing for analysis. Audio data is converted to text using speech recognition technology, and all data is formatted to a consistent style. Noise reduction and typographical error correction are also performed at this stage.

[0350] Step 3:

[0351] The server passes the preprocessed text data to the emotion engine. The emotion engine uses machine learning models to identify and score the emotions in the data. In this process, data points are classified into multiple emotion categories.

[0352] Step 4:

[0353] The server tracks the sentiment data scored by the emotion engine over time. Anomaly detection algorithms are used to identify unusual sentiment patterns and sudden changes in sentiment.

[0354] Step 5:

[0355] The server generates alerts based on the detected anomaly patterns. Each alert is assigned a tiered warning level that takes into account the individual user's profile information and historical data.

[0356] Step 6:

[0357] Alerts generated by the server are sent to administrators via email or a dedicated notification system. The alert notification includes a description of the anomaly along with recommended actions.

[0358] Step 7:

[0359] Users (administrators and HR personnel) can access the dashboard based on alerts to obtain detailed analysis results. This allows them to follow up with stakeholders before problems surface and determine appropriate mental health care measures.
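The tiered warning levels of Step 5 could, for example, be derived from how far recent scores exceed a baseline held in the user's profile. The following is a minimal sketch under that assumption; the `UserProfile` fields, the level names, and the numeric thresholds are hypothetical and are not given in the disclosure.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class UserProfile:
    user_id: str
    baseline_stress: float  # typical stress score for this user (assumed field)


def warning_level(profile, recent_scores):
    """Map the excess of recent stress over the personal baseline to a tier.

    Returns one of "none", "notice", "warning", "critical".
    The thresholds below are illustrative values only.
    """
    if not recent_scores:
        return "none"
    excess = mean(recent_scores) - profile.baseline_stress
    if excess >= 0.4:
        return "critical"
    if excess >= 0.25:
        return "warning"
    if excess >= 0.1:
        return "notice"
    return "none"
```

Measuring against a personal baseline rather than a fixed cutoff means the same recent score of 0.7 is "critical" for a user whose baseline is 0.2 but only a "notice" for a user whose baseline is 0.5.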

[0360] (Example 2)

[0361] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0362] Conventional sentiment analysis systems have limited ability to accurately identify user emotions and quickly and automatically detect abnormal emotional patterns. Furthermore, analysis based on simple emotion categories struggles to capture the nuances of individual user emotions. In addition, the methods for notifying users when anomalies are detected are limited, posing a challenge in delivering timely and accurate alerts to relevant parties.

[0363] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0364] In this invention, the server includes means for receiving text data and audio data collected from users by an information processing device; means for performing speech recognition and noise reduction preprocessing on the text data and audio data and unifying the data format; and means for identifying emotions by applying machine learning models and natural language processing techniques to the unified data format. This enables the identification of diverse and precise emotion categories, rapid detection of abnormal emotion patterns, and immediate alert notifications to relevant parties.

[0365] An "information processing system" is a collective term for a set of hardware and software used to collect, store, process, and analyze data.

[0366] "User" refers to an individual or organization that provides data using this system.

[0367] "Text data" refers to document-format data generated or provided by the user, including email and chat content.

[0368] "Audio data" refers to data based on audio provided by the user, and includes recorded meeting audio and voice messages.

[0369] "Speech recognition" refers to the technology that analyzes audio data and converts it into text, making it possible to treat speech as textual information.

[0370] "Noise reduction" is the process of removing unwanted sounds and interfering background noise in audio data processing.

[0371] "Preprocessing" refers to a series of processes performed after data collection to prepare the data for analysis, and includes noise reduction and standardization of data formats.

[0372] "Unifying data formats" refers to the process of converting data in different formats into a common, parseable format.

[0373] A "machine learning model" is an algorithm or mathematical model used to learn from large amounts of data and automate specific tasks.

[0374] "Natural language processing technology" refers to a set of technologies that enable computers to understand, interpret, and generate human language.

[0375] "Identifying emotions" refers to the process of estimating and identifying a user's emotional state from text or audio data.

[0376] An "abnormal pattern" refers to a pattern that shows unnatural emotional changes or tendencies that differ from the user's normal emotional state.

[0377] "Electronic communications" refers to all technologies that send and receive information via digital devices, and includes email and dedicated messaging protocols.

[0378] This invention provides a system that analyzes a user's emotions in real time, detects emotional anomalies based on the results, and generates alerts. The invention can be implemented in the following forms.

[0379] The device collects text and audio data from the user. Common data collection devices such as email applications, chat software, and conference recording devices are used for this data collection. The collected data is securely transmitted to a server.

[0380] The server temporarily stores the received data. Audio data, in particular, is converted to text using speech recognition software such as Google Cloud Speech-to-Text or Microsoft Azure Speech Service. This process applies noise filtering algorithms to improve data quality. The data is then standardized into a format suitable for analysis.

[0381] During the analysis process, the server runs an emotion engine. This engine uses machine learning models (e.g., BERT and GPT) and natural language processing techniques to identify emotions from the unified data. The emotion engine also utilizes generative AI models to recognize diverse emotion categories. User profiles can be used to improve the accuracy of emotion recognition for each individual.

[0382] The results of the sentiment analysis are stored in a dedicated data storage system, and abnormal patterns are detected based on this data. When an abnormality is detected, an alert is automatically sent electronically to administrators and relevant organizational personnel.

[0383] For example, if a user makes a statement like, "I'm worried because I'm not making progress on my exam preparations," during a project, this is recorded in the database, and the server scores it as "anxiety." If this occurs frequently, an anomaly is detected in the emotional data, and an alert is sent to management.

[0384] This invention enables the extraction of emotional insights, which can be used to improve the work environment and support mental health care. An example of a prompt message would be, "A method to analyze a user's emotions and detect anomalies in relation to the pressure of project deadlines."

[0385] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0386] Step 1:

[0387] The terminal collects text and audio data from the user. Specific examples include messages sent by the user in a chat application and audio recorded by a conference recording device. At this stage, the input is raw communication data generated by the user, and the output is a data file appropriately formatted for further processing of this data.

[0388] Step 2:

[0389] The terminal sends the collected data to the server via a secure protocol. The data transmission uses a protocol that encrypts the data for secure transfer (e.g., TLS). The input is the formatted data collected in step 1, and the output is the decrypted raw data received on the server side.

[0390] Step 3:

[0391] The server converts the received audio data into text data. This process utilizes speech recognition software to convert spoken language into text. It also executes a noise filtering algorithm to remove background noise and improve quality. The input is an audio file, and the output is noise-reduced, transcribed text data.

[0392] Step 4:

[0393] The server uses the converted text data to standardize the data format and prepare it for subsequent processing. JSON or XML formats are used for data format standardization. The input is the text data generated in step 3, and the output is standardized, consistent data.

[0394] Step 5:

[0395] The server runs an emotion engine and analyzes the data using machine learning models and natural language processing techniques. It uses a generative AI model to identify positive, negative, and other diverse emotion categories. The input is the standardized data prepared in step 4, and the output is the analyzed data with emotion categories assigned to it.

[0396] Step 6:

[0397] The server stores the analyzed emotion data in a database and performs time-series emotion tracking and anomaly pattern detection. Anomaly pattern detection uses an algorithm that identifies changes that deviate from normal emotional states. The input is the analyzed data generated in step 5, and the output is time-series emotion data and the results of anomaly detection.

[0398] Step 7:

[0399] The server generates an alert when an anomaly is detected and notifies administrators and responsible personnel of the alert via electronic communication. Email or a dedicated messaging system is used for notification. The input is the emotion anomaly data identified in the processing up to step 6, and the output is the status of the alert notification's transmission completion.
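Step 4 names JSON as one format for standardizing the data. A minimal sketch of such a standardized record is shown below; the field names, the `schema_version` field, and the function `to_standard_json` are assumptions, since the disclosure does not define a schema.

```python
import json
from datetime import datetime, timezone


def to_standard_json(user_id, source, text, recorded_at):
    """Serialize one piece of (already transcribed) text as a JSON record.

    `recorded_at` should be a timezone-aware datetime; it is stored in UTC.
    """
    record = {
        "schema_version": 1,                      # hypothetical field
        "user_id": user_id,
        "source": source,                         # e.g. "chat" or "meeting_audio"
        "recorded_at": recorded_at.astimezone(timezone.utc).isoformat(),
        "text": " ".join(text.split()),           # collapse stray whitespace
    }
    # sort_keys gives a stable field order; ensure_ascii=False keeps
    # non-ASCII text such as Japanese readable in the output.
    return json.dumps(record, ensure_ascii=False, sort_keys=True)
```

Chat messages and transcribed meeting audio serialized this way differ only in the `source` field, so the emotion engine in Step 5 can read both with the same parser.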

[0400] (Application Example 2)

[0401] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0402] There is a challenge in appropriately monitoring the emotional state of the elderly, detecting abnormal emotional patterns early, and responding promptly. In particular, while effectively managing the mental health of the elderly is important in nursing homes and home care settings, current systems make it difficult to grasp emotional changes in real time and provide necessary care.

[0403] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0404] In this invention, the server includes means for receiving data collected from users, means for preprocessing the data and converting it into a unified format, means for analyzing the data in the unified format and identifying emotions, means for tracking the identified emotions in chronological order and detecting abnormal patterns, means for sending a warning to an administrator when an abnormal emotional pattern is detected, means for monitoring the emotional trends of elderly people in real time, and means for prompting caregivers of elderly people to respond based on emotional trends. This makes it possible to grasp the emotional state of elderly people in real time and respond quickly to abnormalities.

[0405] "Means for receiving data" refers to an interface for receiving information such as text and audio provided by the user into the server.

[0406] "Means for preprocessing and converting to a unified format" refers to processing methods that perform noise reduction, text conversion, and other modifications on received data to transform it into a consistent data format.

[0407] "Means of identifying emotions" refer to technical methods for analyzing pre-processed data and recognizing specific emotional states.

[0408] "Means for tracking over time and detecting abnormal patterns" refers to a function that observes changes in emotions over time and identifies unusual emotional movements.

[0409] A "means of sending warnings" refers to a system for communicating messages to administrators or caregivers about detected abnormal patterns to draw their attention.

[0410] "Means for monitoring the emotional trends of the elderly in real time" refers to a process for collecting and continuously observing the emotional state of the elderly in real time.

[0411] "Means for prompting caregivers to respond based on emotional trends" refers to a system that prompts relevant parties to carry out care activities that meet the needs of elderly individuals in response to changes in their emotional state.

[0412] The system implementing this invention mainly consists of a server and a terminal. The terminal is responsible for continuously recording conversations and monologues of elderly people and sending the audio data to the server via a secure protocol (e.g., HTTPS). The server uses the Google Cloud Speech-to-Text API to convert the audio data into text and performs preprocessing such as noise reduction and data format standardization.

[0413] For server-side sentiment analysis, Python and libraries such as scikit-learn are used, employing machine learning techniques to identify emotions from text data. Specifically, emotions are scored, and their changes are tracked over time. The identified emotion data is stored in a database, allowing for observation of emotional trends as time-series data.

[0414] When an abnormal emotional pattern is detected, the server immediately sends a warning message to the administrator or caregiver. Communication is conducted via email or a dedicated notification application, enabling a rapid response. This allows for real-time monitoring of trends in the emotional state of elderly individuals and prompting necessary interventions.

[0415] For example, if an elderly person's frequency of talking to themselves at night increases, caregivers can consider appropriate responses based on data such as the possibility that their daytime activity level has decreased. In this way, the objective of this invention is to effectively manage the mental health of the elderly and improve their quality of life.
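To illustrate how an increase in nighttime monologues might be detected, the sketch below counts utterances per night and compares the latest night with earlier nights. It assumes that an upstream step has already identified which utterances are monologues and supplies their timestamps; the night hours (22:00 to 06:00), the factor of 2, and the function names are illustrative assumptions.

```python
from collections import Counter
from datetime import timedelta


def nightly_counts(utterance_times, night_start=22, night_end=6):
    """Count utterances per night.

    Utterances after midnight (before `night_end`) are attributed to the
    previous evening's date, so one night is counted as one unit.
    Daytime utterances are ignored.
    """
    counts = Counter()
    for ts in utterance_times:
        if ts.hour >= night_start:
            counts[ts.date()] += 1
        elif ts.hour < night_end:
            counts[(ts - timedelta(days=1)).date()] += 1
    return counts


def increased(counts, latest_night, factor=2.0):
    """True if the latest night's count is at least `factor` times the
    mean count of the earlier nights on record."""
    earlier = [c for night, c in counts.items() if night < latest_night]
    if not earlier:
        return False
    return counts[latest_night] >= factor * (sum(earlier) / len(earlier))
```

A result of `True` would be one possible trigger for prompting a caregiver to review the person's daytime activity, as described above.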

[0416] Examples of prompts for a generative AI model are as follows:

[0417] "Analyze daily conversational audio data from elderly individuals to determine their daily emotional trends. In particular, detect signs of stress and anxiety and create a report."

[0418] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0419] Step 1:

[0420] The device records the daily conversations and monologues of elderly people. The audio data is the input and is temporarily stored as a digital audio file before being sent to the server. This data is collected in batches at specific time intervals.

[0421] Step 2:

[0422] The terminal sends the collected audio data to the server using a secure protocol. The input is the audio data from the terminal, and the output is the stream-formatted audio data received by the server. HTTPS is used as the communication protocol to ensure secure data transfer.

[0423] Step 3:

[0424] The server converts the received audio data into text data using the Google Cloud Speech-to-Text API. The input is audio data, and the output is text data. This text conversion prepares the data for natural language processing.

[0425] Step 4:

[0426] The server performs preprocessing on the text data, including noise reduction and data format standardization. The input is text data, and the output is de-noised and standardized text data. Specific operations include the removal of special characters and grammatical normalization.

[0427] Step 5:

[0428] The server performs sentiment analysis using preprocessed text data. It runs machine learning models using Python and scikit-learn. Input is text data in a unified format, and output is sentiment scores. A generative AI model is used to classify the data into specific sentiment categories.

[0429] Step 6:

[0430] The server stores the identified sentiments as time-series data in a database. The input is the sentiment score, and the output is the record in the time-series database. This makes it possible to track and evaluate sentiment trends later.

[0431] Step 7:

[0432] The server analyzes time-series data from the database to detect abnormal emotional patterns. The input is time-series data from the database, and the output is the detection results of abnormal patterns. These abnormal patterns trigger the next process.

[0433] Step 8:

[0434] When an abnormal pattern is detected, the server sends a warning message to administrators or caregivers. The input is the abnormal pattern, and the output is a warning message via email or notification application. This enables a rapid response tailored to the condition of the elderly person.
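As a hedged sketch of the email form of the warning in Step 8, the following composes a message with Python's standard `email.message.EmailMessage` class. The addresses, the subject format, the body wording, and the function name `build_alert_email` are assumptions; actual delivery (for example, through an SMTP server) depends on the deployment and is omitted.

```python
from email.message import EmailMessage


def build_alert_email(sender, recipient, person_label, pattern, observed_at):
    """Compose (but do not send) a warning email for one detected pattern."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"[Emotion alert] {person_label}: {pattern}"
    msg.set_content(
        "An abnormal emotional pattern was detected.\n"
        f"Person: {person_label}\n"
        f"Pattern: {pattern}\n"
        f"Observed at: {observed_at}\n"
    )
    return msg
```

Keeping message construction separate from delivery allows the same alert content to be routed to a notification application instead of email.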

[0435] An example of a prompt message might be: "Analyze the daily conversational audio data of elderly people and determine their daily emotional trends. In particular, detect signs of stress and anxiety and create a report."

[0436] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0437] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0438] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0439] [Third Embodiment]

[0440] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0441] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0442] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0443] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0444] The microphone 238 receives instructions from the user 20 by picking up the user's voice. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0445] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0446] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0447] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0448] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0449] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0450] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0451] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0452] This invention provides a system that analyzes users' emotions based on communication among team members and detects early signs of stress and burnout. This allows managers to take appropriate measures and improve the workplace environment.

[0453] Overview of the Embodiment

[0454] First, the terminal automatically collects the user's email, chat, and meeting recording data. The collected data is sent to the server via encrypted communication. After receiving this data, the server begins preprocessing the text and audio data. The preprocessed data is then analyzed to identify sentiment.

[0455] The server performs sentiment analysis using machine learning models. This scores and classifies the user's emotions from text and audio data. For example, categories such as "stress," "dissatisfaction," and "joy" are assigned. The analysis results are stored in a database and managed as time-series data for each user.

[0456] Furthermore, the server tracks emotional changes in real time. If an abnormal pattern is detected, it sends an alert to the administrator based on a pre-configured threshold. This alert is delivered via dashboard, email, and notification system, allowing administrators to quickly formulate a response plan.

[0457] Specific example

[0458] As a concrete example, suppose a member of a project team starts using phrases like "I'm tired" or "I've lost motivation" frequently in chat messages after working for long hours. This information is collected on the terminal and processed and analyzed on the server. If the analysis determines that stress levels are high, the server sends an alert to HR personnel or the project leader. This allows for early follow-up conversations and the implementation of appropriate support measures.

[0459] Thus, the present invention provides an embodiment of a system that enables efficient detection of changes in a user's emotions and allows appropriate countermeasures to be taken before problems become apparent.

[0460] The following describes the processing flow.

[0461] Step 1:

[0462] The terminal automatically collects the user's email, chat, and meeting recording data through its interface. This can be done using a dedicated application or a browser extension. The collected data is encrypted and sent to the server using a secure protocol.

[0463] Step 2:

[0464] The server receives the incoming data and temporarily stores it in the database. Then, it performs data preprocessing. Specifically, in the case of audio data, it is converted to text using speech recognition software, and all data formats are standardized. Noise reduction and spell checking are also performed at this stage.

[0465] Step 3:

[0466] The server performs sentiment analysis on pre-processed text data. Machine learning models and natural language processing algorithms are used to identify and score sentiments from words and phrases. Sentiment categories include positive, neutral, and negative, and the results are stored in a database.

[0467] Step 4:

[0468] The server uses the results of sentiment analysis to monitor user emotional fluctuations over time. Statistical methods and anomaly detection are employed to detect abnormal changes and specific patterns. This detection is performed in real time, providing foundational data for rapid response.

[0469] Step 5:

[0470] The server generates an alert based on a pre-configured threshold when an abnormal emotional pattern is detected. This alert is sent to HR personnel and managers via email or push notification. Receiving the alert allows administrators to quickly understand the problem and take appropriate action.

[0471] Step 6:

[0472] Users (administrators and HR personnel) can view detailed sentiment analysis on a dashboard based on received alerts. They can then contact problematic team members early on and provide stress care support and resources. This approach allows for resolution before problems become entrenched.
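The tracking and alerting described in Steps 4 and 5 above can be illustrated with a minimal sketch. The disclosure only states that "statistical methods and anomaly detection" and a "pre-configured threshold" are used; the rolling z-score below is one possible choice, and the function name `detect_abnormal`, the window size, and the threshold values are assumptions introduced purely for illustration.

```python
from statistics import mean, stdev

def detect_abnormal(scores, window=5, z_threshold=2.0, min_sigma=0.05):
    """Return the indices of scores that deviate strongly from recent history.

    scores: chronological stress scores for one user (assumed range 0..1).
    window: number of preceding scores used as the baseline (must be >= 2).
    z_threshold: hypothetical pre-configured threshold, in standard deviations.
    min_sigma: floor on the deviation so a flat baseline does not divide by zero.
    """
    flagged = []
    for i in range(window, len(scores)):
        history = scores[i - window:i]
        mu = mean(history)
        sigma = max(stdev(history), min_sigma)
        if abs(scores[i] - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged

stress_scores = [0.2, 0.25, 0.2, 0.3, 0.25, 0.9, 0.3]
print(detect_abnormal(stress_scores))  # flags index 5 (the 0.9 reading)
```

In such a sketch, each flagged index would correspond to an alert sent to HR personnel or managers; a deployed system might instead use a more robust detector.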

[0473] (Example 1)

[0474] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0475] It is difficult to detect early signs of emotional changes or stress in employees in the workplace and to take appropriate action quickly. To improve productivity while maintaining employees' mental health, it is necessary to monitor their emotional state efficiently and accurately.

[0476] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0477] In this invention, the server includes means for receiving information acquired from a user, means for preprocessing the information and converting it into a standardized format, and means for interpreting the information in the standardized format and detecting emotions. This makes it possible to quickly grasp changes in employees' emotions in the workplace and take appropriate action, thereby improving productivity while maintaining mental health.

[0478] A "user" refers to an individual or organization that provides information to the system.

[0479] "Information" refers to data collected from users, including text data and audio data.

[0480] "Preprocessing" refers to a series of processes that convert collected information into a format that is easy to analyze.

[0481] "Standardized format" refers to a state in which pre-processed information has been converted into a common format that can be analyzed.

[0482] "Interpretation" refers to the process of detecting emotions based on standardized information.

[0483] "Emotions" refer to internal experiences and situations that represent the user's psychological state.

[0484] An "abnormal situation" refers to an unusual pattern or tendency of emotions, and is an event that requires attention from the administrator.

[0485] An "administrator" is a person within an organization who is responsible for receiving notifications sent by the system.

[0486] "Notification" refers to a means of communication sent to an administrator when an abnormal situation occurs.

[0487] "Electronic communication" refers to means of transmitting information via digital media, including email and instant messaging.

[0488] "Notification services" refer to additional means of providing information about abnormal situations, complementing electronic communications.

[0489] This invention provides a system that utilizes communication data between users to analyze emotions in real time. Specific embodiments are described below.

[0490] The terminal collects the user's communication data, specifically text and voice data, locally. A dedicated application or monitoring software for data collection is assumed to be installed on the terminal. Users do not need to perform any special operations and can continue their normal work.

[0491] The collected data is securely transmitted to the server using common encryption technologies such as AES and HTTPS. Upon arrival at the server, preprocessing is performed immediately. In this preprocessing, natural language processing tools (e.g., NLTK and spaCy) clean the text data, and speech data is converted to text using a speech recognition API (e.g., Google Speech-to-Text).
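The text cleaning mentioned above can be sketched as follows. The disclosure names NLTK and spaCy as example tools; to keep the sketch self-contained, it substitutes the Python standard library `re` module, and the function names `clean_text` and `tokenize` as well as the specific cleaning rules are assumptions for illustration only.

```python
import re

def clean_text(raw: str) -> str:
    """Strip markup and URLs, collapse whitespace, and lowercase the text."""
    text = re.sub(r"<[^>]+>", " ", raw)        # drop HTML-like tags from email bodies
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"\s+", " ", text)           # collapse runs of whitespace
    return text.strip().lower()

def tokenize(text: str) -> list:
    """Split cleaned text into simple word tokens (a stand-in for a real tokenizer)."""
    return re.findall(r"[a-z0-9']+", text)

cleaned = clean_text("<p>I'm  TIRED today</p> see https://example.com/x")
print(cleaned)
print(tokenize(cleaned))
```

A production system would more likely rely on the tokenizers of the tools named in the disclosure, which handle punctuation and languages other than English more carefully.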

[0492] The server uses pre-processed data to request sentiment analysis from a generative AI model. Models such as BERT and GPT are expected to be applied. An example of a prompt is "Classify the sentiment of this text." This results in the model scoring and classifying emotions such as "stress," "dissatisfaction," and "joy."
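As a hedged illustration of the request to the generative AI model, the sketch below builds a prompt from the example wording above and parses a scored reply. The disclosure does not specify a reply format or a model API; the JSON reply format, the names `build_prompt`, `parse_scores`, and `classify`, and the `model` callable are all assumptions, and any real model client would have to be supplied by the implementer.

```python
import json
from typing import Callable, Dict

# Emotion categories taken from the example in the text.
CATEGORIES = ("stress", "dissatisfaction", "joy")

def build_prompt(text: str) -> str:
    """Combine the example instruction with an assumed JSON reply format."""
    return (
        "Classify the sentiment of this text. "
        "Reply with a JSON object giving a score from 0 to 1 for each of: "
        + ", ".join(CATEGORIES) + ".\n"
        + "Text: " + text
    )

def parse_scores(reply: str) -> Dict[str, float]:
    """Read per-category scores, defaulting to 0.0 and clamping to [0, 1]."""
    data = json.loads(reply)
    return {c: min(1.0, max(0.0, float(data.get(c, 0.0)))) for c in CATEGORIES}

def classify(text: str, model: Callable[[str], str]) -> Dict[str, float]:
    """`model` is a hypothetical callable that sends a prompt and returns the reply text."""
    return parse_scores(model(build_prompt(text)))

# Usage with a stub in place of a real model call:
fake_model = lambda prompt: '{"stress": 0.8, "joy": 0.1}'
print(classify("I'm tired and I've lost motivation.", fake_model))
```

Passing the model as a callable keeps the sketch independent of any particular service and lets the parsing logic be exercised without network access.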

[0493] The analysis results are stored as time-series data in a database on the server, and administrators can monitor this in real time through dashboards and other means. If an abnormal emotional pattern is detected, an immediate alert is sent to the administrator via email or a dedicated notification service.

[0494] For example, if a member of a project team frequently uses phrases like "tired" in chat messages after working long hours, the sentiment analysis may interpret this as a sign of stress, and an alert is sent to the HR department. Based on this information, managers can then take necessary follow-up steps and implement appropriate support measures.

[0495] This invention enables companies and organizations to efficiently understand the emotional state of their employees and contribute to maintaining a healthy work environment.

[0496] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0497] Step 1:

[0498] The terminal collects text and audio data from the user's daily communication activities. The inputs include emails, chat history, and meeting recordings, and the output is the collected raw data. This collection process is automated by a dedicated application installed on the terminal.

[0499] Step 2:

[0500] The terminal encrypts the collected data using AES and securely transmits it to the server using the HTTPS protocol. The input is the unencrypted data obtained in step 1, and the output is the securely encrypted and transmitted data. This operation ensures the confidentiality of the data.

[0501] Step 3:

[0502] The server performs preprocessing on the received data. The input is the encrypted data, which the server decrypts to obtain the raw data. The server then cleans the text in this data using natural language processing tools (NLTK and spaCy) and converts the audio data to text using the Google Speech-to-Text API. The output is standardized data that can be analyzed.

[0503] Step 4:

[0504] The server inputs the preprocessed data into a generative AI model and performs sentiment analysis. The input is standardized data, and the output is a sentiment score as the analysis result. During this process, the model is instructed using prompts such as "Classify the sentiment of this text."

[0505] Step 5:

[0506] The server stores the results of the sentiment analysis in a database. The input is the raw data in sentiment score format, and the output is time-series data for each user in the database. This establishes a system that allows for continuous tracking of changes in the user's sentiment.

[0507] Step 6:

[0508] The server analyzes accumulated time-series data and immediately sends an alert to the administrator if an abnormal sentiment pattern occurs. The input is time-series data, and the output is notification information based on anomaly detection. Notifications are provided to the administrator via email and a dashboard.
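Steps 5 and 6 above can be sketched as a small time-series store with a threshold query. The sketch uses the standard-library `sqlite3` module with an in-memory database as a stand-in for the server database; the table layout, the function names, and the threshold of 0.7 are assumptions for illustration and are not specified in the disclosure.

```python
import sqlite3

def open_store():
    """Create an in-memory table holding per-user sentiment scores over time."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sentiment (user_id TEXT, ts TEXT, category TEXT, score REAL)"
    )
    return conn

def record(conn, user_id, ts, category, score):
    """Append one scored observation; ts is an ISO 8601 string so it sorts as text."""
    conn.execute(
        "INSERT INTO sentiment VALUES (?, ?, ?, ?)", (user_id, ts, category, score)
    )

def users_to_alert(conn, category="stress", since="", threshold=0.7):
    """Return (user_id, average score) for users whose average exceeds the threshold."""
    rows = conn.execute(
        "SELECT user_id, AVG(score) FROM sentiment "
        "WHERE category = ? AND ts >= ? "
        "GROUP BY user_id HAVING AVG(score) > ? ORDER BY user_id",
        (category, since, threshold),
    )
    return [(user_id, round(avg, 3)) for user_id, avg in rows]

conn = open_store()
record(conn, "u1", "2024-12-01T09:00:00", "stress", 0.8)
record(conn, "u1", "2024-12-02T09:00:00", "stress", 0.9)
record(conn, "u2", "2024-12-01T09:00:00", "stress", 0.2)
print(users_to_alert(conn, since="2024-12-01"))
```

Each returned user would then be the subject of a notification to the administrator by email or on the dashboard.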

[0509] (Application Example 1)

[0510] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0511] Mental health issues are becoming increasingly serious in modern workplaces, and especially in environments with long working hours and high stress levels, it is crucial to detect emotional fluctuations early and take appropriate measures. However, existing systems have made it difficult to quickly analyze individual feedback and provide necessary information to managers, sometimes resulting in delays in improving the workplace environment. Therefore, there is a need for a more effective way to analyze emotional fluctuations and quickly improve the workplace environment.

[0512] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0513] In this invention, the server includes means for receiving message data and recorded data collected from users, means for preprocessing the message data and recorded data and converting it into a unified format, and means for analyzing the data in the unified format and identifying and scoring emotions. This enables effective analysis of fluctuations in the emotions of individual users in the workplace, and when abnormal fluctuations are detected, it quickly notifies administrators of the warning, making it possible to improve the workplace environment.

[0514] "Message data collected from users" refers to the general term for text or audio information obtained from communication methods that users use on a daily basis.

[0515] "Recorded data" refers to recordings of information, such as audio, text, or a combination thereof, generated or acquired by a computer system.

[0516] "Preprocessing" refers to a series of data manipulation operations performed to convert collected data into an analyzable format.

[0517] A "unified format" refers to a state in which data of different formats and content has been converted into a form that can be analyzed uniformly.

[0518] "Analyzing data to identify and score emotions" is the act of identifying emotions based on the content of the data and quantifying their intensity and characteristics.

[0519] "Emotional fluctuations" refer to how emotional states change over time.

[0520] "Abnormal fluctuations" refer to emotional changes that exhibit unusual patterns or frequencies.

[0521] "Sending a warning to the administrator via communication means" refers to alerting administrators via email or a notification system when an anomaly is detected.

[0522] "Visualizing users' mental health information on a dashboard" refers to a screen display that makes it easier to visually show individual users' emotional states and how they change.

[0523] To implement this invention, the server receives message data and recorded data transmitted by the user. This data is obtained from the user's everyday means of communication, such as email or voice chat. After receiving the data, the server uses dedicated preprocessing software to convert the data into a unified format that can be analyzed. Preprocessing includes tokenization and speech-to-text conversion.

[0524] Next, the server uses a generative AI model to identify and score emotions from the pre-processed data. This process quantifies and classifies the emotional information contained in the data, making it possible to identify abnormal emotional fluctuations. If the server detects abnormal emotional fluctuations, it immediately sends a warning to the administrator via communication channels. These communication channels include email and notifications on the dashboard. Furthermore, the analyzed user mental health information is visualized on the dashboard in a format that is easily understandable to the administrator.

[0525] The hardware used for implementation includes cloud servers for preprocessing and analysis. Machine learning frameworks such as TensorFlow and PyTorch are used to run the generative AI model. MySQL or PostgreSQL is used as the database to store the analysis results.

[0526] As a concrete example, suppose an employee starts frequently using the phrase "I feel pressured" due to project stress. This information is analyzed and notified by the server, allowing administrators to intervene quickly and provide support to alleviate the employee's stress.

[0527] Examples of specific prompts for a generative AI model are as follows:

[0528] "Please analyze this email data to detect signs of stress and anxiety."

[0529] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0530] Step 1:

[0531] The terminal collects message data and recorded data from the user's everyday communication methods. In this case, the input is email or voice chat, and the output is the collected raw data. The terminal encrypts this data and prepares it for secure transmission to the server.

[0532] Step 2:

[0533] The server receives encrypted data sent from the terminal and decrypts it. The input is encrypted raw data, and the output is the decrypted raw message and recorded data. The server performs preprocessing, such as tokenization and speech-to-text conversion, to convert the data into a unified, analyzable format. This process prepares the data for analysis.

[0534] Step 3:

[0535] The server uses a generative AI model to identify and score emotions based on pre-processed data. The output is the score of emotions contained in the text and audio. The server then quantifies emotion categories such as "joy," "dissatisfaction," and "stress," and determines the intensity of those emotions.

[0536] Step 4:

[0537] The server analyzes emotional fluctuations over time based on the scoring results and detects abnormal patterns. The input is the emotion data scored in step 3, and the output is warning flags and information indicating abnormal fluctuations. When an anomaly is detected, the server prepares to take appropriate action.

[0538] Step 5:

[0539] If the server detects an abnormal emotional fluctuation, it sends a warning to the administrator via communication channels. The input is the anomaly detection alert information, and the output is a warning message directed to the administrator. Through this process, the server communicates the situation to the administrator in real time.

[0540] Step 6:

[0541] The server visualizes the analyzed user's mental health information on a dashboard, allowing administrators to easily assess the situation. The input is the sentiment analysis results and scored data, and the output is visually displayed mental health information. The system organizes and presents the information so that administrators can understand it easily and grasp the situation quickly.
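The data preparation behind the dashboard in Step 6 can be sketched as a per-user, per-day aggregation of the scored data. The disclosure does not define how the dashboard summarizes scores; the daily mean, the record layout, and the name `daily_summary` are assumptions for illustration.

```python
from collections import defaultdict

def daily_summary(records):
    """Average scores per (user, day, emotion category) for display on a dashboard.

    records: iterable of (user_id, iso_timestamp, category, score) tuples,
    where iso_timestamp starts with the date in YYYY-MM-DD form.
    """
    buckets = defaultdict(list)
    for user_id, ts, category, score in records:
        buckets[(user_id, ts[:10], category)].append(score)
    return {key: round(sum(vals) / len(vals), 3) for key, vals in sorted(buckets.items())}

scored = [
    ("u1", "2024-12-10T09:00", "stress", 0.6),
    ("u1", "2024-12-10T15:00", "stress", 0.8),
    ("u1", "2024-12-11T09:00", "joy", 0.5),
]
for key, avg in daily_summary(scored).items():
    print(key, avg)
```

The resulting mapping could be passed to any charting layer; the choice of a daily granularity is only one option.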

[0542] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0543] This invention provides a system that analyzes emotions in real time using text and voice data collected from users. This system incorporates an emotion engine for emotion recognition and supports the management of the user's mental health.

[0544] Overview of the Embodiment

[0545] First, the terminal collects text data from emails and chats and audio data from recorded meetings. The data is sent to the server via a secure protocol. The server stores the received data and performs preprocessing. Preprocessing includes converting the audio data to text, denoising, and standardizing the data format.

[0546] In the emotion analysis process, the server uses the emotion engine, which applies machine learning models and natural language processing techniques to extract emotional insights from the data. The emotion engine can recognize new emotion categories in addition to traditional ones. Furthermore, user profile information is taken into account to improve the accuracy of emotion recognition for each individual.

[0547] The analysis results are stored in a database and used for tracking emotions over time and detecting anomaly patterns. The server also has the capability to predict future emotion trends by referring to past data history. If an anomaly is detected, the server generates an alert and sends a notification to HR personnel and managers. This notification is delivered via email or a dedicated notification system.
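The prediction of future emotion trends from past data mentioned above can be illustrated with a simple least-squares line fitted to a user's recent scores. The disclosure does not state which prediction method is used; linear extrapolation and the names `fit_trend` and `forecast` are assumptions chosen for brevity.

```python
def fit_trend(scores):
    """Fit score = intercept + slope * index by ordinary least squares."""
    n = len(scores)
    if n < 2:
        raise ValueError("at least two scores are needed to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(scores) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(scores))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def forecast(scores, steps_ahead=1):
    """Extrapolate the fitted line steps_ahead observations past the last score."""
    slope, intercept = fit_trend(scores)
    return intercept + slope * (len(scores) - 1 + steps_ahead)

history = [0.1, 0.2, 0.3, 0.4]
print(fit_trend(history)[0])   # a positive slope suggests a worsening trend
print(forecast(history))
```

A rising slope for a negative emotion such as stress could be treated as a sign of worsening even before any single score crosses an alert threshold.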

[0548] Specific example

[0549] For example, suppose an employee is experiencing high levels of stress due to an approaching project deadline. The terminal collects the employee's communication data and sends it to the server. The server's emotion engine analyzes this data and detects negative emotions such as stress and dissatisfaction at a high frequency. As a result, the emotion trend shows signs of worsening, and the server automatically sends an alert to the supervisor.

[0550] Supervisors receive notifications, check the dashboard, and view detailed analysis results. They can then take immediate action to resolve the problem. This strengthens the workplace environment and employee mental health care.

[0551] By implementing this invention, advanced emotional analysis using an emotion engine becomes possible, making early intervention in the workplace a reality.

[0552] The following describes the processing flow.

[0553] Step 1:

[0554] The terminal automatically collects the user's email, chat, and meeting audio data through a dedicated application. This data is encrypted and sent to the server using a secure protocol.

[0555] Step 2:

[0556] The server stores the received data in a database and performs preprocessing for analysis. Audio data is converted to text using speech recognition technology, and all data is formatted to a consistent style. Noise reduction and typographical error correction are also performed at this stage.

[0557] Step 3:

[0558] The server passes the preprocessed text data to the emotion engine. The emotion engine uses machine learning models to identify and score the emotions in the data. In this process, data points are classified into multiple emotion categories.

[0559] Step 4:

[0560] The server tracks the emotion data scored by the emotion engine over time. Anomaly detection algorithms are used to identify unusual emotion patterns and sudden changes in emotion.

[0561] Step 5:

[0562] The server generates alerts based on detected anomaly patterns. These alerts consider individual user profile information and historical data to set tiered warning levels.

[0563] Step 6:

[0564] Alerts generated by the server are sent to administrators via email or a dedicated notification system. The alert notification includes a description of the anomaly along with recommended actions.

[0565] Step 7:

[0566] Users (administrators and HR personnel) can access the dashboard based on alerts to obtain detailed analysis results. This allows them to follow up with stakeholders before problems surface and determine appropriate mental health care measures.
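The tiered warning levels of Step 5 and the alert contents of Step 6 can be sketched as follows. The disclosure states only that user profile information and historical data are considered; comparing the current score with a per-user baseline, the margin values, the level names, and the recommended-action texts are all assumptions for illustration.

```python
# Hypothetical recommended actions attached to each warning level.
RECOMMENDED_ACTIONS = {
    "caution": "Schedule a follow-up conversation.",
    "critical": "Contact the user promptly and offer support resources.",
}

def warning_level(current, baseline, caution_margin=0.2, critical_margin=0.4):
    """Grade how far the current negative-emotion score exceeds the user's baseline."""
    excess = current - baseline
    if excess >= critical_margin:
        return "critical"
    if excess >= caution_margin:
        return "caution"
    return "normal"

def build_alert(user_id, current, baseline):
    """Return an alert dictionary, or None when no alert is warranted."""
    level = warning_level(current, baseline)
    if level == "normal":
        return None
    return {
        "user_id": user_id,
        "level": level,
        "description": (
            f"Negative-emotion score {current:.2f} exceeds "
            f"the user's baseline {baseline:.2f}."
        ),
        "recommended_action": RECOMMENDED_ACTIONS[level],
    }

print(build_alert("u1", 0.9, 0.4))
```

Using a per-user baseline means that a user who habitually scores higher is not flagged as often as a user whose score rises suddenly from a low level.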

[0567] (Example 2)

[0568] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0569] Conventional sentiment analysis systems have limited ability to accurately identify user emotions and quickly and automatically detect abnormal emotional patterns. Furthermore, analysis based on simple emotion categories struggles to capture the nuances of individual user emotions. In addition, the methods for notifying users when anomalies are detected are limited, posing a challenge in delivering timely and accurate alerts to relevant parties.

[0570] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0571] In this invention, the server includes means for receiving text data and audio data collected from users by an information processing device; means for performing speech recognition and noise reduction preprocessing on the text data and audio data and unifying the data format; and means for identifying emotions by applying machine learning models and natural language processing techniques to the unified data format. This enables the identification of diverse and precise emotion categories, rapid detection of abnormal emotion patterns, and immediate alert notifications to relevant parties.

[0572] An "information processing system" is a collective term for a set of hardware and software used to collect, store, process, and analyze data.

[0573] "User" refers to an individual or organization that provides data using this system.

[0574] "Text data" refers to document-format data generated or provided by the user, including email and chat content.

[0575] "Audio data" refers to data based on audio provided by the user, and includes recorded meeting audio and voice messages.

[0576] "Speech recognition" refers to the technology that analyzes audio data and converts it into text, making it possible to treat speech as textual information.

[0577] "Noise reduction" is the process of removing unwanted sounds and interfering background noise in audio data processing.

[0578] "Preprocessing" refers to a series of processes performed after data collection to prepare the data for analysis, and includes noise reduction and standardization of data formats.

[0579] "Unifying data formats" refers to the process of converting data in different formats into a common, parseable format.

[0580] A "machine learning model" is an algorithm or mathematical model used to learn from large amounts of data and automate specific tasks.

[0581] "Natural language processing technology" refers to a set of technologies that enable computers to understand, interpret, and generate human language.

[0582] "Identifying emotions" refers to the process of estimating and identifying a user's emotional state from text or audio data.

[0583] An "abnormal pattern" refers to a pattern that shows unnatural emotional changes or tendencies that differ from the user's normal emotional state.

[0584] "Electronic communications" refers to all technologies that send and receive information via digital devices, and includes email and dedicated messaging protocols.

[0585] This invention provides a system that analyzes a user's emotions in real time, detects emotional anomalies based on the results, and generates alerts. The invention can be implemented in the following forms.

[0586] The terminal collects text and audio data from the user. Common data collection sources such as email applications, chat software, and conference recording devices are used for this data collection. The collected data is securely transmitted to the server.

[0587] The server temporarily stores the received data. Audio data, in particular, is converted to text using speech recognition software such as Google Cloud Speech-to-Text or Microsoft Azure Speech Service. This process applies noise filtering algorithms to improve data quality. The data is then standardized into a format suitable for analysis.

[0588] During the analysis process, the server runs an emotion engine. This engine uses machine learning models (e.g., BERT and GPT) and natural language processing techniques to identify emotions from the unified data. The emotion engine also utilizes generative AI models to recognize diverse emotion categories. By referring to user profiles, the accuracy of emotion recognition can be improved for each individual user.

[0589] The results of the sentiment analysis are stored in a dedicated data storage system, and abnormal patterns are detected based on this data. When an abnormality is detected, an alert is automatically sent electronically to administrators and relevant organizational personnel.

[0590] For example, if a user makes a statement like, "I'm worried because I'm not making progress on my exam preparations," during a project, this is recorded in the database, and the server scores it as "anxiety." If this occurs frequently, an anomaly is detected in the emotional data, and an alert is sent to management.

[0591] This invention enables the extraction of emotional insights, which can be used to improve the work environment and support mental health care. An example of a prompt message would be, "A method to analyze a user's emotions and detect anomalies in relation to the pressure of project deadlines."

[0592] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0593] Step 1:

[0594] The terminal collects text and audio data from the user. Specific examples include messages sent by the user in a chat application and audio recorded by a conference recording device. At this stage, the input is raw communication data generated by the user, and the output is a data file appropriately formatted for further processing of this data.

[0595] Step 2:

[0596] The terminal sends the collected data to the server via a secure protocol. The data transmission uses a protocol that encrypts the data for secure transfer (e.g., TLS). The input is the formatted data collected in step 1, and the output is the decrypted raw data received on the server side.

[0597] Step 3:

[0598] The server converts the received audio data into text data. This process uses speech recognition software to convert spoken language into text. It also executes a noise filtering algorithm to remove background noise and improve quality. The input is an audio file, and the output is text data transcribed from the denoised audio.

[0599] Step 4:

[0600] The server uses the converted text data to standardize the data format and prepare it for subsequent processing. JSON or XML formats are used for data format standardization. The input is the text data generated in step 3, and the output is standardized, consistent data.

[0601] Step 5:

[0602] The server runs an emotion engine and analyzes the data using machine learning models and natural language processing techniques. It uses a generative AI model to identify positive, negative, and other diverse emotion categories. The input is the standardized data prepared in step 4, and the output is the analyzed data with emotion categories assigned to it.

[0603] Step 6:

[0604] The server stores the analyzed emotion data in a database and performs time-series emotion tracking and anomaly pattern detection. Anomaly pattern detection uses an algorithm that identifies changes that deviate from normal emotional states. The input is the analyzed data generated in step 5, and the output is time-series emotion data and the results of anomaly detection.
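The disclosure does not name a specific anomaly detection algorithm. One simple possibility, shown here only as a sketch, is to compare each new emotion score with the mean and standard deviation of the scores that immediately precede it (a z-score test). The function name, the window size, and the threshold value are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def detect_anomalies(scores, baseline_size=5, z_threshold=2.0):
    """Return indices of scores that deviate from the preceding baseline window.

    Each score is compared with the mean and sample standard deviation of the
    `baseline_size` scores immediately before it.
    """
    anomalies = []
    for i in range(baseline_size, len(scores)):
        window = scores[i - baseline_size:i]
        mu = mean(window)
        sigma = stdev(window)
        if sigma == 0:
            # Flat baseline: a z-score is undefined, so any change is flagged.
            if scores[i] != mu:
                anomalies.append(i)
            continue
        if abs(scores[i] - mu) / sigma > z_threshold:
            anomalies.append(i)
    return anomalies
```

A per-user baseline of this kind treats "normal" as relative to each person's own recent history rather than to a global cutoff.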

[0605] Step 7:

[0606] The server generates an alert when an anomaly is detected and notifies administrators and responsible personnel of the alert via electronic communication. Email or a dedicated messaging system is used for notification. The input is the emotion anomaly data identified in the processing up to step 6, and the output is the status of the alert notification's transmission completion.
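As a sketch of the alert in step 7, the following builds an email message with Python's standard `email.message.EmailMessage` class. The sender address, the subject wording, and the function name `build_alert_email` are hypothetical; the disclosure only states that email or a dedicated messaging system is used.

```python
from email.message import EmailMessage


def build_alert_email(user_id, score, detected_at, recipient,
                      sender="alerts@example.com"):
    """Build (but do not send) an alert email for a detected emotion anomaly."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Emotion anomaly detected for user {user_id}"
    msg.set_content(
        f"An abnormal emotion score of {score:.2f} was recorded "
        f"for user {user_id} at {detected_at}.\n"
        "Please review the dashboard and consider a follow-up."
    )
    return msg
```

The returned message could be handed to `smtplib.SMTP.send_message` for delivery; whether that call succeeded would correspond to the "transmission completion status" described as the output of this step.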

[0607] (Application Example 2)

[0608] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0609] There is a challenge in appropriately monitoring the emotional state of the elderly, detecting abnormal emotional patterns early, and responding promptly. In particular, while effectively managing the mental health of the elderly is important in nursing homes and home care settings, current systems make it difficult to grasp emotional changes in real time and provide necessary care.

[0610] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0611] In this invention, the server includes means for receiving data collected from users, means for preprocessing the data and converting it into a unified format, means for analyzing the data in the unified format and identifying emotions, means for tracking the identified emotions in chronological order and detecting abnormal patterns, means for sending a warning to an administrator when an abnormal emotional pattern is detected, means for monitoring the emotional trends of elderly people in real time, and means for prompting caregivers of elderly people to respond based on emotional trends. This makes it possible to grasp the emotional state of elderly people in real time and respond quickly to abnormalities.

[0612] "Means for receiving data" refers to an interface for receiving information such as text and audio provided by the user into the server.

[0613] "Means for preprocessing and converting into a unified format" refers to a processing method that performs noise reduction, text conversion, and other modifications on received data to transform it into a consistent data format.

[0614] "Means for identifying emotions" refers to a technical method for analyzing preprocessed data and recognizing specific emotional states.

[0615] "Means for tracking in chronological order and detecting abnormal patterns" refers to a function that observes changes in emotions over time and identifies unusual emotional movements.

[0616] "Means for sending a warning" refers to a mechanism for communicating messages about detected abnormal patterns to administrators or caregivers to draw their attention.

[0617] "Means for monitoring the emotional trends of elderly people in real time" refers to a process for collecting and continuously observing the emotional state of elderly people in real time.

[0618] "Means for prompting caregivers to respond based on emotional trends" refers to a mechanism that instructs relevant parties to carry out care activities that meet the needs of elderly people in response to changes in their emotional state.

[0619] The system implementing this invention mainly consists of a server and a terminal. The terminal is responsible for continuously recording conversations and monologues of elderly people and sending the audio data to the server via a secure protocol (e.g., HTTPS). The server uses the Google Cloud Speech-to-Text API to convert the audio data into text and performs preprocessing such as noise reduction and data format standardization.

[0620] Server-side sentiment analysis uses Python with scikit-learn, applying machine learning techniques to identify emotions from text data. Specifically, emotions are scored and their changes are tracked over time. The identified emotion data is stored in a database so that emotional trends can be observed as time-series data.
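The time-series storage described above can be sketched as follows. This example uses SQLite from the Python standard library purely as a stand-in, since the database product is not specified for this application example; the table name `emotion_scores` and its columns are hypothetical.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the database and create the score table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS emotion_scores ("
        "user_id TEXT NOT NULL, recorded_at TEXT NOT NULL, "
        "emotion TEXT NOT NULL, score REAL NOT NULL)"
    )
    return conn


def add_score(conn, user_id, recorded_at, emotion, score):
    """Insert one scored observation; recorded_at is an ISO 8601 string."""
    conn.execute(
        "INSERT INTO emotion_scores VALUES (?, ?, ?, ?)",
        (user_id, recorded_at, emotion, score),
    )
    conn.commit()


def history(conn, user_id, emotion):
    """Return (recorded_at, score) rows for one user, oldest first."""
    rows = conn.execute(
        "SELECT recorded_at, score FROM emotion_scores "
        "WHERE user_id = ? AND emotion = ? ORDER BY recorded_at",
        (user_id, emotion),
    )
    return rows.fetchall()
```

Storing timestamps as ISO 8601 strings keeps chronological order identical to text sort order, so a plain `ORDER BY` is enough to recover the time series.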

[0621] When an abnormal emotional pattern is detected, the server immediately sends a warning message to the administrator or caregiver. Communication is conducted via email or a dedicated notification application, enabling a rapid response. This allows for real-time monitoring of trends in the emotional state of elderly individuals and prompting necessary interventions.

[0622] For example, if an elderly person's frequency of talking to themselves at night increases, caregivers can consider appropriate responses based on data such as the possibility that their daytime activity level has decreased. In this way, the objective of this invention is to effectively manage the mental health of the elderly and improve their quality of life.
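The night-time monologue frequency mentioned above could be derived from utterance timestamps. The sketch below counts utterances per night, assuming (as an illustration only) that "night" means 22:00 to 06:00 and that each utterance carries an ISO 8601 timestamp; the function name and the hour boundaries are not taken from this disclosure.

```python
from collections import Counter
from datetime import datetime, timedelta


def night_counts(timestamps, start_hour=22, end_hour=6):
    """Count utterances per night; a night is keyed by the date on which it began."""
    counts = Counter()
    for ts in timestamps:
        t = datetime.fromisoformat(ts)
        if t.hour >= start_hour:
            counts[t.date().isoformat()] += 1
        elif t.hour < end_hour:
            # Early-morning utterances belong to the night that began the day before.
            counts[(t.date() - timedelta(days=1)).isoformat()] += 1
    return dict(counts)
```

A rising count across consecutive nights is the kind of trend a caregiver could be shown alongside daytime activity data.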

[0623] Examples of prompts for a generative AI model are as follows:

[0624] "Analyze daily conversational audio data from elderly individuals to determine their daily emotional trends. In particular, detect signs of stress and anxiety and create a report."

[0625] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0626] Step 1:

[0627] The terminal records the daily conversations and monologues of elderly people. The audio is the input; it is temporarily stored as a digital audio file before being sent to the server. This data is collected in batches at specific time intervals.

[0628] Step 2:

[0629] The terminal sends the collected audio data to the server using a secure protocol. The input is the audio data from the terminal, and the output is the stream-formatted audio data received by the server. HTTPS is used as the communication protocol to ensure secure data transfer.

[0630] Step 3:

[0631] The server converts the received audio data into text data using the Google Cloud Speech-to-Text API. The input is audio data, and the output is text data. This text conversion prepares the data for natural language processing.

[0632] Step 4:

[0633] The server performs preprocessing on the text data, including noise reduction and data format standardization. The input is text data, and the output is de-noised and standardized text data. Specific operations include the removal of special characters and grammatical normalization.
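The text cleanup in step 4 can be sketched with the standard `re` and `unicodedata` modules. The set of characters retained and the lowercasing step are assumptions made for illustration. The sketch covers only character-level normalization and does not attempt the grammatical normalization mentioned above.

```python
import re
import unicodedata


def clean_transcript(text: str) -> str:
    """Normalize a transcript: fold character forms, drop symbols, tidy spaces."""
    # NFKC folds full-width and compatibility characters into canonical forms.
    text = unicodedata.normalize("NFKC", text)
    # Replace anything that is not a word character, whitespace, or basic punctuation.
    text = re.sub(r"[^\w\s.,!?']", " ", text)
    # Collapse runs of whitespace and trim both ends.
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()
```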

[0634] Step 5:

[0635] The server performs sentiment analysis using preprocessed text data. It runs machine learning models using Python and scikit-learn. Input is text data in a unified format, and output is sentiment scores. A generative AI model is used to classify the data into specific sentiment categories.

[0636] Step 6:

[0637] The server stores the identified sentiments as time-series data in a database. The input is the sentiment score, and the output is the record in the time-series database. This makes it possible to track and evaluate sentiment trends later.

[0638] Step 7:

[0639] The server analyzes time-series data from the database to detect abnormal emotional patterns. The input is time-series data from the database, and the output is the detection results of abnormal patterns. These abnormal patterns trigger the next process.

[0640] Step 8:

[0641] When an abnormal pattern is detected, the server sends a warning message to administrators or caregivers. The input is the abnormal pattern, and the output is a warning message via email or notification application. This enables a rapid response tailored to the condition of the elderly person.

[0642] An example of a prompt message might be: "Analyze the daily conversational audio data of elderly people and determine their daily emotional trends. In particular, detect signs of stress and anxiety and create a report."

[0643] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0644] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0645] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0646] [Fourth Embodiment]

[0647] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0648] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0649] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0650] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0651] The microphone 238 accepts instructions and other input from the user 20 by picking up the voice of the user 20. The microphone 238 captures the voice uttered by the user 20, converts the captured voice into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0652] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0653] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0654] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0655] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0656] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0657] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0658] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0659] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0660] This invention provides a system that analyzes users' emotions based on communication among team members and detects early signs of stress and burnout. This allows managers to take appropriate measures and improve the workplace environment.

[0661] Overview of the Embodiment

[0662] First, the terminal automatically collects the user's email, chat, and meeting recording data. The collected data is sent to the server via encrypted communication. After receiving this data, the server begins preprocessing the text and audio data. The preprocessed data is then analyzed to identify sentiment.

[0663] The server performs sentiment analysis using machine learning models. This scores and classifies the user's emotions from text and audio data. For example, categories such as "stress," "dissatisfaction," and "joy" are assigned. The analysis results are stored in a database and managed as time-series data for each user.

[0664] Furthermore, the server tracks emotional changes in real time. If an abnormal pattern is detected, it sends an alert to the administrator based on a pre-configured threshold. This alert is delivered via dashboard, email, and notification system, allowing administrators to quickly formulate a response plan.
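A pre-configured threshold can be applied in many ways. As one hedged illustration, the rule below raises an alert only when the most recent stress scores all exceed the threshold, which avoids alerting on a single bad message. The function name, the threshold of 0.7, and the requirement of three consecutive observations are assumptions, not values given in this disclosure.

```python
def should_alert(stress_scores, threshold=0.7, min_consecutive=3):
    """Return True when the latest `min_consecutive` scores all exceed `threshold`."""
    if len(stress_scores) < min_consecutive:
        return False
    return all(s > threshold for s in stress_scores[-min_consecutive:])
```

For example, `should_alert([0.2, 0.8, 0.9, 0.75])` evaluates the last three scores only, so an earlier calm period does not suppress the alert.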

[0665] Specific example

[0666] As a concrete example, suppose a member of a project team starts using phrases like "I'm tired" or "I've lost motivation" frequently in chat messages after working for long hours. This information is collected on the terminal and processed and analyzed on the server. If the analysis determines that stress levels are high, the server sends an alert to HR personnel or the project leader. This allows for early follow-up conversations and the implementation of appropriate support measures.

[0667] Thus, the present invention provides an embodiment of a system that enables efficient detection of changes in a user's emotions and allows appropriate countermeasures to be taken before problems become apparent.

[0668] The following describes the processing flow.

[0669] Step 1:

[0670] The terminal automatically collects the user's email, chat, and meeting recording data through its interface. This can be done using a dedicated application or browser extension. The collected data is encrypted and sent to the server using a secure protocol.

[0671] Step 2:

[0672] The server receives the incoming data and temporarily stores it in the database. Then, it performs data preprocessing. Specifically, in the case of audio data, it is converted to text using speech recognition software, and all data formats are standardized. Noise reduction and spell checking are also performed at this stage.

[0673] Step 3:

[0674] The server performs sentiment analysis on pre-processed text data. Machine learning models and natural language processing algorithms are used to identify and score sentiments from words and phrases. Sentiment categories include positive, neutral, and negative, and the results are stored in a database.

[0675] Step 4:

[0676] The server uses the results of sentiment analysis to monitor user emotional fluctuations over time. Statistical methods and anomaly detection are employed to detect abnormal changes and specific patterns. This detection is performed in real time, providing foundational data for rapid response.

[0677] Step 5:

[0678] The server generates an alert based on a pre-configured threshold when an abnormal emotional pattern is detected. This alert is sent to HR personnel and managers via email or push notification. Receiving the alert allows administrators to quickly understand the problem and take appropriate action.

[0679] Step 6:

[0680] Users (administrators and HR personnel) can view detailed sentiment analysis on a dashboard based on the alerts they receive. They can then reach out early to team members who show signs of difficulty and provide stress care support and resources. This approach allows issues to be resolved before they become entrenched.

[0681] (Example 1)

[0682] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0683] It is difficult to detect early signs of emotional changes or stress in employees in the workplace and to take appropriate action quickly. To improve productivity while maintaining employees' mental health, it is necessary to monitor their emotional state efficiently and accurately.

[0684] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0685] In this invention, the server includes means for receiving information acquired from a user, means for preprocessing the information and converting it into a standardized format, and means for interpreting the information in the standardized format and detecting emotions. This makes it possible to quickly grasp changes in employees' emotions in the workplace and take appropriate action, thereby improving productivity while maintaining mental health.

[0686] A "user" refers to any individual or organization that provides information to the system.

[0687] "Information" refers to data collected from users, including text data and audio data.

[0688] "Preprocessing" refers to a series of processes that convert collected information into a format that is easy to analyze.

[0689] "Standardized format" refers to a state in which pre-processed information has been converted into a common format that can be analyzed.

[0690] "Interpretation" refers to the process of detecting emotions based on standardized information.

[0691] "Emotions" refer to internal experiences and situations that represent the user's psychological state.

[0692] An "abnormal situation" refers to an unusual pattern or tendency of emotions, and is an event that requires the administrator's attention.

[0693] An "administrator" is a person within an organization who is responsible for receiving notifications sent by the system.

[0694] "Notification" refers to a means of communication sent to an administrator when an abnormal situation occurs.

[0695] "Electronic communication" refers to means of transmitting information via digital media, including email and instant messaging.

[0696] "Notification services" refer to additional means of providing information about abnormal situations, complementing electronic communications.

[0697] This invention provides a system that utilizes communication data between users to analyze emotions in real time. Specific embodiments are described below.

[0698] The terminal will collect user communication data, specifically text and voice data, locally. It is expected that dedicated applications and monitoring software for data collection will be installed on this terminal. Users will not need to perform any special operations while continuing their normal work.

[0699] The collected data is securely transmitted to the server using common encryption technologies such as AES and HTTPS. Upon arrival at the server, preprocessing is performed immediately. In this preprocessing, natural language processing tools (e.g., NLTK and spaCy) clean the text data, and speech data is converted to text using a speech recognition API (e.g., Google Speech-to-Text).

[0700] The server uses pre-processed data to request sentiment analysis from a generative AI model. Models such as BERT and GPT are expected to be applied. An example of a prompt is "Classify the sentiment of this text." This results in the model scoring and classifying emotions such as "stress," "dissatisfaction," and "joy."
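The request to the generative AI model can be sketched as below. The prompt wording beyond "Classify the sentiment of this text.", the JSON reply format, and the `generate` callable are all assumptions; `fake_model` is a stand-in that exists only to make the sketch runnable and does not represent any real model API.

```python
import json

CATEGORIES = ("stress", "dissatisfaction", "joy")


def build_prompt(text: str) -> str:
    """Combine the example prompt with an assumed JSON output instruction."""
    return (
        "Classify the sentiment of this text.\n"
        f"Answer with JSON mapping each of {list(CATEGORIES)} to a score from 0 to 1.\n"
        f"Text: {text}"
    )


def classify(text, generate):
    """`generate` is any callable that takes a prompt string and returns the reply."""
    reply = generate(build_prompt(text))
    raw = json.loads(reply)
    # Keep only the known categories and clamp each score into [0, 1].
    return {c: min(1.0, max(0.0, float(raw.get(c, 0.0)))) for c in CATEGORIES}


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    return '{"stress": 0.8, "dissatisfaction": 0.3, "joy": 1.4}'
```

Passing the model in as a callable keeps the scoring logic independent of whichever model is actually deployed, and the clamping step guards against replies that fall outside the expected range.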

[0701] The analysis results are stored as time-series data in a database on the server, and administrators can monitor this in real time through dashboards and other means. If an abnormal emotional pattern is detected, an immediate alert is sent to the administrator via email or a dedicated notification service.

[0702] For example, if a member of a project team frequently uses words like "tired" in chat messages after working long hours, the sentiment analysis interprets this as a sign of stress and an alert is sent to the HR department. Based on this information, managers can take the necessary follow-up steps and implement appropriate support measures.

[0703] This invention enables companies and organizations to efficiently understand the emotional state of their employees and contribute to maintaining a healthy work environment.

[0704] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0705] Step 1:

[0706] The terminal collects text and audio data from the user's daily communication activities. The inputs include emails, chat history, and meeting recordings, and the output is the collected data, which is encrypted in step 2. This collection process is automated by a dedicated application installed on the terminal.

[0707] Step 2:

[0708] The terminal encrypts the collected data using AES and securely transmits it to the server using the HTTPS protocol. The input is the unencrypted data obtained in step 1, and the output is the securely encrypted and transmitted data. This operation ensures the confidentiality of the data.

[0709] Step 3:

[0710] The server performs preprocessing on the received data. The input is the encrypted data, which the server first decrypts to obtain the raw data. The server then uses natural language processing tools (NLTK and spaCy) to clean the text data and converts the audio data to text using the Google Speech-to-Text API. The output is standardized data ready for analysis.

[0711] Step 4:

[0712] The server inputs the preprocessed data into a generative AI model and performs sentiment analysis. The input is standardized data, and the output is a sentiment score as the analysis result. During this process, the model is instructed using prompts such as "Classify the sentiment of this text."

[0713] Step 5:

[0714] The server stores the results of the sentiment analysis in a database. The input is the sentiment scores, and the output is time-series data for each user in the database. This allows changes in the user's sentiment to be tracked continuously.

[0715] Step 6:

[0716] The server analyzes accumulated time-series data and immediately sends an alert to the administrator if an abnormal sentiment pattern occurs. The input is time-series data, and the output is notification information based on anomaly detection. Notifications are provided to the administrator via email and a dashboard.

[0717] (Application Example 1)

[0718] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0719] Mental health issues are becoming increasingly serious in modern workplaces, and especially in environments with long working hours and high stress levels, it is crucial to detect emotional fluctuations early and take appropriate measures. However, existing systems have made it difficult to quickly analyze individual feedback and provide necessary information to managers, sometimes resulting in delays in improving the workplace environment. Therefore, there is a need for a more effective way to analyze emotional fluctuations and quickly improve the workplace environment.

[0720] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0721] In this invention, the server includes means for receiving message data and recorded data collected from users, means for preprocessing the message data and recorded data and converting it into a unified format, and means for analyzing the data in the unified format and identifying and scoring emotions. This enables effective analysis of fluctuations in the emotions of individual users in the workplace, and when abnormal fluctuations are detected, it quickly notifies administrators of the warning, making it possible to improve the workplace environment.

[0722] "Message data collected from users" refers to the general term for text or audio information obtained from communication methods that users use on a daily basis.

[0723] "Recorded data" refers to recordings of information, such as audio, text, or a combination thereof, generated or acquired by a computer system.

[0724] "Preprocessing" refers to a series of data manipulation operations performed to convert collected data into an analyzable format.

[0725] A "unified format" refers to a state in which data of different formats and content has been converted into a form that can be analyzed uniformly.

[0726] "Analyzing data to identify and score emotions" is the act of identifying emotions based on the content of the data and quantifying their intensity and characteristics.

[0727] "Emotional fluctuations" refer to how emotional states change over time.

[0728] "Abnormal fluctuations" refer to emotional changes that exhibit unusual patterns or frequencies.

[0729] "Sending a warning to the administrator via communication means" refers to alerting administrators via email or a notification system when an anomaly is detected.

[0730] "Visualizing users' mental health information on a dashboard" refers to a screen display that makes it easier to visually show individual users' emotional states and how they change.

[0731] To implement this invention, the server receives message data and recorded data transmitted by the user. This data is obtained from the user's everyday means of communication, such as email or voice chat. After receiving the data, the server uses dedicated preprocessing software to convert the data into a unified format that can be analyzed. Preprocessing includes tokenization and speech-to-text conversion.

[0732] Next, the server uses a generative AI model to identify and score emotions from the pre-processed data. This process quantifies and classifies the emotional information contained in the data, making it possible to identify abnormal emotional fluctuations. If the server detects abnormal emotional fluctuations, it immediately sends a warning to the administrator via communication channels. These communication channels include email and notifications on the dashboard. Furthermore, the analyzed user mental health information is visualized on the dashboard in a format that is easily understandable to the administrator.

[0733] The hardware used for implementation includes cloud servers for preprocessing and analysis. Machine learning frameworks such as TensorFlow and PyTorch are used to run the generative AI model. MySQL or PostgreSQL is used as the database to store the analysis results.

[0734] As a concrete example, suppose an employee starts frequently using the phrase "I feel pressured" due to project stress. The server analyzes this information and notifies administrators, allowing them to intervene quickly and provide support to alleviate the employee's stress.

[0735] Examples of specific prompts for a generative AI model are as follows:

[0736] "Please analyze this email data to detect signs of stress and anxiety."

[0737] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0738] Step 1:

[0739] The terminal collects message data and recorded data from the user's everyday means of communication. In this case, the input is email or voice chat, and the output is the collected raw data. The terminal encrypts this data and prepares it for secure transmission to the server.

[0740] Step 2:

[0741] The server receives encrypted data sent from the terminal and decrypts it. The input is encrypted raw data, and the output is the decrypted raw message and recorded data. The server performs preprocessing, such as tokenization and speech-to-text conversion, to convert the data into a unified, analyzable format. This process prepares the data for analysis.

[0742] Step 3:

[0743] The server uses a generative AI model to identify and score emotions based on pre-processed data. The output is the score of emotions contained in the text and audio. The server then quantifies emotion categories such as "joy," "dissatisfaction," and "stress," and determines the intensity of those emotions.

[0744] Step 4:

[0745] The server analyzes emotional fluctuations over time based on the scoring results and detects abnormal patterns. The input is the scored emotion data, and the output is warning flags and information indicating abnormal fluctuations. When an anomaly is detected, the server prepares to take appropriate action.

[0746] Step 5:

[0747] If the server detects an abnormal emotional fluctuation, it sends a warning to the administrator via communication channels. The input is the anomaly detection information, and the output is a warning message directed to the administrator. Through this process, the server communicates the situation to the administrator in real time.

[0748] Step 6:

[0749] The server visualizes the analyzed user's mental health information on a dashboard, allowing administrators to easily assess the situation. Input is sentiment analysis and scored data, while output is visually displayed mental health information. The system organizes and presents information in a way that is easy for administrators to understand and quickly grasp the situation.
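Before scores can be drawn on a dashboard, they typically need to be aggregated. The following sketch averages scores per user per day, which is one plausible input for a trend chart. The tuple layout `(user_id, iso_timestamp, score)` and the function name `daily_summary` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def daily_summary(records):
    """Average scores per (user, day).

    records: iterable of (user_id, iso_timestamp, score) tuples.
    """
    buckets = defaultdict(list)
    for user_id, ts, score in records:
        buckets[(user_id, ts[:10])].append(score)  # ts[:10] is the YYYY-MM-DD part
    return {key: round(mean(vals), 3) for key, vals in sorted(buckets.items())}
```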

[0750] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0751] This invention provides a system that analyzes emotions in real time using text and voice data collected from users. This system incorporates an emotion engine for emotion recognition and supports the management of the user's mental health.

[0752] Overview of the Embodiment

[0753] First, the device collects email, chat, and recorded meeting audio data. The data is sent to the server via a secure protocol. The server stores the received data and performs preprocessing. Preprocessing includes converting the audio data to text, denoising, and standardizing the data format.

[0754] In the sentiment analysis process, the server uses a sentiment engine, leveraging machine learning models and natural language processing techniques to extract emotional insights from the data. The sentiment engine can recognize new sentiment categories in addition to traditional ones. Furthermore, user profile information is taken into account to improve the accuracy of individual sentiment recognition.

[0755] The analysis results are stored in a database and used for tracking emotions over time and detecting anomaly patterns. The server also has the capability to predict future emotion trends by referring to past data history. If an anomaly is detected, the server generates an alert and sends a notification to HR personnel and managers. This notification is delivered via email or a dedicated notification system.
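
The prediction of future emotion trends from past data can be illustrated, under simple assumptions, by fitting a straight line to the stored scores and extrapolating one step ahead. The disclosure does not state which prediction method is used, so the least-squares line and the function name `forecast_next_score` below are assumptions made for illustration.

```python
def forecast_next_score(scores):
    """Fit a straight line to past scores and extrapolate one step ahead.

    scores: chronological list of emotion scores at evenly spaced times.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two past scores")
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(scores) / n
    # Ordinary least-squares slope and intercept.
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, scores)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return intercept + slope * n  # value of the fitted line at the next step
```

A rising forecast for a negative emotion such as stress could be treated as an early sign, before the score itself crosses an alert threshold.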

[0756] Specific example

[0757] For example, suppose an employee is experiencing high levels of stress due to an approaching project deadline. The terminal collects the employee's communication data and sends it to the server. The server's emotion engine analyzes this data and detects negative emotions such as stress and dissatisfaction at a high frequency. As a result, the emotion trend shows signs of worsening, and the server automatically sends an alert to the supervisor.

[0758] Supervisors receive notifications, check the dashboard, and view detailed analysis results. They can then take immediate action to resolve the problem. This strengthens the workplace environment and employee mental health care.

[0759] By implementing this invention, advanced emotional analysis using an emotion engine becomes possible, making early intervention in the workplace a reality.

[0760] The following describes the processing flow.

[0761] Step 1:

[0762] The device automatically collects the user's email, chat, and meeting audio data through a dedicated application. This data is encrypted and sent to the server using a secure protocol.

[0763] Step 2:

[0764] The server stores the received data in a database and performs preprocessing for analysis. Audio data is converted to text using speech recognition technology, and all data is formatted to a consistent style. Noise reduction and typographical error correction are also performed at this stage.

[0765] Step 3:

[0766] The server passes pre-processed text data to the sentiment engine. The sentiment engine uses machine learning models to identify and score the emotions in the data. In this process, data points are classified into multiple emotion categories.

[0767] Step 4:

[0768] The server tracks sentiment data scored by the sentiment engine over time. Anomaly detection algorithms are used to identify unusual sentiment patterns and sudden changes in sentiment.

[0769] Step 5:

[0770] The server generates alerts based on detected anomaly patterns. These alerts consider individual user profile information and historical data to set tiered warning levels.
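
As a minimal sketch of tiered warning levels, the following compares the current score with the user's own baseline taken from historical data and maps the excess to a level. The `UserProfile` fields, the level names, and the numeric thresholds are hypothetical values chosen for illustration and do not appear in this disclosure.

```python
from dataclasses import dataclass


@dataclass
class UserProfile:
    baseline_stress: float    # typical stress score for this user (from history)
    sensitivity: float = 1.0  # hypothetical multiplier; >1 escalates sooner


def warning_level(current_stress: float, profile: UserProfile) -> str:
    """Map the excess over the user's own baseline to a tiered warning level."""
    excess = (current_stress - profile.baseline_stress) * profile.sensitivity
    if excess >= 0.5:
        return "critical"
    if excess >= 0.3:
        return "warning"
    if excess >= 0.15:
        return "notice"
    return "none"
```

Because the level depends on the individual baseline, the same raw score can produce different levels for different users.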

[0771] Step 6:

[0772] Alerts generated by the server are sent to administrators via email or a dedicated notification system. The alert notification includes a description of the anomaly along with recommended actions.
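
An alert notification of this kind could be composed, for example, with the `email.message.EmailMessage` class of the Python standard library. The sketch below only builds the message and does not send it. The function name `build_alert_email`, the subject format, and the addresses in the usage note are assumptions made for illustration.

```python
from email.message import EmailMessage


def build_alert_email(sender, recipient, user_label, anomaly, actions):
    """Compose (but do not send) an alert with a description and recommended actions."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"[Emotion alert] {user_label}: {anomaly}"
    lines = [f"Detected anomaly: {anomaly}", "", "Recommended actions:"]
    lines += [f"- {action}" for action in actions]
    message.set_content("\n".join(lines))
    return message
```

For instance, `build_alert_email("alerts@example.com", "manager@example.com", "user-17", "sustained rise in stress score", ["Schedule a one-on-one"])` returns a message that could then be handed to `smtplib.SMTP.send_message` or to a dedicated notification system.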

[0773] Step 7:

[0774] Users (administrators and HR personnel) can access the dashboard based on alerts to obtain detailed analysis results. This allows them to follow up with stakeholders before problems surface and determine appropriate mental health care measures.

[0775] (Example 2)

[0776] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0777] Conventional sentiment analysis systems have limited ability to accurately identify user emotions and quickly and automatically detect abnormal emotional patterns. Furthermore, analysis based on simple emotion categories struggles to capture the nuances of individual user emotions. In addition, the methods for notifying users when anomalies are detected are limited, posing a challenge in delivering timely and accurate alerts to relevant parties.

[0778] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0779] In this invention, the server includes means for receiving text data and audio data collected from users by an information processing device; means for performing speech recognition and noise reduction preprocessing on the text data and audio data and unifying the data format; and means for identifying emotions by applying machine learning models and natural language processing techniques to the unified data format. This enables the identification of diverse and precise emotion categories, rapid detection of abnormal emotion patterns, and immediate alert notifications to relevant parties.

[0780] An "information processing system" is a collective term for a set of hardware and software used to collect, store, process, and analyze data.

[0781] "User" refers to an individual or organization that provides data using this system.

[0782] "Text data" refers to document-format data generated or provided by the user, including email and chat content.

[0783] "Audio data" refers to data based on audio provided by the user, and includes recorded meeting audio and voice messages.

[0784] "Speech recognition" refers to the technology that analyzes audio data and converts it into text, making it possible to treat speech as textual information.

[0785] "Noise reduction" is the process of removing unwanted sounds and interfering background noise in audio data processing.

[0786] "Preprocessing" refers to a series of processes performed after data collection to prepare the data for analysis, and includes noise reduction and standardization of data formats.

[0787] "Unifying data formats" refers to the process of converting data in different formats into a common, parseable format.

[0788] A "machine learning model" is an algorithm or mathematical model used to learn from large amounts of data and automate specific tasks.

[0789] "Natural language processing technology" refers to a set of technologies that enable computers to understand, interpret, and generate human language.

[0790] "Identifying emotions" refers to the process of estimating and identifying a user's emotional state from text or audio data.

[0791] An "abnormal pattern" refers to a pattern that shows unnatural emotional changes or tendencies that differ from the user's normal emotional state.

[0792] "Electronic communications" refers to all technologies that send and receive information via digital devices, and includes email and dedicated messaging protocols.

[0793] This invention provides a system that analyzes a user's emotions in real time, detects emotional anomalies based on the results, and generates alerts. The invention can be implemented in the following forms.

[0794] The device collects text and audio data from the user. Common data collection devices such as email applications, chat software, and conference recording devices are used for this data collection. The collected data is securely transmitted to a server.

[0795] The server temporarily stores the received data. Audio data, in particular, is converted to text using speech recognition software such as Google Cloud Speech-to-Text or Microsoft Azure Speech Service. This process applies noise filtering algorithms to improve data quality. The data is then standardized into a format suitable for analysis.

[0796] During the analysis process, the server runs an emotion engine. This engine uses machine learning models (e.g., BERT and GPT) and natural language processing techniques to identify emotions from the unified data. The emotion engine also utilizes generative AI models to recognize diverse emotion categories. By taking user profiles into account, the accuracy of emotion recognition can be improved for each individual user.

[0797] The results of the sentiment analysis are stored in a dedicated data storage system, and abnormal patterns are detected based on this data. When an abnormality is detected, an alert is automatically sent electronically to administrators and relevant organizational personnel.

[0798] For example, if a user makes a statement like, "I'm worried because I'm not making progress on my exam preparations," during a project, this is recorded in the database, and the server scores it as "anxiety." If this occurs frequently, an anomaly is detected in the emotional data, and an alert is sent to management.
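
The condition "if this occurs frequently" can be made concrete, under stated assumptions, as a count of "anxiety" scores within a sliding time window. In the sketch below, the seven-day window, the minimum count of three, and the function name `is_frequent` are hypothetical choices that are not specified in this disclosure.

```python
from datetime import datetime, timedelta


def is_frequent(timestamps, window=timedelta(days=7), min_count=3):
    """Return True if at least min_count events fall within any span of `window`.

    timestamps: datetimes at which a statement was scored as "anxiety".
    """
    ordered = sorted(timestamps)
    start = 0
    for end, current in enumerate(ordered):
        # Advance the left edge until the span fits inside the window.
        while current - ordered[start] > window:
            start += 1
        if end - start + 1 >= min_count:
            return True
    return False
```

Three "anxiety" statements on December 1, 3, and 6 would satisfy the condition, whereas statements on December 1, 10, and 20 would not.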

[0799] This invention enables the extraction of emotional insights, which can be used to improve the work environment and support mental health care. An example of a prompt message would be, "A method to analyze a user's emotions and detect anomalies in relation to the pressure of project deadlines."

[0800] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0801] Step 1:

[0802] The terminal collects text and audio data from the user. Specific examples include messages sent by the user in a chat application and audio recorded by a conference recording device. At this stage, the input is raw communication data generated by the user, and the output is a data file appropriately formatted for further processing.

[0803] Step 2:

[0804] The terminal sends the collected data to the server via a secure protocol. The data transmission uses a protocol that encrypts the data for secure transfer (e.g., TLS). The input is the formatted data collected in step 1, and the output is the decrypted raw data received on the server side.
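
As a minimal sketch of encrypted transfer over TLS on the terminal side, the following uses the Python standard library `ssl` and `socket` modules. The function names `make_tls_context` and `send_payload`, the minimum protocol version, and the raw-socket framing are assumptions made for illustration. An actual terminal might instead use HTTPS through an HTTP client.

```python
import socket
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Client-side TLS context that verifies the server certificate and hostname."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    return context


def send_payload(host: str, port: int, payload: bytes) -> None:
    """Open a TLS connection to the (hypothetical) server and send the payload."""
    context = make_tls_context()
    with socket.create_connection((host, port), timeout=10) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            tls_sock.sendall(payload)
```

The default context enables certificate verification and hostname checking, so the terminal refuses to send data to a server that cannot prove its identity.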

[0805] Step 3:

[0806] The server converts the received audio data into text data. This process utilizes speech recognition software to convert spoken language into text. It also executes a noise filtering algorithm to remove background noise and improve quality. The input is an audio file, and the output is denoised, transcribed text data.

[0807] Step 4:

[0808] The server uses the converted text data to standardize the data format and prepare it for subsequent processing. JSON or XML formats are used for data format standardization. The input is the text data generated in step 3, and the output is standardized, consistent data.
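
The standardization into JSON can be illustrated as follows. The sketch places chat messages and speech transcripts into one common record layout and serializes the records as one JSON object per line. The field names (`user_id`, `source`, `text`, `recorded_at`) and the function names are hypothetical, since this disclosure states only that JSON or XML is used.

```python
import json
from datetime import datetime, timezone


def to_unified_record(user_id, source, text, recorded_at):
    """Build one record in a common schema for chat messages and transcripts alike."""
    return {
        "user_id": user_id,
        "source": source,                # e.g., "chat" or "meeting_audio"
        "text": " ".join(text.split()),  # collapse runs of whitespace
        "recorded_at": recorded_at.astimezone(timezone.utc).isoformat(),
    }


def to_json_lines(records):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False, sort_keys=True) for r in records)
```

Storing timestamps in UTC in ISO 8601 form keeps records from different sources comparable in the later time-series steps.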

[0809] Step 5:

[0810] The server runs an emotion engine and analyzes the data using machine learning models and natural language processing techniques. It uses a generative AI model to identify positive, negative, and other diverse emotion categories. The input is the integrated data prepared in step 4, and the output is the analyzed data with emotion categories assigned to it.

[0811] Step 6:

[0812] The server stores the analyzed emotion data in a database and performs time-series emotion tracking and anomaly pattern detection. Anomaly pattern detection uses an algorithm that identifies changes that deviate from normal emotional states. The input is the analyzed data generated in step 5, and the output is time-series emotion data and the results of anomaly detection.

[0813] Step 7:

[0814] The server generates an alert when an anomaly is detected and notifies administrators and responsible personnel of the alert via electronic communication. Email or a dedicated messaging system is used for notification. The input is the emotion anomaly data identified in the processing up to step 6, and the output is the transmission-completion status of the alert notification.

[0815] (Application Example 2)

[0816] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0817] There is a challenge in appropriately monitoring the emotional state of the elderly, detecting abnormal emotional patterns early, and responding promptly. In particular, while effectively managing the mental health of the elderly is important in nursing homes and home care settings, current systems make it difficult to grasp emotional changes in real time and provide necessary care.

[0818] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0819] In this invention, the server includes means for receiving data collected from users, means for preprocessing the data and converting it into a unified format, means for analyzing the data in the unified format and identifying emotions, means for tracking the identified emotions in chronological order and detecting abnormal patterns, means for sending a warning to an administrator when an abnormal emotional pattern is detected, means for monitoring the emotional trends of elderly people in real time, and means for prompting caregivers of elderly people to respond based on emotional trends. This makes it possible to grasp the emotional state of elderly people in real time and respond quickly to abnormalities.

[0820] "Means for receiving data" refers to an interface through which the server receives information, such as text and audio, provided by the user.

[0821] "Means for preprocessing and converting into a unified format" refers to a processing method that performs noise reduction, text conversion, and other modifications on the received data to transform it into a consistent data format.

[0822] "Means for identifying emotions" refers to a technical method for analyzing preprocessed data and recognizing specific emotional states.

[0823] "Means for tracking in chronological order and detecting abnormal patterns" refers to a function that observes changes in emotions over time and identifies unusual emotional movements.

[0824] "Means for sending a warning" refers to a mechanism for sending messages about detected abnormal patterns to administrators or caregivers to draw their attention.

[0825] "Means for monitoring the emotional trends of elderly people in real time" refers to a process for continuously collecting and observing the emotional state of elderly people in real time.

[0826] "Means for prompting caregivers to respond based on emotional trends" refers to a mechanism that instructs relevant parties to carry out care activities that meet the needs of elderly people in response to changes in their emotional state.

[0827] The system implementing this invention mainly consists of a server and a terminal. The terminal is responsible for continuously recording conversations and monologues of elderly people and sending the audio data to the server via a secure protocol (e.g., HTTPS). The server uses the Google Cloud Speech-to-Text API to convert the audio data into text and performs preprocessing such as noise reduction and data format standardization.

[0828] For server-side sentiment analysis, Python and the scikit-learn library are used, employing machine learning techniques to identify emotions from text data. Specifically, emotions are scored, and their changes are tracked over time. The identified emotion data is stored in a database, allowing for observation of emotional trends as time-series data.
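
A minimal sketch of emotion scoring with Python and scikit-learn is shown below. It assumes a TF-IDF representation followed by logistic regression, which is one common text classification setup and is not stated in this disclosure. The training sentences, the two labels, and the function names are made up for illustration, and a practical model would require far more data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up training set, for illustration only.
TRAIN_TEXTS = [
    "I feel calm and rested today",
    "I enjoyed talking with my family",
    "I am worried and cannot sleep",
    "I feel anxious about tomorrow",
    "Lunch was delicious and I am happy",
    "I am scared and nervous at night",
]
TRAIN_LABELS = ["calm", "calm", "anxiety", "anxiety", "calm", "anxiety"]


def train_emotion_model():
    """Fit a TF-IDF + logistic regression classifier on the toy data."""
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(TRAIN_TEXTS, TRAIN_LABELS)
    return model


def score_emotions(model, text):
    """Return a {label: probability} mapping for one utterance."""
    probabilities = model.predict_proba([text])[0]
    return dict(zip(model.classes_, probabilities))
```

The per-label probabilities returned by `score_emotions` can serve as the emotion scores that are tracked over time.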

[0829] When an abnormal emotional pattern is detected, the server immediately sends a warning message to the administrator or caregiver. Communication is conducted via email or a dedicated notification application, enabling a rapid response. This allows for real-time monitoring of trends in the emotional state of elderly individuals and prompting necessary interventions.

[0830] For example, if an elderly person talks to themselves more frequently at night, caregivers can consider appropriate responses based on the data, taking into account possibilities such as a decrease in daytime activity level. In this way, the objective of this invention is to effectively manage the mental health of the elderly and improve their quality of life.

[0831] Examples of prompts for a generative AI model are as follows:

[0832] "Analyze daily conversational audio data from elderly individuals to determine their daily emotional trends. In particular, detect signs of stress and anxiety and create a report."

[0833] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0834] Step 1:

[0835] The device records the daily conversations and monologues of elderly people. The audio data is the input and is temporarily stored as a digital audio file before being sent to the server. This data is collected in batches at specific time intervals.

[0836] Step 2:

[0837] The terminal sends the collected audio data to the server using a secure protocol. The input is the audio data from the terminal, and the output is the stream-formatted audio data received by the server. HTTPS is used as the communication protocol to ensure secure data transfer.

[0838] Step 3:

[0839] The server converts the received audio data into text data using the Google Cloud Speech-to-Text API. The input is audio data, and the output is text data. This text conversion prepares the data for natural language processing.

[0840] Step 4:

[0841] The server performs preprocessing on the text data, including noise reduction and data format standardization. The input is text data, and the output is de-noised and standardized text data. Specific operations include the removal of special characters and grammatical normalization.
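
The removal of special characters can be sketched with the Python standard library as follows. The set of characters that is kept and the function name `clean_transcript` are assumptions made for illustration. The sketch performs Unicode normalization and whitespace cleanup only and does not attempt grammatical normalization.

```python
import re
import unicodedata


def clean_transcript(text: str) -> str:
    """Normalize Unicode, drop special characters, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)  # e.g., full-width letters to ASCII
    text = re.sub(r"[^\w\s.,!?'-]", " ", text)  # keep word characters and basic punctuation
    text = re.sub(r"\s+", " ", text)            # collapse runs of whitespace
    return text.strip()
```

Because `\w` matches Unicode word characters in Python 3, Japanese text in a transcript is preserved while symbols such as "★" are removed.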

[0842] Step 5:

[0843] The server performs sentiment analysis using preprocessed text data. It runs machine learning models using Python and scikit-learn. Input is text data in a unified format, and output is sentiment scores. A generative AI model is used to classify the data into specific sentiment categories.

[0844] Step 6:

[0845] The server stores the identified sentiments as time-series data in a database. The input is the sentiment score, and the output is the record in the time-series database. This makes it possible to track and evaluate sentiment trends later.
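
As a minimal sketch of storing scores as time-series data, the following uses the `sqlite3` module of the Python standard library. The table name `emotion_scores`, its columns, and the function names are hypothetical. This disclosure does not specify the database product or schema.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create the (hypothetical) emotion_scores table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS emotion_scores (
               user_id TEXT NOT NULL,
               recorded_at TEXT NOT NULL,   -- ISO 8601 timestamp
               category TEXT NOT NULL,
               score REAL NOT NULL
           )"""
    )
    return conn


def record_score(conn, user_id, recorded_at, category, score):
    """Append one scored observation."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO emotion_scores VALUES (?, ?, ?, ?)",
            (user_id, recorded_at, category, score),
        )


def score_history(conn, user_id, category):
    """Return (timestamp, score) pairs in chronological order."""
    rows = conn.execute(
        "SELECT recorded_at, score FROM emotion_scores "
        "WHERE user_id = ? AND category = ? ORDER BY recorded_at",
        (user_id, category),
    )
    return rows.fetchall()
```

ISO 8601 timestamps in the same time zone sort correctly as text, so `ORDER BY recorded_at` yields chronological order without a separate date type.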

[0846] Step 7:

[0847] The server analyzes time-series data from the database to detect abnormal emotional patterns. The input is time-series data from the database, and the output is the detection results of abnormal patterns. These abnormal patterns trigger the next process.

[0848] Step 8:

[0849] When an abnormal pattern is detected, the server sends a warning message to administrators or caregivers. The input is the abnormal pattern, and the output is a warning message via email or notification application. This enables a rapid response tailored to the condition of the elderly person.

[0850] An example of a prompt message might be: "Analyze the daily conversational audio data of elderly people and determine their daily emotional trends. In particular, detect signs of stress and anxiety and create a report."

[0851] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0852] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0853] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0854] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0855] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0856] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0857] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further you go toward the outside of the Emotion Map 400, the more visible (expressed in actions) your emotions become.

[0858] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.

[0859] The emotion map defines two emotions that promote learning. One is the negative emotion located around the middle of "repentance" and "reflection" on the situation side. In other words, it arises when the robot experiences negative feelings such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the positive emotion located around "desire" on the reaction side. In other words, it arises when the robot has positive feelings such as "I want more" or "I want to know more."

[0860] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.

[0861] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0862] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0863] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0864] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0865] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0866] The following types of processors can be used as hardware resources to perform specific processing. One example is a CPU, which is a general-purpose processor that functions as a hardware resource performing specific processing by executing software, i.e., a program. Other examples include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which are processors having circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0867] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0868] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0869] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0870] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0871] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0872] The following is further disclosed regarding the embodiments described above.

[0873] (Claim 1)

[0874] A means for receiving text data and audio data collected from users,

[0875] A means for preprocessing the aforementioned text data and audio data and converting them into a unified format,

[0876] A means for analyzing the aforementioned standardized data and identifying emotions,

[0877] A means for tracking identified emotions over time and detecting abnormal patterns,

[0878] A means of sending an alert to the administrator when an abnormal emotional pattern is detected,

[0879] A system that includes this.
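The chain of means listed above (unifying the input format, identifying emotions, and detecting abnormal patterns over time) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not appear in the disclosure: the `Record` type, the toy `LEXICON` that stands in for an emotion model, the premise that audio arrives already transcribed, and the deviation threshold of 2.0 standard deviations.

```python
import re
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Record:
    """Hypothetical unified format: every input becomes normalized plain text."""
    user: str
    timestamp: float
    text: str


def unify(user, timestamp, payload, kind):
    """Convert a text message or an audio transcript into one Record.

    Assumption: for kind == "audio", payload is an already-transcribed string.
    """
    if kind not in ("text", "audio"):
        raise ValueError(f"unsupported kind: {kind}")
    # Lowercase and collapse whitespace so both sources look the same.
    return Record(user, timestamp, " ".join(payload.lower().split()))


# Toy lexicon standing in for a trained emotion model (illustrative only).
LEXICON = {"great": 1.0, "thanks": 0.5, "tired": -0.5,
           "exhausted": -1.0, "hopeless": -1.0}


def score(record):
    """Return a score in [-1, 1]; 0.0 when no lexicon word matches."""
    words = re.findall(r"[a-z']+", record.text)
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return mean(hits) if hits else 0.0


def is_abnormal(history, latest, threshold=2.0, min_sigma=0.1):
    """Flag `latest` when it deviates from the mean of `history` by more
    than `threshold` standard deviations (with a floor on the deviation)."""
    if len(history) < 3:
        return False  # too little history to judge
    sigma = max(pstdev(history), min_sigma)
    return abs(latest - mean(history)) / sigma > threshold
```

A caller would keep a per-user list of past scores, pass it as `history`, and raise an alert when `is_abnormal` returns `True`. The z-score test is only one possible reading of "abnormal pattern"; the disclosure does not fix a particular detection rule.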

[0880] (Claim 2)

[0881] The system according to claim 1, characterized in that the means for identifying the emotion uses a machine learning model to score the emotion.

[0882] (Claim 3)

[0883] The system according to claim 1, characterized in that the means for transmitting the alert distributes the alert via email and a notification system.
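Distribution of one alert over both email and a notification system might look like the following sketch. The sender address, the subject wording, and the `channels` mapping of channel names to callables are hypothetical; the code only builds the email message with the standard-library `email.message.EmailMessage` class and leaves actual delivery to whatever transport the caller supplies.

```python
from email.message import EmailMessage


def build_alert_email(admin_addr, user_id, score, sender="alerts@example.com"):
    """Build (but do not send) an alert email for the administrator."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = admin_addr
    msg["Subject"] = f"Emotion alert for user {user_id}"
    msg.set_content(
        f"An abnormal emotional pattern was detected for {user_id} "
        f"(score {score:.2f})."
    )
    return msg


def distribute_alert(user_id, score, channels):
    """Send the same alert through every channel.

    `channels` maps a channel name to a callable taking (user_id, score).
    Returns the names of the channels that succeeded.
    """
    delivered = []
    for name, send in channels.items():
        try:
            send(user_id, score)
            delivered.append(name)
        except Exception:
            # One failing channel should not block the others.
            continue
    return delivered
```

Keeping each channel behind a plain callable lets the email path and the notification path fail independently, so an outage in one does not suppress the alert on the other.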

[0884] "Example 1"

[0885] (Claim 1)

[0886] A means of receiving information obtained from the user,

[0887] A means for preprocessing the aforementioned information and converting it into a standardized format,

[0888] A means for interpreting the standardized information and detecting emotions,

[0889] A means of monitoring detected emotions over time to identify abnormal situations,

[0890] A means of notifying the administrator if an abnormal emotional state is detected,

[0891] A system that includes this.

[0892] (Claim 2)

[0893] The system according to claim 1, characterized in that the means for detecting emotion evaluates emotion using a machine learning model.

[0894] (Claim 3)

[0895] The system according to claim 1, characterized in that the means for sending the notification delivers the notification via electronic communications and notification services.

[0896] "Application Example 1"

[0897] (Claim 1)

[0898] A means for receiving message data and record data collected from users,

[0899] A means for preprocessing the message data and record data and converting them into a unified format,

[0900] A means for analyzing the aforementioned standardized data and identifying and scoring emotions,

[0901] A means to track identified emotions over time and detect abnormal fluctuations,

[0902] A means of sending a warning to the administrator via a communication means when abnormal emotional fluctuations are detected,

[0903] A means of visualizing users' mental health information on a dashboard,

[0904] A system that includes this.
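The dashboard means above needs per-user figures to display. As a rough sketch, scored events could be reduced to one row per user as follows. The row fields, the `(user, timestamp, score)` tuple layout, and the `swing_threshold` of 0.5 used to count a fluctuation are assumptions for illustration, not values taken from the disclosure.

```python
from collections import defaultdict
from statistics import mean


def dashboard_rows(scored_events, swing_threshold=0.5):
    """Summarize (user, timestamp, score) events into one row per user.

    A "fluctuation" is counted whenever two consecutive scores of the same
    user differ by at least `swing_threshold`.
    """
    by_user = defaultdict(list)
    # Order by timestamp so consecutive scores are adjacent in time.
    for user, _ts, score in sorted(scored_events, key=lambda e: e[1]):
        by_user[user].append(score)

    rows = []
    for user in sorted(by_user):
        scores = by_user[user]
        swings = sum(
            1 for a, b in zip(scores, scores[1:])
            if abs(b - a) >= swing_threshold
        )
        rows.append({
            "user": user,
            "latest": scores[-1],
            "average": round(mean(scores), 2),
            "abnormal_fluctuations": swings,
        })
    return rows
```

The returned rows are plain dictionaries, so they can be handed to any table or chart component without further conversion.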

[0905] (Claim 2)

[0906] The system according to claim 1, characterized in that the means for identifying and scoring emotions uses a generative AI model.

[0907] (Claim 3)

[0908] The system according to claim 1, characterized in that the means for transmitting the warning distributes the warning using a computer network.

[0909] "Example 2 when combined with an emotion engine"

[0910] (Claim 1)

[0911] A means for receiving text data and audio data collected from a user by an information processing device,

[0912] A means for performing preprocessing on the aforementioned text data and audio data, including speech recognition and noise reduction, and for unifying the data format,

[0913] A means for identifying emotions by applying machine learning models and natural language processing techniques to the aforementioned standardized data,

[0914] A means for tracking identified emotions over time and detecting abnormal patterns,

[0915] A means of sending an alert to the administrator via electronic communication when an abnormal emotional pattern is detected,

[0916] A system that includes this.

[0917] (Claim 2)

[0918] The system according to claim 1, wherein the means for identifying emotions expands the categories of emotions using a generative AI model and scores them.

[0919] (Claim 3)

[0920] The system according to claim 1, characterized in that the means for transmitting the alert distributes the alert through electronic communication means and dedicated communication means.

[0921] "Application Example 2 when combined with an emotion engine"

[0922] (Claim 1)

[0923] A means for receiving data collected from users,

[0924] A means for preprocessing the aforementioned data and converting it into a unified format,

[0925] A means for analyzing the aforementioned standardized data and identifying emotions,

[0926] A means for tracking identified emotions over time and detecting abnormal patterns,

[0927] A means of sending a warning to the administrator when an abnormal emotional pattern is detected,

[0928] A means of monitoring the emotional trends of the elderly in real time,

[0929] A means for prompting caregivers of the elderly to respond based on the emotional trends,

[0930] A system that includes this.
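Real-time monitoring of an emotional trend, together with a prompt to the caregiver, can be sketched with an exponentially weighted moving average that is updated as each new score arrives. The smoothing factor `alpha`, the `low` cutoff, and the message wording are all hypothetical choices; the disclosure does not specify how the trend is computed.

```python
def update_trend(previous, score, alpha=0.3):
    """Update an exponentially weighted moving average with a new score.

    `previous` is None before the first observation.
    """
    if previous is None:
        return score
    return alpha * score + (1 - alpha) * previous


def caregiver_prompt(name, trend, low=-0.3):
    """Return a prompt for the caregiver when the trend is low, else None."""
    if trend <= low:
        return f"{name}'s mood has been trending low; consider checking in."
    return None


# Usage: feed scores one at a time as they are produced.
trend = None
for s in [0.2, -0.8, -0.9, -0.7]:
    trend = update_trend(trend, s)
message = caregiver_prompt("Resident A", trend)
```

Because the average is updated incrementally, only the previous trend value has to be stored per person, which suits continuous monitoring on a small device.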

[0931] (Claim 2)

[0932] The system according to claim 1, characterized in that the means for identifying emotions evaluates emotions using a learning model.

[0933] (Claim 3)

[0934] The system according to claim 1, characterized in that the means for transmitting the warning distributes the warning through a communication means.

[Explanation of Symbols]

[0935]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. A system comprising:
a means for receiving message data and record data collected from users;
a means for preprocessing the message data and record data and converting them into a unified format;
a means for analyzing the data converted into the unified format and identifying and scoring emotions;
a means for tracking the identified emotions over time and detecting abnormal fluctuations;
a means for sending a warning to an administrator via a communication means when abnormal emotional fluctuations are detected; and
a means for visualizing users' mental health information on a dashboard.

2. The system according to claim 1, characterized in that the means for identifying and scoring emotions uses a generative AI model.

3. The system according to claim 1, characterized in that the means for transmitting the warning distributes the warning using a computer network.