system

A system using voice analysis and generative AI for real-time mental health assessment and feedback addresses the limitations of conventional mental care, enhancing employee well-being and productivity.

JP2026096608A · Pending · Publication Date: 2026-06-15 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: SOFTBANK GROUP CORP
Filing Date: 2024-12-03
Publication Date: 2026-06-15

Abstract

To provide a system. [Solution] A system for managing the mental health of employees, including: voice analysis means that analyzes voice data and generates indicators of emotion and stress from its tone, speed, intonation, etc.; generative artificial intelligence means; mental health assessment means that evaluates the user's mental health status, such as stress level and happiness level, based on the results of the voice analysis and the generative artificial intelligence; means of providing feedback based on the analysis results; and means of continuous follow-up.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance as a response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] The deterioration of employees' mental health status may lead to a decline in productivity, an increase in turnover rate, and further legal risks. However, conventional stress checks and mental care by experts have low utilization rates and limited effects. An object of this invention is to provide an effective system for early detection and prevention of mental disorders.

Means for Solving the Problems

[0005] This invention analyzes employees' voice data using voice analysis means and evaluates their mental health using generative artificial intelligence means. Furthermore, it builds a continuous system that provides real-time feedback based on the analysis results and provides the necessary care and follow-up to maintain mental health. As a result, employees can continuously manage their own mental state and effectively maintain and improve their mental health.

[0006] "Voice analysis means" refers to technology that analyzes voice data and generates indicators of emotion and stress from its tone, speed, intonation, etc.

[0007] "Generative artificial intelligence means" refers to artificial intelligence technology used to learn knowledge of psychology and counseling, and to quantitatively evaluate mental health status from the analysis results.

[0008] "Mental health assessment means" refers to a process that evaluates a user's mental health status, such as stress level and happiness level, based on the results of the voice analysis and the generative artificial intelligence.

[0009] "Means of providing feedback based on analysis results" refers to methods or technologies for providing users with necessary care and action guidelines in real time based on evaluation results obtained from voice analysis.

[0010] "Continuous follow-up measures" refer to methods or systems for analyzing a user's past health data and periodically suggesting further care or checking the progress of health management as needed.

Brief Description of the Drawings

[0011]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map on which multiple emotions are mapped.
[Figure 10] An emotion map on which multiple emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2, which incorporates an emotion engine.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2, which incorporates an emotion engine.

Modes for Carrying Out the Invention

[0012] Hereinafter, an example of an embodiment of the system relating to the technology of this disclosure will be described with reference to the attached drawings.

[0013] First, the terminology used in the following description will be explained.

[0014] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).

[0015] In the following embodiments, RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which the processor uses as a work memory.

[0016] In the following embodiments, storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0017] In the following embodiments, a communication I/F (Interface) denoted by a reference numeral is an interface including a communication processor, an antenna, and the like. The communication I/F controls communication between multiple computers. Examples of communication standards applied to the communication I/F include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0018] In the following embodiments, "A and / or B" is synonymous with "at least one of A and B." That is, "A and / or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and / or B" applies when expressing three or more things linked by "and / or."

[0019] [First Embodiment]

[0020] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0021] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0022] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0023] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0024] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0025] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0026] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0027] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0028] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0029] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0030] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0031] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0032] This invention is a system for managing and providing care for the mental health of employees. The system consists of three elements: a server, a terminal, and a user.

[0033] The server plays a crucial role in analyzing and evaluating received audio data using voice analysis and generative artificial intelligence (AI) means. Audio data is transmitted from terminals, and the server processes it using a voice analysis engine. The voice analysis engine analyzes the tone, speed, and intonation of the voice, quantifying the employee's emotional state and stress level. Next, the generative AI means performs a mental health assessment based on these analysis results. The AI considers the employee's usage history and past data, enabling assessments tailored to individual circumstances.
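As an illustration of how tone-related quantities might be extracted from a voice signal, the following is a minimal sketch and not the method disclosed here. It assumes the audio is available as a list of float samples, and it uses autocorrelation pitch estimation and RMS energy as stand-ins for "tone"; the function names, the pitch search range, and the synthetic test tone are all hypothetical. Speaking speed and intonation would need further processing that is not shown.

```python
import math


def rms_energy(samples):
    """Root-mean-square amplitude, a rough proxy for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pitch_hz(samples, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate the fundamental frequency from the autocorrelation peak.

    fmin and fmax bound the search to a typical speaking range
    (an assumption made for this sketch).
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# A synthetic 200 Hz tone stands in for one voiced frame of speech.
sample_rate = 8000
frame = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate)
         for n in range(800)]
pitch = estimate_pitch_hz(frame, sample_rate)
energy = rms_energy(frame)
```

In practice such per-frame values would be aggregated over an utterance before being turned into indicators of emotion and stress.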

[0034] The terminal records the user's voice data and securely transmits it to the server. It also displays the evaluation results received from the server on the user interface, providing feedback to the user. This feedback includes care suggestions and actions tailored to the user's mental state. The terminal presents the feedback in a simple, easy-to-understand display so that users can readily carry out the recommended self-care.

[0035] Users record their daily mental state by describing their emotions and daily circumstances to the terminal by voice, and the terminal sends this voice data to the server. This allows users to consciously monitor their own mental health. Based on the feedback received from the terminal, users then take appropriate self-care measures to maintain their mental well-being.

[0036] To give a specific example, when a user feels stressed during a day's work, they can speak their feelings into the terminal. The server analyzes the audio, and the AI evaluates the stress level. After checking the evaluation results on the terminal, the user can regain peace of mind by performing the relaxation exercises that the terminal recommends.

[0037] Thus, this system utilizes voice analysis technology and artificial intelligence to provide a way for users to continuously manage their mental health and take necessary self-care actions in a timely manner.

[0038] The following describes the processing flow.

[0039] Step 1:

[0040] The user inputs their daily emotions and experiences into the device via voice input. The device starts recording and acquires the user's speech as digital audio data.

[0041] Step 2:

[0042] The device converts the recorded audio data into an appropriate format and encrypts the data to ensure security. It then sends the prepared audio data to the server.

[0043] Step 3:

[0044] The server decodes the received audio data and passes it to the audio analysis engine. The analysis engine analyzes the tone, speed, intonation, etc., of the voice and generates indicators of emotion and stress.

[0045] Step 4:

[0046] The generative artificial intelligence means on the server evaluates the employee's mental health based on the results of the voice analysis. It draws on psychological knowledge to quantify stress level, happiness level, and other factors.

[0047] Step 5:

[0048] Based on the results of the mental health assessment, the server determines appropriate care and behavioral guidelines for the user. This includes recommendations for relaxation exercises and consultations with professionals.

[0049] Step 6:

[0050] The terminal receives feedback from the server and displays it in a format that is easy for the user to understand. The terminal also presents the user with any necessary guides or additional information.

[0051] Step 7:

[0052] The user follows instructions from the device and performs recommended self-care actions, such as engaging in stress-relieving exercises.

[0053] Step 8:

[0054] As part of ongoing follow-up, the server periodically analyzes the user's past data and, if necessary, proposes various care and follow-up options.
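The server-side portion of the steps above (Steps 3 to 5) can be pictured as a chain of three stages. The sketch below is only an illustration under stated assumptions: the class and function names, the 0 to 1 scales, the thresholds, and the heuristic that maps pitch and speaking rate to a stress value are all hypothetical placeholders, and a real system would put a voice analysis engine and a generative AI model in their place.

```python
from dataclasses import dataclass


@dataclass
class VoiceIndicators:
    stress: float     # 0.0 (calm) to 1.0 (highly stressed); hypothetical scale
    happiness: float  # 0.0 to 1.0; hypothetical scale


@dataclass
class Assessment:
    indicators: VoiceIndicators
    level: str        # "low", "moderate", or "high"


def clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def analyze_voice(pitch_hz: float, words_per_minute: float) -> VoiceIndicators:
    """Step 3 placeholder: higher pitch and faster speech raise the stress value.

    The formula is an assumption for illustration, not part of the source.
    """
    stress = clamp01(0.5 * (pitch_hz - 100.0) / 200.0
                     + 0.5 * (words_per_minute - 100.0) / 100.0)
    return VoiceIndicators(stress=stress, happiness=clamp01(1.0 - stress))


def assess(indicators: VoiceIndicators) -> Assessment:
    """Step 4 placeholder: bucket the stress indicator into a level."""
    if indicators.stress >= 0.7:
        level = "high"
    elif indicators.stress >= 0.4:
        level = "moderate"
    else:
        level = "low"
    return Assessment(indicators=indicators, level=level)


def choose_feedback(assessment: Assessment) -> str:
    """Step 5 placeholder: map the level to a care suggestion."""
    suggestions = {
        "low": "No action needed; keep up your current routine.",
        "moderate": "Try a short relaxation exercise.",
        "high": "Consider a consultation with a professional.",
    }
    return suggestions[assessment.level]


def run_pipeline(pitch_hz: float, words_per_minute: float) -> str:
    return choose_feedback(assess(analyze_voice(pitch_hz, words_per_minute)))


feedback = run_pipeline(pitch_hz=260.0, words_per_minute=190.0)
```

Keeping each stage as a separate function mirrors the separation between the voice analysis means, the assessment, and the feedback means, so that any one stage can be swapped without touching the others.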

[0055] (Example 1)

[0056] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0057] In today's workplace, it is crucial to properly manage employees' mental health and provide effective care. However, traditional methods make it difficult to quickly and accurately assess individual employees' situations and emotional states, and thus hinder the provision of appropriate feedback and self-care recommendations.

[0058] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0059] In this invention, the server includes voice analysis means, generated data processing means, and health status evaluation means. This makes it possible to analyze the voice data of individual employees and quickly and accurately evaluate their mental health status.

[0060] "Voice analysis means" refers to elements used to extract characteristics from voice data and generate indicators of emotion and stress.

[0061] "Generated data processing means" refers to elements for performing individual evaluations based on analyzed data and generating appropriate feedback.

[0062] "Health status assessment means" refers to elements for evaluating the mental health status of employees, taking into account analyzed voice data and past history.

[0063] "Means of providing information based on analysis results" refers to elements that utilize the results obtained from voice analysis and health status evaluation to provide users with specific feedback and self-care recommendations.

[0064] "Continuous monitoring measures" are elements that support long-term mental health by analyzing the user's past data and suggesting further care methods as needed.

[0065] This invention is a system for managing the mental health of employees and providing appropriate care. Specifically, it consists of three elements: a server, a terminal, and a user.

[0066] The server processes the audio data received from the user with the voice analysis means. In this process, it analyzes the characteristics of the speech using a speech analysis engine such as Google® Cloud Speech-to-Text, quantifying characteristics such as tone, speed, and intonation. Subsequently, the generated data processing means applies a generative AI model (e.g., AI software with general natural language processing technology) to the analysis results to assess the user's mental health. This process also draws on the user's past data history, allowing assessments tailored to individual circumstances.

[0067] The terminal provides an interface for recording the user's voice data and securely transmitting it to the server. The recorded voice data is encrypted and transmitted via a secure protocol such as HTTPS. When feedback is received from the server, the terminal displays the results on the user interface, for example as relaxation exercises or guidance videos that the user can easily understand and follow.
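A minimal sketch of preparing such an upload on the terminal side, using only the Python standard library, might look as follows. The endpoint URL, the `X-User-Id` header, and the function name are hypothetical and not taken from this description. The request is only constructed here, not sent.

```python
import urllib.request


def build_upload_request(audio_bytes: bytes, user_id: str,
                         url: str) -> urllib.request.Request:
    """Build a POST request carrying recorded audio to the server.

    Rejects any URL that is not HTTPS, so the voice data is only ever
    sent over a TLS-protected connection.
    """
    if not url.lower().startswith("https://"):
        raise ValueError("refusing to send voice data over a non-HTTPS URL")
    return urllib.request.Request(
        url,
        data=audio_bytes,
        method="POST",
        headers={"Content-Type": "audio/wav", "X-User-Id": user_id},
    )


# Hypothetical endpoint; example.invalid never resolves.
req = build_upload_request(b"RIFF-placeholder", "employee-001",
                           "https://example.invalid/api/voice")
# urllib.request.urlopen(req) would perform the upload; it is not called here.
```

With HTTPS, encryption in transit is handled by TLS, so the payload does not need a separate encryption step unless the system also requires protection at rest.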

[0068] With this system, users describe their emotions and daily situations to the terminal by voice. For example, if a user says to the terminal, "I felt a little stressed at work today," a prompt appears saying, "Please tell us how you are feeling right now so we can assess your stress level." The server then starts the analysis and sends the results to the terminal.

[0069] This system allows users to consciously manage their own mental health and receive appropriate self-care.

[0070] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0071] Step 1:

[0072] The user provides voice input to the device about their emotions and everyday situations. The input voice data is recorded by the device. Specifically, the device's microphone captures the voice and converts it into a digital audio file. This audio file becomes the input data for subsequent processing.

[0073] Step 2:

[0074] The device sends the recorded audio data to the server using a secure network protocol (e.g., HTTPS). The data is encrypted, preventing unauthorized access by third parties. The server receives the transmitted data and uses it as input for audio analysis.

[0075] Step 3:

[0076] The server analyzes the received audio data using speech analysis tools. This process analyzes the tone, speed, and intonation of the speech. Specifically, the speech analysis engine uses digital signal processing technology to quantify the audio data and generate indicators of emotion and stress. This provides the output as the analysis result.

[0077] Step 4:

[0078] The server uses the generated data processing means and applies a generative AI model to evaluate mental health based on the analysis results. Past data history is also considered at this stage. The output is a mental health assessment for the individual employee, which serves as the basis for generating feedback.

Step 5:

[0080] The server uses the means of providing information based on analysis results to generate feedback tailored to each employee. Specifically, it uses a generative AI model to create feedback that includes self-care methods and relaxation actions suited to the user's condition. This feedback is then prepared as output to the terminal.

[0081] Step 6:

[0082] The device receives feedback from the server and displays it in the user interface. Specifically, the feedback content is visually displayed on the device's screen, and a notification function informs the user of the arrival of feedback as needed. The user receives this as output and confirms the actions that should be taken.

[0083] Step 7:

[0084] Based on the feedback displayed on the device, the user performs suggested self-care actions. For example, they might take action to maintain their mental health by performing relaxation exercises. At this stage, the feedback acts as direct input, and the user's actions are the output.
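Steps 4 and 5 above pass voice-analysis results and past history to a generative AI model. One way this hand-off could look is a prompt assembled from the indicators, as in the sketch below. The prompt wording, the indicator names, and the function name are assumptions for illustration only. No particular model or API is implied, and no model is called.

```python
def build_feedback_prompt(indicators: dict, history: list) -> str:
    """Assemble a text prompt for a generative AI model.

    indicators: indicator name -> value in the range 0 to 1 (hypothetical scale)
    history: past stress scores, oldest first
    """
    lines = [
        "You are a supportive assistant with knowledge of psychology "
        "and counseling.",
        "Current voice-analysis indicators (0 to 1):",
    ]
    for name in sorted(indicators):
        lines.append(f"- {name}: {indicators[name]:.2f}")
    if history:
        past = ", ".join(f"{score:.2f}" for score in history)
        lines.append(f"Recent stress scores, oldest first: {past}")
    lines.append("Suggest one or two self-care or relaxation actions "
                 "suited to this state.")
    return "\n".join(lines)


prompt = build_feedback_prompt({"stress": 0.8, "happiness": 0.3},
                               [0.4, 0.55, 0.7])
```

The resulting string would be sent to whichever generative model the server uses, and the model's reply would become the feedback shown on the terminal.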

[0085] (Application Example 1)

[0086] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0087] There is a need for a system that can effectively manage and continuously follow up on the mental health of employees. In particular, there is a lack of convenient ways to assess the mental health of users in their daily lives and provide appropriate care. To address this problem, it is necessary to provide a system that uses voice analysis technology and artificial intelligence to evaluate users' emotional states and stress levels and provide individualized support.

[0088] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0089] In this invention, the server includes voice analysis means, generative artificial intelligence means, and mental health assessment means. This makes it possible to analyze received voice data and quantitatively evaluate emotional state and stress levels. Furthermore, the user device can visually or audibly present appropriate feedback and care suggestions to the user based on the analysis results. This enables the user to consciously manage their mental health in their daily life.

[0090] "Voice analysis means" refers to a device or method that receives voice data, analyzes it, and generates indicators such as emotional state and stress level.

[0091] "Generative artificial intelligence means" refers to artificial intelligence technology that evaluates the user's mental health based on the analysis results of received audio data, and extracts and presents the analysis results.

[0092] A "mental health assessment method" is a process or device that quantitatively evaluates a user's mental health status using data obtained from voice analysis.

[0093] A "feedback provision method" is a method or device for presenting improvement suggestions or care content to users based on analysis results and assessments of their mental health status.

[0094] "Continuous follow-up measures" refer to functions or means that continuously analyze past mental health data, make adjustments as needed, and support the user's long-term mental health.

[0095] "User device" refers to a device or means that allows the user to receive analysis results and obtain instructions or suggestions visually or audibly.

[0096] This invention is a system for managing and providing care for the mental health of employees. The system mainly consists of a server, terminals, and users.

[0097] First, the process begins with the user voice-inputting their emotions and stress levels into the device. The device then records this voice data and transmits it to a server via a secure protocol. The hardware used includes a highly sensitive microphone and a communication module for data transmission.
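As a sketch of how recorded samples might be packaged before transmission, the following uses Python's standard `wave` module to produce WAV bytes in memory (WAV is one of the formats named elsewhere in this description). The 16 kHz mono 16-bit settings and the function name are assumptions. A synthetic tone stands in for microphone input.

```python
import io
import math
import struct
import wave


def encode_wav(samples, sample_rate=16000):
    """Pack float samples in [-1, 1] as 16-bit mono PCM WAV bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 16-bit samples
        wav.setframerate(sample_rate)
        # Clip to [-1, 1] and scale to the signed 16-bit range.
        pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
        wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return buf.getvalue()


# 0.1 s of a 220 Hz tone as a placeholder for recorded speech.
tone = [0.3 * math.sin(2 * math.pi * 220 * n / 16000) for n in range(1600)]
wav_bytes = encode_wav(tone)
```

The returned bytes can then be handed to the communication module for transmission over the secure protocol.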

[0098] The server processes the received audio data using a speech analysis engine (e.g., Google Cloud Speech-to-Text or Amazon Transcribe) to analyze the tone, speed, and intonation of the voice. The analysis results are then used with a generative AI model to quantify the user's emotional state and stress level. The generative AI model used here learns from past data, enabling more accurate assessments.

[0099] The server then uses the generative artificial intelligence means to perform a mental health assessment based on the analysis results. The assessment results are sent to the terminal, which provides visual feedback through a user interface. This feedback includes suggestions for self-care tailored to the user's mental state and instructions for relaxation exercises. The terminal is equipped with a visual display device and an audio output device to help the user carry out the recommended care.

[0100] For example, if a user feels stressed at work, they might say to their device, "I'm feeling a little stressed today, can you tell me how to relax?" Based on this prompt, the server performs an analysis and conducts a mental health assessment. It then provides feedback, such as suggesting and playing appropriate relaxation music or displaying a guide to a series of yoga poses. In this way, users can easily manage their mental health in their daily lives.

[0101] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0102] Step 1:

[0103] The user provides voice input to the device regarding their emotions and stress levels. This voice input data is captured by the device's high-sensitivity microphone. The device converts the acquired voice data into a digital format and temporarily stores it for use in the next step.

[0104] Step 2:

[0105] The terminal transmits voice data to the server via a secure protocol. This process uses encrypted communication such as SSL / TLS to ensure data security. The transmitted voice data is then used as input for the voice analysis process on the server.

[0106] Step 3:

[0107] The server processes the received audio data using a speech analysis engine (e.g., Google Cloud Speech-to-Text). This analysis examines the tone, speed, and intonation of the speech. As part of the data processing, these parameters are quantified to generate indices indicating emotional state and stress levels. The output of this step is a quantified evaluation index.

[0108] Step 4:

[0109] The server uses generative artificial intelligence to evaluate the user's mental health based on evaluation metrics obtained from voice analysis. In this step, the generative AI model calculates the evaluation results while considering past health data. The output is a detailed evaluation result regarding the user's mental health.

[0110] Step 5:

[0111] The server sends the assessment results to the terminal. These results include feedback for the user. As part of providing feedback, it is presented as audio or visual data to help the user easily understand the content and take recommended self-care actions. The terminal displays the assessment results visually and provides audio instructions as needed.

[0112] Step 6:

[0113] Users perform relaxation exercises based on feedback received from their devices. Through these specific actions, users aim to reduce their stress and regain peace of mind. Examples of this feedback include playing videos guiding users through yoga poses.
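Steps 3 and 4 above quantify voice parameters into indices while taking past data into account. One simple way to do that, shown here purely as an assumption and not as the disclosed method, is to compare the current value of a feature with the same user's own baseline, then squash the z-score into the range 0 to 1 with a logistic function. The function name and the choice of pitch as the feature are hypothetical.

```python
import math
import statistics


def stress_index(current_pitch_hz, baseline_pitches_hz):
    """Score the current pitch against the user's own history.

    Returns 0.5 when the current value equals the baseline mean, values
    near 1 when it is well above, and values near 0 when well below.
    """
    if len(baseline_pitches_hz) < 2:
        return 0.5  # not enough history to judge; neutral score
    mean = statistics.fmean(baseline_pitches_hz)
    spread = statistics.stdev(baseline_pitches_hz)
    if spread == 0:
        return 0.5
    z = (current_pitch_hz - mean) / spread
    return 1.0 / (1.0 + math.exp(-z))


baseline = [180.0, 190.0, 200.0, 210.0, 220.0]
typical = stress_index(200.0, baseline)
elevated = stress_index(230.0, baseline)
```

Normalizing against each user's own history is what lets the same raw pitch mean different things for different speakers.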

[0114] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0115] The present invention provides a system for precisely managing the mental health of employees, which includes voice analysis means, generative artificial intelligence means, mental health assessment means, feedback provision means based on analysis results, and continuous follow-up means, and further provides a function to recognize the user's emotions by combining it with an emotion engine.

[0116] In this system, the server performs the core processing. When a user describes their emotions or daily events to the terminal by voice, the terminal records the audio and sends it to the server as digital data. The server analyzes the received audio data using the voice analysis means and generates quantitative indicators based on the tone, speed, and intonation of the voice. During this analysis, the emotion engine recognizes the emotional state in the voice with high accuracy. The emotion engine uses machine learning algorithms to analyze the user's emotions and reflects the recognition results in the mental health assessment.

[0117] The generative artificial intelligence system uses data from an emotion engine to comprehensively assess the user's mental state. This assessment includes indicators such as stress levels, happiness, and fatigue, clearly indicating the employee's current mental state. The assessment results are used to generate clear, actionable feedback, which the server then sends to the terminal.

[0118] The terminal displays the feedback received from the server to the user in an easy-to-understand format. The user can then review this feedback and carry out the recommended self-care actions. For example, if the emotion engine detects a high stress level, the terminal provides the user with guidance on relaxation exercises.

[0119] When users take action based on feedback, the results are also recorded in the system. The server uses continuous follow-up to track the user's mental health over the long term and, if necessary, offers further care suggestions to the user. In this way, by incorporating an emotion engine, the system provides a more accurate and effective way to manage the user's mental health.
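The long-term tracking described here could, for instance, compare a recent window of stress scores with the window before it. The sketch below is a hypothetical rule for deciding when to offer further care; the function name, window size, and threshold are assumptions, not values from this description.

```python
def needs_follow_up(scores, window=3, threshold=0.6):
    """Decide whether to offer further care suggestions.

    scores: stress scores in chronological order, each in the range 0 to 1.
    Returns True when the mean of the latest `window` scores is at or above
    `threshold` and higher than the mean of the preceding window.
    """
    if len(scores) < 2 * window:
        return False  # not enough history to compare two windows
    recent = scores[-window:]
    earlier = scores[-2 * window:-window]
    recent_mean = sum(recent) / window
    earlier_mean = sum(earlier) / window
    return recent_mean >= threshold and recent_mean > earlier_mean


worsening = needs_follow_up([0.3, 0.3, 0.4, 0.6, 0.7, 0.8])
improving = needs_follow_up([0.8, 0.8, 0.8, 0.3, 0.2, 0.3])
```

Requiring both a high level and a rising trend avoids repeated prompts for a user whose scores are high but already improving.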

[0120] The following describes the processing flow.

[0121] Step 1:

[0122] The user inputs voice information about their emotions and current state into the device. The device then records this voice and prepares to save it as digital data.

[0123] Step 2:

[0124] The terminal converts the recorded audio data into a predetermined format (e.g., WAV or MP3) and encrypts it for secure transmission to the server. It then executes the transmission procedure.

[0125] Step 3:

[0126] The server decodes the received audio data and analyzes it using an audio analysis tool. The audio analysis is based on factors such as tone, speed, and intonation, and extracts basic emotional information.

[0127] Step 4:

[0128] The emotion engine within the server recognizes emotions with high accuracy from the analyzed audio data. The emotion engine uses machine learning algorithms and passes the resulting emotion data to subsequent processing.

[0129] Step 5:

[0130] The server uses generative artificial intelligence to comprehensively assess mental health. This process uses the data from the emotion engine to generate indicators such as stress level and well-being.

[0131] Step 6:

[0132] The server generates feedback for the user based on the evaluation results. This feedback includes recommended actions and self-care suggestions.

[0133] Step 7:

[0134] Feedback is sent from the server to the terminal, which then notifies and displays it to the user. The feedback is provided as specific guidelines that the user can immediately implement.

[0135] Step 8:

[0136] Users follow instructions from the device and perform suggested self-care activities. For example, they might engage in activities such as meditation or stretching to maintain their mental health.

[0137] Step 9:

[0138] The server periodically analyzes the user's past health data and its results through continuous follow-up mechanisms. Additional feedback and support can be provided as needed.
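
As a non-limiting illustration of Steps 2 and 3 above, the following Python sketch packs recorded samples into WAV data on the terminal side and unpacks them on the server side. The 16 kHz sampling rate, the 16-bit mono layout, and the function names encode_wav and decode_wav are assumptions made for illustration and do not appear in the disclosure. Encryption is assumed to be handled by the transport layer and is omitted here.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 16_000  # assumed sampling rate in Hz


def encode_wav(samples, sample_rate=SAMPLE_RATE):
    """Terminal side (Step 2): pack 16-bit mono PCM samples into WAV bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)            # mono
        wf.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()


def decode_wav(payload):
    """Server side (Step 3): recover the samples and the sampling rate."""
    with wave.open(io.BytesIO(payload), "rb") as wf:
        n = wf.getnframes()
        frames = wf.readframes(n)
        return list(struct.unpack(f"<{n}h", frames)), wf.getframerate()


# A 0.1-second 220 Hz tone stands in for recorded speech.
tone = [
    int(10_000 * math.sin(2 * math.pi * 220 * t / SAMPLE_RATE))
    for t in range(SAMPLE_RATE // 10)
]
payload = encode_wav(tone)
decoded, rate = decode_wav(payload)
```

WAV is lossless for PCM samples, so the server recovers exactly what the terminal recorded. A lossy format such as MP3 would not give this guarantee.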

[0139] (Example 2)

[0140] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0141] In modern society, accurately and effectively managing employees' mental health is a challenge. Traditional systems have struggled to quickly detect emotional changes and stress levels and provide appropriate feedback. Furthermore, long-term mental health follow-up has been insufficient, making it difficult to sustainably maintain employees' mental well-being.

[0142] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0143] In this invention, the server includes an input means for receiving voice data, an analysis means for recognizing emotional states using voice analysis means, an evaluation means for evaluating mental health using generative artificial intelligence, a providing means for generating and providing feedback based on the evaluation results, and a follow-up means for tracking user behavior and monitoring mental health over the long term. This enables high-precision analysis of employees' emotions and stress levels, prompt provision of appropriate feedback, and long-term mental health management.

[0144] "Input means" refers to an interface through which a user provides voice data to the system, that is, a device that converts voice into digital data.

[0145] "Analysis means" refers to a device or program that processes audio data within the server and analyzes the tone, speed, and intonation of the voice in order to identify the emotional state.

[0146] "Evaluation means" refers to a function within the system that uses generative artificial intelligence to comprehensively evaluate the user's mental health based on the analysis results.

[0147] "Providing means" refers to a function or device that generates appropriate feedback based on the results from the evaluation means and provides it to the user.

[0148] "Follow-up means" refers to a function or system that tracks changes in the user's behavior and mental health over the long term and proposes necessary care again.

[0149] This invention is a system for managing the mental health of employees, implemented through server, terminal, and user interaction. Specifically, it effectively grasps the user's mental state and provides appropriate feedback and care by analyzing voice data using an emotion engine and performing mental health assessments utilizing a generative AI model.

[0150] The user first uses voice input into the device. The device has a recording function that records the user's voice as digital data. The recorded data is then transmitted to the server via a secure communication protocol.

[0151] The server processes the received audio data using speech analysis software. Here, the emotion engine operates, analyzing the tone, speed, and intonation of the speech to quantify the user's emotional state. This emotional data is then analyzed within a generative AI model and reflected in an assessment of mental health. The AI model provides indicators such as stress levels, happiness, and fatigue to evaluate the user's current mental state.

[0152] Based on the evaluation results, the server generates specific feedback and sends it to the terminal. The terminal displays this feedback visually, presenting it in a user-friendly format. For example, if high stress levels are detected, the terminal provides the user with a guide to relaxation exercises.

[0153] Furthermore, the system records how users respond to feedback, and the server stores the results in a database as a means of continuous follow-up. This enables long-term monitoring of the user's mental health and allows necessary care to be suggested again. An example of a prompt used for this follow-up is: "Please tell us about a stressful experience you've had recently, and what actions did you take as a result?"

[0154] This system makes it possible to accurately assess the mental health of employees and implement effective care.

[0155] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0156] Step 1:

[0157] The user inputs everyday events and emotions into the device via voice. The device records the input voice and converts it into digital data. This data is then sent to the server as input data.

[0158] Step 2:

[0159] The server processes the received audio data using audio analysis tools. Specifically, it decodes the audio and uses analysis software to extract tone, speed, and intonation. As a result, quantified emotion data is generated. This data is output as a model of the emotional state.

[0160] Step 3:

[0161] Using the emotional state model, the server evaluates the user's mental health with a generative AI model. The AI model takes the emotion data as input and calculates indicators of mental state such as stress level, happiness, and fatigue. As a result, a comprehensive evaluation of the mental state is generated. This result is output as evaluation data.

[0162] Step 4:

[0163] Based on evaluation data, the server uses feedback provisioning mechanisms to generate specific feedback for the user. This feedback is generated as text or audio guidance and sent to the device. As a result, clear and actionable feedback is output.

[0164] Step 5:

[0165] The device visually displays the feedback received from the server and informs the user. The user reviews the feedback and takes the recommended action. During this process, the device records the user's reactions and behavioral data. As a result, an action log is output.

[0166] Step 6:

[0167] The server uses behavioral logs as a means of continuous follow-up to track the user's mental health over the long term. Based on this data, it suggests further care to the user as needed. As a result, follow-up data is output.
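
As a non-limiting illustration, the data handed from one step to the next in the flow above can be sketched as follows. The class names, the 0-to-1 value ranges, the linear scoring rule that stands in for the generative AI model, and the 0.7 stress threshold are all hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EmotionData:            # output of Step 2
    tone: float               # all values assumed normalised to 0.0-1.0
    speed: float
    intonation: float


@dataclass
class EvaluationData:         # output of Step 3
    stress: float
    happiness: float
    fatigue: float


@dataclass
class ActionLog:              # output of Step 5
    feedback: str
    action_taken: bool


def evaluate(e: EmotionData) -> EvaluationData:
    """Stand-in for the generative AI model: a fixed, made-up linear rule."""
    stress = min(1.0, 0.6 * e.speed + 0.4 * e.tone)
    happiness = max(0.0, e.intonation - 0.5 * stress)
    fatigue = max(0.0, 1.0 - e.speed)
    return EvaluationData(stress, happiness, fatigue)


def make_feedback(ev: EvaluationData, threshold: float = 0.7) -> str:
    """Step 4: choose feedback text from the evaluation data."""
    if ev.stress >= threshold:
        return "High stress detected: try a guided relaxation exercise."
    return "No action needed right now."


def follow_up(logs: List[ActionLog]) -> str:
    """Step 6: suggest care again if most feedback was not acted on."""
    ignored = sum(1 for log in logs if not log.action_taken)
    return "Suggest care again" if ignored > len(logs) / 2 else "Continue monitoring"


emotion = EmotionData(tone=0.9, speed=0.8, intonation=0.3)
evaluation = evaluate(emotion)
feedback = make_feedback(evaluation)
logs = [ActionLog(feedback, action_taken=False)]
```

Keeping each step's output as a separate typed record mirrors the input/output description of the flow and makes it straightforward to store the action log for later follow-up.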

[0168] (Application Example 2)

[0169] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0170] In modern society, particularly in caregiving settings, effectively managing the mental health of both caregivers and clients is crucial. However, traditional methods often overlook signs of emotions and stress, making appropriate responses difficult. Therefore, there is a need for a system that can more accurately and comprehensively assess psychological states and provide appropriate care methods.

[0171] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0172] In this invention, the server includes voice conversion means, machine intelligence means, and psychological health assessment means. This enables accurate understanding of the psychological state of caregivers and users, and allows for appropriate care and feedback based on that understanding.

[0173] "Voice conversion means" refers to a means for analyzing voice data and detecting its characteristics, speed, and intonation.

[0174] "Machine intelligence means" refers to a means for evaluating psychological states using artificial intelligence technology based on information obtained from voice data.

[0175] "Psychological health assessment means" refers to a means for quantifying and evaluating mental health based on the data analyzed by the machine intelligence means.

[0176] "Information provision means based on analysis results" refers to a means for presenting appropriate care methods to users based on the results obtained from the psychological health assessment means.

[0177] "Continuous tracking means" refers to a means for recording and analyzing the user's psychological state over a long period and providing additional support as needed.

[0178] This system accurately manages the mental health of caregivers and users in care settings and provides appropriate care. The system primarily consists of voice conversion, machine intelligence, psychological health assessment, information provision based on analysis results, and continuous tracking.

[0179] The server receives conversations between caregivers and users as digital audio data. The voice conversion means analyzes the received audio data for characteristics, speed, and intonation. Next, the machine intelligence means uses the analyzed data to evaluate the psychological state. This process uses software called an emotion engine to identify signs of emotion and stress in the audio with high accuracy.

[0180] The psychological health assessment means quantitatively evaluates the stress levels and well-being of caregivers and users based on the emotion analysis results obtained by the machine intelligence means. The server transmits the assessment results to the caregiver using the information provision means based on analysis results. This feedback is presented to the caregiver as specific care methods and relaxation guides.

[0181] The continuous tracking means records the psychological health data of users and caregivers over the long term, enabling follow-up support and new care suggestions as needed. The server continuously collects and analyzes data to provide care methods that adapt to changes in the user's psychological state.

[0182] As a concrete example, if the voice conversion means detects an increase in stress in a user's conversation, the information provision means provides feedback to the caregiver such as, "Please suggest activities that will help the user relax."

[0183] An example of a prompt to input into the generative AI model is: "Evaluate stress and well-being from the voice data and generate specific feedback for the caregiver."
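
As a non-limiting illustration, a prompt of this kind could be assembled from the emotional indicators before it is passed to a generative AI model. The function name build_caregiver_prompt, the indicator names, and the 0-to-1 scale are assumptions made for this sketch; only the instruction sentence is taken from the text above.

```python
def build_caregiver_prompt(indicators: dict) -> str:
    """Render emotional indicators into a text prompt for a generative AI model."""
    lines = [f"- {name}: {value:.2f}" for name, value in sorted(indicators.items())]
    return (
        "Evaluate stress and well-being from the voice data and generate "
        "specific feedback for the caregiver.\n"
        "Emotional indicators extracted from the conversation (0 to 1):\n"
        + "\n".join(lines)
    )


# Hypothetical indicator values for one conversation.
prompt = build_caregiver_prompt({"speed": 0.8, "intonation": 0.35})
print(prompt)
```

Sorting the indicator names keeps the prompt text stable for the same input, which makes the model's responses easier to compare across sessions.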

[0184] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0185] Step 1:

[0186] The device records conversations between the caregiver and the user. It acquires the audio data and sends it to the server in a digital format. In this case, the input is an audio signal, and the output is digital audio data.

[0187] Step 2:

[0188] The server analyzes the received audio data using the voice conversion means. It analyzes characteristics, speed, and intonation to generate initial emotional indicators. The input is digital audio data, and the output is analyzed data including emotional indicators.

[0189] Step 3:

[0190] The server uses the machine intelligence means to evaluate the user's psychological state based on the emotional indicators. An emotion engine is used to quantify stress level and happiness. The input is the emotional indicators, and the output is evaluation data indicating the psychological state.

[0191] Step 4:

[0192] The server generates feedback from the psychological state evaluation data using the information provision means based on analysis results. It constructs feedback that includes specific care methods and guidelines. The input is the psychological state evaluation data, and the output is a feedback message.

[0193] Step 5:

[0194] The terminal displays feedback messages received from the server to the caregiver. Based on the feedback, it suggests care methods and influences the caregiver's actions. The input is the feedback message, and the output is the caregiver's implementation of care.

[0195] Step 6:

[0196] The server uses continuous tracking to record and analyze long-term psychological health data of caregivers and users. It provides additional care suggestions as needed. The input is newly collected psychological data, and the output is updated care suggestions.

[0197] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0198] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
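
As a non-limiting illustration, the prompt-plus-inference-data input described above can be expressed as a small interface. The names InferenceInput, DataGenerationModel, and EchoModel are hypothetical; EchoModel is a placeholder that only reports what it received and is not an actual generative AI.

```python
from dataclasses import dataclass
from typing import Protocol, Union


@dataclass
class InferenceInput:
    prompt: str                  # instruction for the model
    data: Union[str, bytes]      # text, or raw audio/image bytes


class DataGenerationModel(Protocol):
    """Anything with an infer() method of this shape can serve as the model."""

    def infer(self, request: InferenceInput) -> str: ...


class EchoModel:
    """Placeholder that reports what it received instead of running a model."""

    def infer(self, request: InferenceInput) -> str:
        kind = "text" if isinstance(request.data, str) else "binary"
        return f"[{kind} input, {len(request.data)} units] {request.prompt}"


def run(model: DataGenerationModel, request: InferenceInput) -> str:
    """Pass the prompt and inference data to whichever model is plugged in."""
    return model.infer(request)


result = run(EchoModel(), InferenceInput("Summarize the speaker's mood.", b"\x00\x01\x02"))
```

Because the caller depends only on the interface, the placeholder could be swapped for a client of a hosted model without changing the surrounding processing.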

[0199] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0200] [Second Embodiment]

[0201] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0202] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0203] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0204] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0205] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0206] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0207] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0208] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0209] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0210] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0211] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0212] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0213] This invention is a system for managing and providing care for the mental health of employees, and has the following configuration: The system consists of three elements: a server, a terminal, and a user.

[0214] The server plays a crucial role in analyzing and evaluating received audio data using voice analysis and generative artificial intelligence (AI) means. Audio data is transmitted from terminals, and the server processes it using a voice analysis engine. The voice analysis engine analyzes the tone, speed, and intonation of the voice, quantifying the employee's emotional state and stress level. Next, the generative AI means performs a mental health assessment based on these analysis results. The AI considers the employee's usage history and past data, enabling assessments tailored to individual circumstances.

[0215] The device records the user's voice data and securely transmits it to a server. It also displays the evaluation results received from the server on the user interface, providing feedback to the user. This feedback includes suggestions and actions for care tailored to the user's mental state. The device strives for a simple and easy-to-understand display, enabling users to easily implement the recommended self-care.

[0216] Users record their daily mental state by using voice input to describe their emotions and daily circumstances into the device, which is then sent to a server. This system allows users to consciously monitor their own mental health. Based on the feedback received from the device, users then take appropriate self-care measures to maintain their mental well-being.

[0217] To give a specific example, when a user feels stressed during a day's work, they can speak their feelings into the device. The server analyzes the audio, and the AI evaluates the stress level. After checking the evaluation results on the device, the user can regain peace of mind by performing recommended relaxation exercises as instructed by the device.

[0218] Thus, this system utilizes voice analysis technology and artificial intelligence to provide a way for users to continuously manage their mental health and take necessary self-care actions in a timely manner.

[0219] The following describes the processing flow.

[0220] Step 1:

[0221] The user inputs their daily emotions and experiences into the device via voice input. The device starts recording and acquires the user's speech as digital audio data.

[0222] Step 2:

[0223] The device converts the recorded audio data into an appropriate format and encrypts the data to ensure security. It then sends the prepared audio data to the server.

[0224] Step 3:

[0225] The server decodes the received audio data and passes it to the audio analysis engine. The analysis engine analyzes the tone, speed, intonation, etc., of the voice and generates indicators of emotion and stress.

[0226] Step 4:

[0227] The generative AI means of the server evaluates the employee's mental health based on the results of the voice analysis. The AI module draws on psychological knowledge to quantify stress level, happiness level, and other factors.

[0228] Step 5:

[0229] Based on the results of the mental health assessment, the server determines appropriate care and behavioral guidelines for the user. This includes recommendations for relaxation exercises and consultations with professionals.

[0230] Step 6:

[0231] The terminal receives feedback from the server and displays it in a format that is easy for the user to understand. The terminal also presents the user with any necessary guides or additional information.

[0232] Step 7:

[0233] The user follows instructions from the device and performs recommended self-care actions, such as engaging in stress-relieving exercises.

[0234] Step 8:

[0235] As part of ongoing follow-up, the server periodically analyzes the user's past data and, if necessary, proposes various care and follow-up options.
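
As a non-limiting illustration of the periodic analysis in Step 8, the following sketch compares a user's most recent stress scores with earlier ones and flags a rise. The window of three scores, the 0.15 rise threshold, and the function names are assumptions made for illustration; the disclosure does not specify how past data is analyzed.

```python
from statistics import mean


def stress_trend(history, window=3):
    """Mean of the latest `window` scores minus the mean of all earlier scores."""
    if len(history) <= window:
        return 0.0  # not enough history to compare
    recent = history[-window:]
    earlier = history[:-window]
    return mean(recent) - mean(earlier)


def needs_follow_up(history, rise=0.15):
    """Flag the user for further care when recent stress rose by `rise` or more."""
    return stress_trend(history) >= rise


# Hypothetical stress scores (0 to 1) from six consecutive sessions.
weekly_stress = [0.30, 0.35, 0.32, 0.50, 0.55, 0.60]
flagged = needs_follow_up(weekly_stress)
```

Comparing against the user's own earlier scores, rather than a fixed cutoff, is one way to reflect the individual history that the server is described as taking into account.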

[0236] (Example 1)

[0237] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0238] In today's workplace, it is crucial to properly manage employees' mental health and provide effective care. However, traditional methods make it difficult to quickly and accurately assess individual employees' situations and emotional states, and thus hinder the provision of appropriate feedback and self-care recommendations.

[0239] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0240] In this invention, the server includes voice analysis means, generated data processing means, and health status evaluation means. This makes it possible to analyze the voice data of individual employees and quickly and accurately evaluate their mental health status.

[0241] "Voice analysis means" refers to elements used to extract characteristics from voice data and generate indicators of emotion and stress.

[0242] "Generated data processing means" refers to elements for performing individual evaluations based on analyzed data and generating appropriate feedback.

[0243] "Health status assessment means" refers to elements for evaluating the mental health status of employees, taking into account analyzed voice data and past history.

[0244] "Means of providing information based on analysis results" refers to elements that utilize the results obtained from voice analysis and health status evaluation to provide users with specific feedback and self-care recommendations.

[0245] "Continuous monitoring measures" are elements that support long-term mental health by analyzing the user's past data and suggesting further care methods as needed.

[0246] This invention is a system for managing the mental health of employees and providing appropriate care. Specifically, it consists of three elements: a server, a terminal, and a user.

[0247] The server processes the audio data received from the user using the voice analysis means. In this process, it analyzes the characteristics of the speech using a speech analysis engine such as Google Cloud Speech-to-Text, quantifying characteristics such as tone, speed, and intonation. Subsequently, using the generated data processing means, the data is analyzed by a generative AI model (e.g., AI software with general natural language processing technology) to assess the user's mental health. This process also uses the history of past data, allowing assessments tailored to individual circumstances.

[0248] The device provides an interface for recording the user's voice data and securely transmitting it to a server. The recorded voice data is encrypted and transmitted via a secure protocol such as HTTPS. Furthermore, when feedback is received from the server, the device displays the results on the user interface. Here, relaxation exercises or guidance videos are displayed, making it easy for the user to understand and perform them.
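
As a non-limiting illustration, the secure transmission described above can be sketched with the Python standard library. The endpoint URL, the X-User-Id header, and the function name build_upload_request are hypothetical, since the disclosure names no URL or message format. The sketch only prepares the request and does not send it.

```python
import urllib.request

# Hypothetical endpoint; the disclosure names no URL.
UPLOAD_URL = "https://mental-care.example.com/api/voice"


def build_upload_request(wav_bytes: bytes, user_id: str) -> urllib.request.Request:
    """Prepare an HTTPS POST carrying the recorded audio; nothing is sent here."""
    if not UPLOAD_URL.startswith("https://"):
        raise ValueError("audio must only be sent over HTTPS")
    return urllib.request.Request(
        UPLOAD_URL,
        data=wav_bytes,
        headers={"Content-Type": "audio/wav", "X-User-Id": user_id},  # assumed headers
        method="POST",
    )


request = build_upload_request(b"RIFF....WAVE", user_id="employee-001")
# Sending would be: urllib.request.urlopen(request, timeout=10)
```

With an https URL, the standard library negotiates TLS when the request is sent, so the audio is encrypted in transit without extra code on the terminal.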

[0249] This system allows users to input their emotions and daily situations into a device using voice. For example, if a user says to the device, "I felt a little stressed at work today," a prompt will appear saying, "Please tell us how you are feeling right now so we can assess your stress level." The server then starts the analysis and sends the results to the device.

[0250] This system allows users to consciously manage their own mental health and receive appropriate self-care.

[0251] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0252] Step 1:

[0253] The user provides voice input to the device about their emotions and everyday situations. The input voice data is recorded by the device. Specifically, the device's microphone captures the voice and converts it into a digital audio file. This audio file becomes the input data for subsequent processing.

[0254] Step 2:

[0255] The device sends the recorded audio data to the server using a secure network protocol (e.g., HTTPS). The data is encrypted, preventing unauthorized access by third parties. The server receives the transmitted data and uses it as input for audio analysis.

[0256] Step 3:

[0257] The server analyzes the received audio data using speech analysis tools. This process analyzes the tone, speed, and intonation of the speech. Specifically, the speech analysis engine uses digital signal processing technology to quantify the audio data and generate indicators of emotion and stress. This provides the output as the analysis result.

[0258] Step 4:

[0259] The server uses the generated data processing means and applies a generative AI model to evaluate mental health based on the analysis results. The history of past data is also considered at this stage. The output is a mental health assessment for the individual employee. This assessment serves as the basis for generating feedback.

[0260] Step 5:

[0261] The server uses the means of providing information based on analysis results to generate feedback tailored to each employee. Specifically, it uses a generative AI model to create feedback that includes self-care methods and relaxation actions suited to the user's condition. This feedback is then prepared as output to the terminal.

[0262] Step 6:

[0263] The device receives feedback from the server and displays it in the user interface. Specifically, the feedback content is visually displayed on the device's screen, and a notification function informs the user of the arrival of feedback as needed. The user receives this as output and confirms the actions that should be taken.

[0264] Step 7:

[0265] Based on the feedback displayed on the device, the user performs suggested self-care actions. For example, they might take action to maintain their mental health by performing relaxation exercises. At this stage, the feedback acts as direct input, and the user's actions are the output.

[0266] (Application Example 1)

[0267] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0268] There is a need for a system that can effectively manage and continuously follow up on the mental health of employees. In particular, there is a lack of convenient ways to assess the mental health of users in their daily lives and provide appropriate care. To address this problem, it is necessary to provide a system that uses voice analysis technology and artificial intelligence to evaluate users' emotional states and stress levels and provide individualized support.

[0269] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0270] In this invention, the server includes voice analysis means, generative artificial intelligence means, and mental health assessment means. This makes it possible to analyze received voice data and quantitatively evaluate emotional state and stress levels. Furthermore, the user device can visually or audibly present appropriate feedback and care suggestions to the user based on the analysis results. This enables the user to consciously manage their mental health in their daily life.

[0271] "Voice analysis means" refers to a device or method that receives voice data, analyzes it, and generates indicators such as emotional state and stress level.

[0272] "Generative artificial intelligence means" refers to artificial intelligence technology that evaluates the user's mental health based on the analysis results of received audio data, and extracts and presents the analysis results.

[0273] "Mental health assessment means" refers to a process or device that quantitatively evaluates the user's mental health status using data obtained from voice analysis.

[0274] "Feedback provision means" refers to a method or device for presenting improvement suggestions or care content to users based on the analysis results and the assessment of their mental health status.

[0275] "Continuous follow-up means" refers to functions or means that continuously analyze past mental health data, make adjustments as needed, and support the user's long-term mental health.

[0276] "User device" refers to a device or means that allows the user to receive analysis results and obtain instructions or suggestions visually or audibly.

[0277] This invention is a system for managing and providing care for the mental health of employees. The system mainly consists of a server, terminals, and users.

[0278] First, the process begins with the user voice-inputting their emotions and stress levels into the device. The device then records this voice data and transmits it to a server via a secure protocol. The hardware used includes a highly sensitive microphone and a communication module for data transmission.

[0279] The server processes the received audio data using a speech analysis engine (e.g., Google Cloud Speech-to-Text or Amazon Transcribe) to analyze the tone, speed, and intonation of the voice. The analysis results are then used with a generative AI model to quantify the user's emotional state and stress level. The generative AI model used here learns from past data, enabling more accurate assessments.
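
As a non-limiting illustration of how voice characteristics might be quantified, the following sketch derives rough numeric indicators from raw samples. It is not the method of the named speech analysis engines. The zero-crossing pitch estimate, the use of pitch spread as an intonation proxy, the 50 ms frame length, and the function names are all assumptions, and speaking speed is omitted because it would need syllable or word timing.

```python
import math
from statistics import pstdev


def frame_pitch(frame, sample_rate):
    """Estimate pitch in Hz from the number of sign changes in one frame."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings * sample_rate / (2 * len(frame))


def voice_indicators(samples, sample_rate, frame_len=400):
    """Return mean pitch, pitch spread (an intonation proxy), and RMS loudness."""
    frames = [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    pitches = [frame_pitch(f, sample_rate) for f in frames]
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return {
        "mean_pitch_hz": sum(pitches) / len(pitches),
        "pitch_spread_hz": pstdev(pitches),
        "rms": rms,
    }


RATE = 8000  # assumed sampling rate in Hz
# Synthetic "speech": half a second at 150 Hz followed by half a second at 250 Hz.
signal = [
    math.sin(2 * math.pi * (150 if t < RATE // 2 else 250) * t / RATE)
    for t in range(RATE)
]
indicators = voice_indicators(signal, RATE)
```

On this synthetic signal the mean pitch comes out near 200 Hz with a spread of roughly 50 Hz. Real speech would call for a more robust pitch tracker, but the shape of the output, a small set of numbers per utterance, matches what the later assessment steps consume.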

[0280] The server then uses the generative artificial intelligence means to perform a mental health assessment based on the analysis results. The assessment results are sent to the terminal, which provides visual feedback through a user interface. This feedback includes suggestions for self-care tailored to the user's mental state and instructions for relaxation exercises. The terminal is equipped with a visual display device and an audio output device to help the user carry out the recommended care.

[0281] As a specific example, when a user feels stressed at work, they say to the terminal, "I'm feeling a bit stressed today. Please teach me how to relax." Based on this prompt, the server performs analysis and conducts a mental health assessment. Subsequently, the server provides feedback such as suggesting appropriate relaxation music for the user to select and play, or displaying guides for a series of yoga poses. In this way, users can easily manage their mental health in their daily lives.

[0282] The flow of the specific processing in Application Example 1 will be described using Figure 12.

[0283] Step 1:

[0284] The user makes a voice input about their emotions and stress towards the terminal. This input voice data is captured by the high-sensitivity microphone of the terminal. The terminal converts the acquired voice data into digital format and temporarily stores it for use in the next step.

[0285] Step 2:

[0286] The terminal sends the voice data to the server via a secure protocol. Data security is ensured by using encrypted communication such as SSL/TLS. The transmitted voice data is used as input for the voice analysis process on the server.

[0287] Step 3:

[0288] The server processes the received voice data using a voice analysis engine (e.g., Google Cloud Speech-to-Text). In this analysis, the tone, speed, and intonation of the voice are analyzed. As data processing, these parameters are digitized, and indicators indicating the emotional state and stress level are generated. The output of this step is the digitized evaluation indicators.

[0289] Step 4:

[0290] The server uses generative artificial intelligence to evaluate the user's mental health based on evaluation metrics obtained from voice analysis. In this step, the generative AI model calculates the evaluation results while considering past health data. The output is a detailed evaluation result regarding the user's mental health.

[0291] Step 5:

[0292] The server sends the assessment results to the terminal. These results include feedback for the user. The feedback is presented as audio or visual data so that the user can easily understand it and take the recommended self-care actions. The terminal displays the assessment results visually and provides audio instructions as needed.

[0293] Step 6:

[0294] Users perform relaxation exercises based on feedback received from their devices. Through these specific actions, users aim to reduce their stress and regain peace of mind. Examples of this feedback include playing videos guiding users through yoga poses.

[0295] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0296] The present invention provides a system for precisely managing the mental health of employees, which includes voice analysis means, generative artificial intelligence means, mental health assessment means, feedback provision means based on analysis results, and continuous follow-up means, and further provides a function to recognize the user's emotions by combining it with an emotion engine.

[0297] In this system, the server performs the central processing. When a user describes their emotions or daily events by voice to a terminal, the terminal records the audio and sends it to the server as digital data. The server analyzes the received audio data using the voice analysis means and generates quantitative indicators based on the tone, speed, and intonation of the voice. During this analysis, an emotion engine recognizes the emotional state in the voice with high accuracy. The emotion engine uses machine learning algorithms to analyze the user's emotions and reflects the recognition results in the mental health assessment.

[0298] The generative artificial intelligence system uses data from an emotion engine to comprehensively assess the user's mental state. This assessment includes indicators such as stress levels, happiness, and fatigue, clearly indicating the employee's current mental state. The assessment results are used to generate clear, actionable feedback, which the server then sends to the terminal.

[0299] The terminal displays the feedback received from the server to the user in an easy-to-understand format. The user can then review this feedback and carry out the recommended self-care actions. For example, if the emotion engine detects high stress levels, the terminal provides the user with guidance on relaxation exercises.

[0300] When users take action based on feedback, the results are also recorded in the system. The server uses continuous follow-up to track the user's mental health over the long term and, if necessary, offers further care suggestions to the user. In this way, by incorporating an emotion engine, the system provides a more accurate and effective way to manage the user's mental health.

[0301] The following describes the processing flow.

[0302] Step 1:

[0303] The user inputs voice about their emotions and current state towards the terminal. The terminal records this voice and prepares to save it as digital data.

[0304] Step 2:

[0305] The terminal converts the recorded voice data into a predetermined format (e.g., WAV or MP3) and encrypts it for secure transmission to the server. Then, it executes the transmission procedure.

[0306] Step 3:

[0307] The server decrypts the received voice data and analyzes it using voice analysis means. The voice analysis is performed based on the tone, speed, intonation, etc. of the sound to extract basic emotion information.

[0308] Step 4:

[0309] The emotion engine in the server functions to accurately recognize emotions from the analyzed voice data. The emotion engine utilizes machine learning algorithms and passes the obtained emotion data to subsequent processing.

[0310] Step 5:

[0311] The server comprehensively evaluates the mental health state using the generative artificial intelligence means. In this process, it uses the data from the emotion engine to generate indicators such as stress level and happiness level.

[0312] Step 6:

[0313] The server generates feedback to the user based on the evaluation results. This feedback includes recommended action guidelines and self-care suggestions.

[0314] Step 7:

[0315] Feedback is sent from the server to the terminal, which then notifies and displays it to the user. The feedback is provided as specific guidelines that the user can immediately implement.

[0316] Step 8:

[0317] Users follow instructions from the device and perform suggested self-care activities. For example, they might engage in activities such as meditation or stretching to maintain their mental health.

[0318] Step 9:

[0319] Through the continuous follow-up means, the server periodically analyzes the user's past health data and the results of earlier care. Additional feedback and support are provided as needed.
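The periodic analysis of past data in Step 9 could take many forms. One minimal sketch, under the assumption that each assessment yields a numeric stress score on a 0 to 100 scale, compares the average of the most recent scores with the average of the scores just before them. The function name, the window size, and the threshold are hypothetical values chosen only for illustration.

```python
from statistics import mean


def needs_follow_up(stress_history, window=3, rise_threshold=10.0):
    """Flag a user whose recent average stress rose versus the prior window.

    stress_history is a chronological list of stress scores (assumed 0-100).
    """
    if len(stress_history) < 2 * window:
        return False  # not enough data to compare two windows
    recent = mean(stress_history[-window:])
    earlier = mean(stress_history[-2 * window:-window])
    return recent - earlier >= rise_threshold
```

For instance, a history of 30, 32, 31, 45, 50, 48 would be flagged because the recent average is well above the earlier one, whereas a flat history would not be.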

[0320] (Example 2)

[0321] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0322] In modern society, accurately and effectively managing employees' mental health is a challenge. Traditional systems have struggled to quickly detect emotional changes and stress levels and provide appropriate feedback. Furthermore, long-term mental health follow-up has been insufficient, making it difficult to sustainably maintain employees' mental well-being.

[0323] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0324] In this invention, the server includes an input means for receiving voice data, an analysis means for recognizing emotional states using voice analysis means, an evaluation means for evaluating mental health using generative artificial intelligence, a providing means for generating and providing feedback based on the evaluation results, and a follow-up means for tracking user behavior and monitoring mental health over the long term. This enables high-precision analysis of employees' emotions and stress levels, prompt provision of appropriate feedback, and long-term mental health management.

[0325] "Input means" refers to an interface through which a user provides voice data to the system, and is a device that converts voice into digital data.

[0326] "Analysis means" refers to a device or program that processes audio data within the server and analyzes the tone, speed, and intonation of the voice in order to identify the emotional state.

[0327] "Evaluation means" refers to a function within the system that uses generative artificial intelligence to comprehensively evaluate the user's mental health based on the analysis results.

[0328] "Providing means" refers to a function or device that generates and provides appropriate feedback to the user based on the results from the evaluation means.

[0329] "Follow-up means" refers to a function or system that tracks changes in a user's behavior and mental health over the long term and proposes necessary care again.

[0330] This invention is a system for managing the mental health of employees, implemented through server, terminal, and user interaction. Specifically, it effectively grasps the user's mental state and provides appropriate feedback and care by analyzing voice data using an emotion engine and performing mental health assessments utilizing a generative AI model.

[0331] The user first speaks into the terminal. The terminal has a recording function that records the user's voice as digital data. The recorded data is then transmitted to the server via a secure communication protocol.
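Recording the voice as digital data can be illustrated by packaging raw samples into a WAV container, one of the formats mentioned elsewhere in this description. The sketch below assumes 16-bit mono PCM samples and a 16 kHz default sample rate; the function name and those parameters are assumptions, and encryption is not shown because it would be handled by a separate cipher or by the secure transport.

```python
import io
import struct
import wave


def pcm_to_wav_bytes(samples, sample_rate=16000):
    """Pack 16-bit mono PCM samples into an in-memory WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        # Little-endian signed 16-bit integers, as WAV PCM expects.
        wav_file.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buffer.getvalue()
```

The returned bytes can be used directly as the body of the upload to the server.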

[0332] The server processes the received audio data using speech analysis software. Here, the emotion engine operates, analyzing the tone, speed, and intonation of the speech to quantify the user's emotional state. This emotional data is then analyzed within a generative AI model and reflected in an assessment of mental health. The AI model provides indicators such as stress levels, happiness, and fatigue to evaluate the user's current mental state.

[0333] Based on the evaluation results, the server generates specific feedback and sends it to the terminal. The terminal displays this feedback visually, presenting it in a user-friendly format. For example, if high stress levels are detected, the terminal provides the user with a guide to relaxation exercises.

[0334] Furthermore, the system records how users respond to feedback, and the server stores the results in a database through the follow-up means. This enables long-term monitoring of the user's mental health and allows necessary care to be suggested again. For example, a prompt such as "Please tell us about a stressful experience you have had recently and what actions you took as a result." is used.

[0335] This system makes it possible to accurately assess the mental health of employees and implement effective care.

[0336] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0337] Step 1:

[0338] The user inputs everyday events and emotions into the device via voice. The device records the input voice and converts it into digital data. This data is then sent to the server as input data.

[0339] Step 2:

[0340] The server processes the received audio data using audio analysis tools. Specifically, it decodes the audio and uses analysis software to extract tone, speed, and intonation. As a result, quantified emotion data is generated. This data is output as a model of the emotional state.

[0341] Step 3:

[0342] Using the emotional state model, the server evaluates the user's mental health with a generative AI model. The AI model takes the emotion data as input and calculates indicators of mental state such as stress level, happiness, and fatigue. As a result, a comprehensive mental state evaluation result is generated. This result is output as evaluation data.

[0343] Step 4:

[0344] Based on the evaluation data, the server uses the providing means to generate specific feedback for the user. This feedback is generated as text or audio guidance and sent to the terminal. As a result, clear and actionable feedback is output.

[0345] Step 5:

[0346] The device visually displays the feedback received from the server and informs the user. The user reviews the feedback and takes the recommended action. During this process, the device records the user's reactions and behavioral data. As a result, an action log is output.

[0347] Step 6:

[0348] The server uses behavioral logs as a means of continuous follow-up to track the user's mental health over the long term. Based on this data, it suggests further care to the user as needed. As a result, follow-up data is output.
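The action log produced in Step 5 and consumed in Step 6 can be sketched as a list of serialized records. The record fields, the JSON-lines storage, and the completion-rate summary below are all assumptions for illustration; the description does not specify a log schema or any particular follow-up statistic.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ActionLogEntry:
    """One user reaction to one piece of feedback (hypothetical schema)."""
    user_id: str
    feedback_id: str
    recommended_action: str
    completed: bool
    recorded_at: str  # ISO 8601 timestamp


def append_entry(log_lines, entry):
    """Serialize an entry as one JSON line and append it to the log."""
    log_lines.append(json.dumps(asdict(entry), sort_keys=True))


def completion_rate(log_lines, user_id):
    """Share of recommended actions the user completed, or None if no data."""
    entries = [json.loads(line) for line in log_lines]
    mine = [e for e in entries if e["user_id"] == user_id]
    if not mine:
        return None
    return sum(e["completed"] for e in mine) / len(mine)
```

A low completion rate might be one signal the server uses when deciding whether to suggest different care, although that policy is not stated in the description.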

[0349] (Application Example 2)

[0350] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0351] In modern society, particularly in caregiving settings, effectively managing the mental health of both caregivers and clients is crucial. However, traditional methods often overlook signs of emotions and stress, making appropriate responses difficult. Therefore, there is a need for a system that can more accurately and comprehensively assess psychological states and provide appropriate care methods.

[0352] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0353] In this invention, the server includes voice conversion means, machine intelligence means, and psychological health assessment means. This enables accurate understanding of the psychological state of caregivers and users, and allows for appropriate care and feedback based on that understanding.

[0354] "Voice conversion means" refers to a means for analyzing voice data and detecting its characteristics, speed, and intonation.

[0355] "Machine intelligence means" refers to a means for evaluating psychological states using artificial intelligence technology based on information obtained from voice data.

[0356] "Psychological health assessment means" refers to a means for quantifying and evaluating mental health based on data analyzed by the machine intelligence means.

[0357] "Information provision means based on analysis results" refers to a means of presenting appropriate care methods to users based on the results obtained from the psychological health assessment means.

[0358] "Continuous tracking means" refers to a means for recording and analyzing the user's psychological state over a long period and providing additional support as needed.

[0359] This system accurately manages the mental health of caregivers and users in care settings and provides appropriate care. The system primarily consists of voice conversion means, machine intelligence means, psychological health assessment means, information provision means based on analysis results, and continuous tracking means.

[0360] The server receives conversations between caregivers and users as digital audio data through the voice conversion means, which analyzes the characteristics, speed, and intonation of the received audio. Next, the machine intelligence means uses the analyzed data to evaluate the psychological state. This process uses software called an emotion engine to identify signs of emotion and stress in the audio with high accuracy.

[0361] The psychological health assessment means quantitatively evaluates the stress levels and well-being of caregivers and users based on the emotion analysis results obtained by the machine intelligence means. The server transmits the assessment results to the caregiver through the information provision means based on analysis results. This feedback is presented to the caregiver as specific care methods and relaxation guides.

[0362] The continuous tracking means records the psychological health data of users and caregivers over the long term, enabling follow-up support and new care suggestions as needed. The server continuously collects and analyzes data to provide care methods that adapt to changes in the user's psychological state.

[0363] As a concrete example, if the voice conversion means detects an increase in stress in a user's conversation, the information provision means provides feedback to the caregiver such as, "Please suggest activities that will help the user relax."

[0364] An example of a prompt to input into the generative AI model is: "Evaluate stress and well-being from the voice data and generate specific feedback for the caregiver."
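To show how such a prompt might be combined with the analyzed voice indicators before being passed to the model, the following sketch appends the indicators as a short list under the instruction quoted above. The function name, the indicator names, and the list layout are hypothetical; the description gives only the instruction sentence itself.

```python
def build_caregiver_prompt(indicators):
    """Compose the instruction with voice indicators (name -> numeric value)."""
    lines = [
        f"- {name}: {value:.2f}" for name, value in sorted(indicators.items())
    ]
    return (
        "Evaluate stress and well-being from the voice data and generate "
        "specific feedback for the caregiver.\n"
        "Voice indicators:\n" + "\n".join(lines)
    )
```

Sorting the indicator names keeps the prompt text stable for the same inputs, which makes the resulting model outputs easier to compare over time.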

[0365] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0366] Step 1:

[0367] The device records conversations between the caregiver and the user. It acquires the audio data and sends it to the server in a digital format. In this case, the input is an audio signal, and the output is digital audio data.

[0368] Step 2:

[0369] The server analyzes the received audio data using the voice conversion means. It analyzes characteristics, speed, and intonation to generate initial emotional indicators. The input is digital audio data, and the output is analyzed data including emotional indicators.

[0370] Step 3:

[0371] The server uses machine intelligence to analyze the speech and evaluate the user's psychological state based on emotional indicators. An emotion engine is used to quantify stress levels and happiness. The input is emotional indicators, and the output is evaluation data indicating the psychological state.

[0372] Step 4:

[0373] The server generates feedback from the psychological state assessment data using the information provision means based on analysis results. It constructs feedback that includes specific care methods and guidelines. The input is psychological state assessment data, and the output is a feedback message.

[0374] Step 5:

[0375] The terminal displays feedback messages received from the server to the caregiver. Based on the feedback, it suggests care methods and influences the caregiver's actions. The input is the feedback message, and the output is the caregiver's implementation of care.

[0376] Step 6:

[0377] The server uses continuous tracking to record and analyze long-term psychological health data of caregivers and users. It provides additional care suggestions as needed. The input is newly collected psychological data, and the output is updated care suggestions.

[0378] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0379] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
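The prompt-plus-inference-data interface described for the data generation model 58 can be sketched as a small typed boundary between the specific processing and whichever model is used. Everything named here (`InferenceInput`, `DataGenerationModel`, `EchoModel`, `run_specific_processing`) is hypothetical, and the stand-in model performs no real inference; it only shows where a real model would be plugged in.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class InferenceInput:
    """A prompt plus optional inference data (text and/or audio)."""
    prompt: str
    text: str = ""
    audio: bytes = b""


class DataGenerationModel(Protocol):
    """Anything that turns a prompt and inference data into a text result."""

    def infer(self, request: InferenceInput) -> str: ...


class EchoModel:
    """Stand-in model for illustration; it does not perform inference."""

    def infer(self, request: InferenceInput) -> str:
        return f"[{request.prompt}] {request.text}"


def run_specific_processing(
    model: DataGenerationModel, request: InferenceInput
) -> str:
    """Pass the request to the model and return its output unchanged."""
    return model.infer(request)
```

Keeping the model behind such an interface would let the same processing run on the server or on the terminal, in line with the statement that either device may perform the specific processing.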

[0380] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0381] [Third Embodiment]

[0382] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0383] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0384] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0385] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0386] The microphone 238 accepts instructions and other input from the user 20 by receiving the user's voice. The microphone 238 captures the voice uttered by the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0387] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0388] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0389] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0390] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0391] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0392] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0393] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0394] This invention is a system for managing and providing care for the mental health of employees. The system consists of three elements: a server, a terminal, and a user.

[0395] The server plays a crucial role in analyzing and evaluating received audio data using voice analysis and generative artificial intelligence (AI) means. Audio data is transmitted from terminals, and the server processes it using a voice analysis engine. The voice analysis engine analyzes the tone, speed, and intonation of the voice, quantifying the employee's emotional state and stress level. Next, the generative AI means performs a mental health assessment based on these analysis results. The AI considers the employee's usage history and past data, enabling assessments tailored to individual circumstances.
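One way to read "assessments tailored to individual circumstances" is that a new stress indicator is judged against the same employee's own history rather than against a fixed scale. The sketch below expresses the current value as a number of standard deviations above the personal baseline. This is an assumed interpretation, not a method stated in the description, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def relative_stress(current, history):
    """Standard deviations by which `current` exceeds the user's own baseline.

    Returns 0.0 when there is too little history, or no variation in it,
    to define a meaningful baseline.
    """
    if len(history) < 2:
        return 0.0
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        return 0.0
    return (current - baseline) / spread
```

With a history of 40, 50, and 60, a current value of 70 comes out at roughly 2.4 standard deviations above baseline, which would stand out for that user even if 70 were unremarkable for someone else.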

[0396] The terminal records the user's voice data and securely transmits it to the server. It also displays the evaluation results received from the server on the user interface, providing feedback to the user. This feedback includes care suggestions and actions tailored to the user's mental state. The terminal presents the feedback in a simple, easy-to-understand form so that users can readily carry out the recommended self-care.

[0397] Users record their daily mental state by describing their emotions and daily circumstances to the terminal by voice, and the terminal sends this voice data to the server. This allows users to consciously monitor their own mental health. Based on the feedback received from the terminal, users then take appropriate self-care measures to maintain their mental well-being.

[0398] To give a specific example, when a user feels stressed during a day's work, they can speak their feelings into the device. The server analyzes the audio, and the AI evaluates the stress level. After checking the evaluation results on the device, the user can regain peace of mind by performing recommended relaxation exercises as instructed by the device.

[0399] Thus, this system utilizes voice analysis technology and artificial intelligence to provide a way for users to continuously manage their mental health and take necessary self-care actions in a timely manner.

[0400] The following describes the processing flow.

[0401] Step 1:

[0402] The user inputs their daily emotions and experiences into the device via voice input. The device starts recording and acquires the user's speech as digital audio data.

[0403] Step 2:

[0404] The device converts the recorded audio data into an appropriate format and encrypts the data to ensure security. It then sends the prepared audio data to the server.

[0405] Step 3:

[0406] The server decodes the received audio data and passes it to the audio analysis engine. The analysis engine analyzes the tone, speed, intonation, etc., of the voice and generates indicators of emotion and stress.

[0407] Step 4:

[0408] The server's generative artificial intelligence means evaluates the employee's mental health based on the results of voice analysis. The AI module uses psychological knowledge to quantify stress levels, happiness levels, and other factors.

[0409] Step 5:

[0410] Based on the results of the mental health assessment, the server determines appropriate care and behavioral guidelines for the user. This includes recommendations for relaxation exercises and consultations with professionals.

[0411] Step 6:

[0412] The terminal receives feedback from the server and displays it in a format that is easy for the user to understand. The terminal also presents the user with any necessary guides or additional information.

[0413] Step 7:

[0414] The user follows instructions from the device and performs recommended self-care actions, such as engaging in stress-relieving exercises.

[0415] Step 8:

[0416] As part of ongoing follow-up, the server periodically analyzes the user's past data and, if necessary, proposes various care and follow-up options.
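The decision in Step 5, where the server chooses between recommendations such as relaxation exercises and consultation with a professional, can be sketched as a simple rule table. The 0 to 100 score scale, every threshold, and the "follow-up check-in" action are assumptions introduced only for illustration; in the described system this decision may instead be made by the generative AI.

```python
def select_care_actions(stress_level, happiness_level):
    """Map assumed 0-100 scores to care suggestions (illustrative thresholds)."""
    actions = []
    if stress_level >= 80:
        actions.append("consult a professional")
    if stress_level >= 50:
        actions.append("relaxation exercise")
    if happiness_level < 30:
        # Hypothetical extra action, not named in the description.
        actions.append("follow-up check-in")
    return actions or ["no action needed"]
```

An explicit rule table like this is easy to audit, which may matter when the output includes a recommendation to seek professional help.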

[0417] (Example 1)

[0418] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0419] In today's workplace, it is crucial to properly manage employees' mental health and provide effective care. However, traditional methods make it difficult to quickly and accurately assess individual employees' situations and emotional states, and thus hinder the provision of appropriate feedback and self-care recommendations.

[0420] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0421] In this invention, the server includes voice analysis means, generated data processing means, and health status evaluation means. This makes it possible to analyze the voice data of individual employees and quickly and accurately evaluate their mental health status.

[0422] "Voice analysis means" refers to elements used to extract characteristics from voice data and generate indicators of emotion and stress.

[0423] "Generated data processing means" refers to elements for performing individual evaluations based on analyzed data and generating appropriate feedback.

[0424] "Health status evaluation means" refers to elements for evaluating the mental health status of employees, taking into account analyzed voice data and past history.

[0425] "Information provision means based on analysis results" refers to elements that use the results obtained from voice analysis and health status evaluation to provide users with specific feedback and self-care recommendations.

[0426] "Continuous monitoring means" refers to elements that support long-term mental health by analyzing the user's past data and suggesting further care methods as needed.

[0427] This invention is a system for managing the mental health of employees and providing appropriate care. Specifically, it consists of three elements: a server, a terminal, and a user.

[0428] The server processes the audio data received from the user using the voice analysis means. In this process, it analyzes the characteristics of the speech using a speech analysis engine such as Google Cloud Speech-to-Text, quantifying characteristics such as tone, speed, and intonation. Subsequently, the generated data processing means analyzes the data using a generative AI model (e.g., AI software with general natural language processing technology) to assess the user's mental health. This process also uses the user's past data history, allowing for assessments tailored to individual circumstances.

[0429] The device provides an interface for recording the user's voice data and securely transmitting it to a server. The recorded voice data is encrypted and transmitted via a secure protocol such as HTTPS. Furthermore, when feedback is received from the server, the device displays the results on the user interface. Here, relaxation exercises or guidance videos are displayed, making it easy for the user to understand and perform them.
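The secure upload described here can be sketched with the Python standard library. The example below only builds the HTTPS request and the TLS settings; it does not contact any server. The function names, the `X-User-Id` header, the `audio/wav` content type, and the TLS 1.2 minimum are assumptions for illustration and are not taken from the description.

```python
import ssl
import urllib.request


def build_upload_request(url, audio_bytes, user_id):
    """Build a POST request carrying recorded audio; refuse non-HTTPS URLs."""
    if not url.lower().startswith("https://"):
        raise ValueError("voice data must be sent over HTTPS")
    headers = {
        "Content-Type": "audio/wav",
        "X-User-Id": user_id,  # hypothetical header for this sketch
    }
    return urllib.request.Request(
        url, data=audio_bytes, headers=headers, method="POST"
    )


def make_tls_context():
    """TLS settings that verify the server certificate and hostname."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

Sending would then be a call such as `urllib.request.urlopen(request, context=make_tls_context(), timeout=10)`, with the transport layer providing the encryption in transit.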

[0430] This system allows users to input their emotions and daily situations into a device using voice. For example, if a user says to the device, "I felt a little stressed at work today," a prompt will appear saying, "Please tell us how you are feeling right now so we can assess your stress level." The server then starts the analysis and sends the results to the device.

[0431] This system allows users to consciously manage their own mental health and receive appropriate self-care.

[0432] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0433] Step 1:

[0434] The user provides voice input to the device about their emotions and everyday situations. The input voice data is recorded by the device. Specifically, the device's microphone captures the voice and converts it into a digital audio file. This audio file becomes the input data for subsequent processing.

[0435] Step 2:

[0436] The device sends the recorded audio data to the server using a secure network protocol (e.g., HTTPS). The data is encrypted, preventing unauthorized access by third parties. The server receives the transmitted data and uses it as input for audio analysis.

[0437] Step 3:

[0438] The server analyzes the received audio data using speech analysis tools. This process analyzes the tone, speed, and intonation of the speech. Specifically, the speech analysis engine uses digital signal processing technology to quantify the audio data and generate indicators of emotion and stress. These indicators are output as the analysis result.

[0439] Step 4:

[0440] The server uses the generated data processing means and applies a generative AI model to evaluate mental health based on the analysis results. Past data history is also considered at this stage. The output is an individual employee mental health assessment. This assessment serves as the basis for generating feedback.

[0441] Step 5:

[0442] The server uses a feedback provision mechanism based on the analysis results to generate feedback tailored to each employee. Specifically, it utilizes a generative AI model to create feedback that includes self-care methods and relaxation actions according to the user's condition. This feedback is then prepared as output to the terminal.

[0443] Step 6:

[0444] The device receives feedback from the server and displays it in the user interface. Specifically, the feedback content is visually displayed on the device's screen, and a notification function informs the user of the arrival of feedback as needed. The user receives this as output and confirms the actions that should be taken.

[0445] Step 7:

[0446] Based on the feedback displayed on the device, the user performs suggested self-care actions. For example, they might take action to maintain their mental health by performing relaxation exercises. At this stage, the feedback acts as direct input, and the user's actions are the output.
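As a non-limiting illustration, the server-side portion of the flow above (Steps 4 and 5) could be sketched as follows. The generative AI model is represented by an arbitrary callable that maps a prompt to text; the function names, prompt wording, and the stand-in model are assumptions of this sketch and do not reflect any particular model's API.

```python
from typing import Callable


def run_assessment(indicators: dict, history: list[dict],
                   generate: Callable[[str], str]) -> dict:
    """Assess mental health from indicators and history, then create feedback."""
    # Step 4: the assessment considers both current indicators and past data.
    assessment = generate(
        f"Assess mental health. Indicators: {indicators}. History: {history}."
    )
    # Step 5: the feedback is generated from the assessment.
    feedback = generate(
        f"Suggest self-care and relaxation actions for this assessment: {assessment}"
    )
    return {"assessment": assessment, "feedback": feedback}


def fake_model(prompt: str) -> str:
    # Stand-in for a real generative AI model, used only to run the sketch.
    if prompt.startswith("Assess"):
        return "stress: elevated"
    return "Try a 5-minute breathing exercise."


result = run_assessment({"stress_index": 0.7}, [{"stress_index": 0.4}], fake_model)
```

Passing the model as a callable keeps the flow independent of the specific generative AI service used.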

[0447] (Application Example 1)

[0448] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0449] There is a need for a system that can effectively manage and continuously follow up on the mental health of employees. In particular, there is a lack of convenient ways to assess the mental health of users in their daily lives and provide appropriate care. To address this problem, it is necessary to provide a system that uses voice analysis technology and artificial intelligence to evaluate users' emotional states and stress levels and provide individualized support.

[0450] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0451] In this invention, the server includes voice analysis means, generative artificial intelligence means, and mental health assessment means. This makes it possible to analyze received voice data and quantitatively evaluate emotional state and stress levels. Furthermore, the user device can visually or audibly present appropriate feedback and care suggestions to the user based on the analysis results. This enables the user to consciously manage their mental health in their daily life.

[0452] "Voice analysis means" refers to a device or method that receives voice data, analyzes it, and generates indicators such as emotional state and stress level.

[0453] "Generative artificial intelligence means" refers to artificial intelligence technology that evaluates the user's mental health based on the analysis results of received audio data, and extracts and presents the analysis results.

[0454] "Mental health assessment means" refers to a process or device that quantitatively evaluates a user's mental health status using data obtained from voice analysis.

[0455] "Feedback provision means" refers to a method or device for presenting improvement suggestions or care content to users based on analysis results and assessments of their mental health status.

[0456] "Continuous follow-up means" refers to functions or means that continuously analyze past mental health data, make adjustments as needed, and support the user's long-term mental health.

[0457] "User device" refers to a device or means that allows the user to receive analysis results and obtain instructions or suggestions visually or audibly.

[0458] This invention is a system for managing and providing care for the mental health of employees. The system mainly consists of a server, terminals, and users.

[0459] First, the process begins with the user voice-inputting their emotions and stress levels into the device. The device then records this voice data and transmits it to a server via a secure protocol. The hardware used includes a highly sensitive microphone and a communication module for data transmission.

[0460] The server processes the received audio data using a speech analysis engine (e.g., Google Cloud Speech-to-Text or Amazon Transcribe) to analyze the tone, speed, and intonation of the voice. The analysis results are then used with a generative AI model to quantify the user's emotional state and stress level. The generative AI model used here learns from past data, enabling more accurate assessments.

[0461] The server then uses the generative artificial intelligence means to perform a mental health assessment based on the analysis results. The assessment results are sent to a terminal, which provides visual feedback through a user interface. This feedback includes suggestions for self-care tailored to the user's mental state and instructions for relaxation exercises. The terminal is equipped with a visual display device and an audio output device to facilitate the user's implementation of the recommended care.

[0462] For example, if a user feels stressed at work, they might say to their device, "I'm feeling a little stressed today, can you tell me how to relax?" Based on this prompt, the server performs an analysis and conducts a mental health assessment. It then provides feedback, such as suggesting and playing appropriate relaxation music or displaying a guide to a series of yoga poses. In this way, users can easily manage their mental health in their daily lives.

[0463] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0464] Step 1:

[0465] The user provides voice input to the device regarding their emotions and stress levels. This voice input data is captured by the device's high-sensitivity microphone. The device converts the acquired voice data into a digital format and temporarily stores it for use in the next step.

[0466] Step 2:

[0467] The terminal transmits voice data to the server via a secure protocol. This process uses encrypted communication such as SSL/TLS to ensure data security. The transmitted voice data is then used as input for the voice analysis process on the server.

[0468] Step 3:

[0469] The server processes the received audio data using a speech analysis engine (e.g., Google Cloud Speech-to-Text). This analysis examines the tone, speed, and intonation of the speech. As part of the data processing, these parameters are quantified to generate indices indicating emotional state and stress levels. The output of this step is a quantified evaluation index.

[0470] Step 4:

[0471] The server uses generative artificial intelligence to evaluate the user's mental health based on evaluation metrics obtained from voice analysis. In this step, the generative AI model calculates the evaluation results while considering past health data. The output is a detailed evaluation result regarding the user's mental health.

[0472] Step 5:

[0473] The server sends the assessment results, including feedback for the user, to the terminal. The feedback is presented as audio or visual data so that the user can easily understand the content and take the recommended self-care actions. The terminal displays the assessment results visually and provides audio instructions as needed.

[0474] Step 6:

[0475] Users perform relaxation exercises based on feedback received from their devices. Through these specific actions, users aim to reduce their stress and regain peace of mind. Examples of this feedback include playing videos guiding users through yoga poses.

[0476] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0477] The present invention provides a system for precisely managing the mental health of employees, which includes voice analysis means, generative artificial intelligence means, mental health assessment means, feedback provision means based on analysis results, and continuous follow-up means, and further provides a function to recognize the user's emotions by combining it with an emotion engine.

[0478] In this system, the server performs the core processing. When a user inputs their emotions or daily events via voice into a terminal, the terminal records the audio and sends it to the server as digital data. The server analyzes the received audio data using voice analysis tools and generates quantitative indicators based on the tone, speed, and intonation of the voice. During this analysis process, an emotion engine functions to recognize the emotional state in the voice with high accuracy. The emotion engine uses machine learning algorithms to analyze the user's emotions and reflects the recognition results in a mental health assessment.

[0479] The generative artificial intelligence system uses data from an emotion engine to comprehensively assess the user's mental state. This assessment includes indicators such as stress levels, happiness, and fatigue, clearly indicating the employee's current mental state. The assessment results are used to generate clear, actionable feedback, which the server then sends to the terminal.

[0480] The device displays feedback received from the server to the user in an easy-to-understand format. The user can then review this feedback and implement recommended self-care actions. For example, if the emotion engine detects high stress levels, the device provides the user with guidance on relaxation exercises.

[0481] When users take action based on feedback, the results are also recorded in the system. The server uses continuous follow-up to track the user's mental health over the long term and, if necessary, offers further care suggestions to the user. In this way, by incorporating an emotion engine, the system provides a more accurate and effective way to manage the user's mental health.

[0482] The following describes the processing flow.

[0483] Step 1:

[0484] The user inputs voice information about their emotions and current state into the device. The device then records this voice and prepares to save it as digital data.

[0485] Step 2:

[0486] The terminal converts the recorded audio data into a predetermined format (e.g., WAV or MP3) and encrypts it for secure transmission to the server. It then executes the transmission procedure.

[0487] Step 3:

[0488] The server decodes the received audio data and analyzes it using an audio analysis tool. The audio analysis is based on factors such as tone, speed, and intonation, and extracts basic emotional information.

[0489] Step 4:

[0490] The emotion engine within the server recognizes emotions with high accuracy from the analyzed audio data. The emotion engine uses machine learning algorithms to do so and passes the resulting emotion data to subsequent processing.

[0491] Step 5:

[0492] The server uses generative artificial intelligence to comprehensively assess mental health. This process utilizes data from an emotion engine to generate indicators such as stress levels and well-being.

[0493] Step 6:

[0494] The server generates feedback for the user based on the evaluation results. This feedback includes recommended actions and self-care suggestions.

[0495] Step 7:

[0496] Feedback is sent from the server to the terminal, which then notifies and displays it to the user. The feedback is provided as specific guidelines that the user can immediately implement.

[0497] Step 8:

[0498] Users follow instructions from the device and perform suggested self-care activities. For example, they might engage in activities such as meditation or stretching to maintain their mental health.

[0499] Step 9:

[0500] The server periodically analyzes the user's past health data and its results through continuous follow-up mechanisms. Additional feedback and support can be provided as needed.
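As a non-limiting illustration, the periodic follow-up analysis in Step 9 could be sketched as a comparison of recent and earlier stress scores. The function name, the window length, the rise threshold, and the assumption that stress is stored as a number per session are all hypothetical choices for this sketch.

```python
from statistics import mean


def needs_follow_up(stress_history, window=3, rise=0.1):
    """Return True when recent average stress exceeds the earlier average by `rise`.

    stress_history: chronological list of per-session stress scores (assumed 0..1).
    """
    if len(stress_history) < 2 * window:
        return False  # not enough sessions to compare two windows
    earlier = mean(stress_history[-2 * window:-window])
    recent = mean(stress_history[-window:])
    return recent - earlier >= rise


flag = needs_follow_up([0.3, 0.3, 0.4, 0.5, 0.6, 0.7])
```

When the function returns True, the server could trigger the generation of additional feedback; a flat or short history yields no follow-up.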

[0501] (Example 2)

[0502] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0503] In modern society, accurately and effectively managing employees' mental health is a challenge. Traditional systems have struggled to quickly detect emotional changes and stress levels and provide appropriate feedback. Furthermore, long-term mental health follow-up has been insufficient, making it difficult to sustainably maintain employees' mental well-being.

[0504] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0505] In this invention, the server includes an input means for receiving voice data, an analysis means for recognizing emotional states using voice analysis means, an evaluation means for evaluating mental health using generative artificial intelligence, a providing means for generating and providing feedback based on the evaluation results, and a follow-up means for tracking user behavior and monitoring mental health over the long term. This enables high-precision analysis of employees' emotions and stress levels, prompt provision of appropriate feedback, and long-term mental health management.

[0506] "Input means" refers to an interface through which a user provides voice data to the system, and is a device that converts voice into digital data.

[0507] "Analysis means" refers to a device or program that processes audio data within a server and analyzes the tone, speed, and intonation of the voice in order to identify the emotional state.

[0508] "Evaluation means" refers to a function within the system that uses generative artificial intelligence to comprehensively evaluate the user's mental health based on the analysis results.

[0509] "Providing means" refers to a function or device that generates and provides appropriate feedback to the user based on the results from the evaluation means.

[0510] "Follow-up means" refers to a function or system that tracks changes in a user's behavior and mental health over the long term and proposes necessary care again.

[0511] This invention is a system for managing the mental health of employees, implemented through server, terminal, and user interaction. Specifically, it effectively grasps the user's mental state and provides appropriate feedback and care by analyzing voice data using an emotion engine and performing mental health assessments utilizing a generative AI model.

[0512] The user first provides voice input to the device. The device has a recording function that records the user's voice as digital data. The recorded data is then transmitted to the server via a secure communication protocol.
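As a non-limiting illustration, recording the voice as digital data could involve packaging captured samples into an audio file before transmission. The sketch below assumes 16-bit mono PCM samples and a WAV container (one of the example formats named elsewhere in this description); the function name and the 16 kHz sample rate are hypothetical.

```python
import io
import struct
import wave


def pcm_to_wav(samples, sample_rate=16000):
    """Pack 16-bit mono PCM samples into an in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        # Little-endian signed 16-bit integers, one per sample.
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()


wav_bytes = pcm_to_wav([0, 1000, -1000, 0])
```

The resulting bytes can be handed to whatever secure transport the terminal uses.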

[0513] The server processes the received audio data using speech analysis software. Here, the emotion engine operates, analyzing the tone, speed, and intonation of the speech to quantify the user's emotional state. This emotional data is then analyzed within a generative AI model and reflected in an assessment of mental health. The AI model provides indicators such as stress levels, happiness, and fatigue to evaluate the user's current mental state.
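As a non-limiting illustration, the indicators returned by the generative AI model could be validated before use, as in the following sketch. It assumes the model has been instructed to answer with a JSON object containing `stress`, `happiness`, and `fatigue` values between 0 and 1; that output format, the class name, and the function name are assumptions of this sketch, not something the disclosure specifies.

```python
import json
from dataclasses import dataclass


@dataclass
class MentalStateIndicators:
    stress: float
    happiness: float
    fatigue: float


def parse_indicators(model_output: str) -> MentalStateIndicators:
    """Parse and range-check the indicators in a JSON model response."""
    data = json.loads(model_output)
    values = {}
    for key in ("stress", "happiness", "fatigue"):
        value = float(data[key])  # KeyError if the model omitted a field
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{key} out of range: {value}")
        values[key] = value
    return MentalStateIndicators(**values)


indicators = parse_indicators('{"stress": 0.8, "happiness": 0.3, "fatigue": 0.6}')
```

Rejecting malformed or out-of-range output at this point keeps unreliable model responses from reaching the feedback step.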

[0514] Based on the evaluation results, the server generates specific feedback and sends it to the terminal. The terminal displays this feedback visually, presenting it in a user-friendly format. For example, if high stress levels are detected, the terminal provides the user with a guide to relaxation exercises.

[0515] Furthermore, the system records how users respond to feedback, and the server stores the results in a database as a means of continuous follow-up. This enables long-term monitoring of the user's mental health and allows for the re-suggestion of necessary care. For example, a prompt such as, "Please tell us about a stressful experience you've had recently, and what actions did you take as a result?" is used.
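As a non-limiting illustration, storing users' responses to feedback in a database could be sketched as follows. The table name, column names, and helper functions are hypothetical, and an in-memory SQLite database stands in for whatever database the server actually uses.

```python
import sqlite3

# In-memory database for illustration only; a real server would persist data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE action_log ("
    "user_id TEXT, logged_at TEXT, feedback TEXT, action_taken TEXT)"
)


def record_action(user_id, logged_at, feedback, action_taken):
    """Store how a user responded to a piece of feedback."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO action_log VALUES (?, ?, ?, ?)",
            (user_id, logged_at, feedback, action_taken),
        )


def actions_for(user_id):
    """Return the user's logged actions in chronological order."""
    rows = conn.execute(
        "SELECT logged_at, action_taken FROM action_log "
        "WHERE user_id = ? ORDER BY logged_at",
        (user_id,),
    )
    return rows.fetchall()


record_action("u1", "2024-12-03T09:00", "relaxation exercise", "completed")
record_action("u1", "2024-12-04T09:00", "relaxation exercise", "skipped")
```

The chronological log returned by `actions_for` is the kind of input a follow-up analysis could consume when deciding whether to suggest care again.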

[0516] This system makes it possible to accurately assess the mental health of employees and implement effective care.

[0517] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0518] Step 1:

[0519] The user inputs everyday events and emotions into the device via voice. The device records the input voice and converts it into digital data. This data is then sent to the server as input data.

[0520] Step 2:

[0521] The server processes the received audio data using audio analysis tools. Specifically, it decodes the audio and uses analysis software to extract tone, speed, and intonation. As a result, quantified emotion data is generated. This data is output as a model of the emotional state.

[0522] Step 3:

[0523] Using the emotional state model, the server evaluates the user's mental health with a generative AI model. The AI model takes emotional data as input and calculates indicators of mental state such as stress level, happiness, and fatigue. As a result, a comprehensive mental state evaluation result is generated. This result is output as evaluation data.

[0524] Step 4:

[0525] Based on evaluation data, the server uses feedback provisioning mechanisms to generate specific feedback for the user. This feedback is generated as text or audio guidance and sent to the device. As a result, clear and actionable feedback is output.

[0526] Step 5:

[0527] The device visually displays the feedback received from the server and informs the user. The user reviews the feedback and takes the recommended action. During this process, the device records the user's reactions and behavioral data. As a result, an action log is output.

[0528] Step 6:

[0529] The server uses behavioral logs as a means of continuous follow-up to track the user's mental health over the long term. Based on this data, it suggests further care to the user as needed. As a result, follow-up data is output.

[0530] (Application Example 2)

[0531] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0532] In modern society, particularly in caregiving settings, effectively managing the mental health of both caregivers and clients is crucial. However, traditional methods often overlook signs of emotions and stress, making appropriate responses difficult. Therefore, there is a need for a system that can more accurately and comprehensively assess psychological states and provide appropriate care methods.

[0533] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0534] In this invention, the server includes voice conversion means, machine intelligence means, and psychological health assessment means. This enables accurate understanding of the psychological state of caregivers and users, and allows for appropriate care and feedback based on that understanding.

[0535] "Voice conversion means" refers to a means for analyzing speech data and detecting its characteristics, speed, and intonation.

[0536] "Machine intelligence means" refers to a method for evaluating psychological states using artificial intelligence technology based on information obtained from voice data.

[0537] "Psychological health assessment means" refers to a method for quantifying and evaluating mental health based on data analyzed by the machine intelligence means.

[0538] "Information provision means based on analysis results" refers to a means of presenting appropriate care methods to users based on the results obtained from the psychological health assessment means.

[0539] "Continuous tracking means" refers to a method for recording and analyzing the user's psychological state over a long period and providing additional support as needed.

[0540] This system accurately manages the mental health of caregivers and users in care settings and provides appropriate care. The system primarily consists of voice conversion, machine intelligence, psychological health assessment, information provision based on analysis results, and continuous tracking.

[0541] The server receives conversations between caregivers and users as digital audio data using a speech conversion system. The received audio data is analyzed for characteristics, speed, and intonation through the speech conversion system. Next, a machine intelligence system uses the analyzed data to evaluate the psychological state. This process utilizes software called an emotion engine to identify signs of emotion and stress in the audio with high accuracy.

[0542] The psychological health assessment tool quantitatively evaluates the stress levels and well-being of caregivers and users based on the results of emotion analysis obtained by machine intelligence. The server transmits the assessment results to the caregiver using an information provision tool based on the analysis results. This feedback is presented to the caregiver as specific care methods and relaxation guides.

[0543] The continuous tracking system is designed to record the psychological health data of users and caregivers over the long term, enabling follow-up support and new care suggestions as needed. The server continuously collects and analyzes data to provide care methods that adapt to changes in the user's psychological state.

[0544] As a concrete example, if a voice conversion device detects an increase in stress in a user's conversation, the information provision device will provide feedback to the caregiver such as, "Please suggest activities that will help the user relax."
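As a non-limiting illustration, detecting an increase in stress within a conversation and selecting the caregiver message could be sketched as follows. The function name, the per-segment stress scores, and the 0.2 threshold are assumptions of this sketch; only the message text is taken from the example above.

```python
CAREGIVER_MESSAGE = "Please suggest activities that will help the user relax."


def caregiver_feedback(stress_by_segment, increase=0.2):
    """Return the caregiver message if stress rose by at least `increase`.

    stress_by_segment: chronological stress scores for successive segments
                       of one conversation (assumed 0..1).
    """
    if len(stress_by_segment) < 2:
        return None  # a single segment cannot show an increase
    # Compare the latest segment with the calmest earlier segment.
    if stress_by_segment[-1] - min(stress_by_segment[:-1]) >= increase:
        return CAREGIVER_MESSAGE
    return None


message = caregiver_feedback([0.3, 0.4, 0.7])
```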

[0545] An example of a prompt to input into the generative AI model is: "Evaluate stress and well-being from the voice data and generate specific feedback for the caregiver."
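As a non-limiting illustration, the example prompt could be combined with the voice analysis indicators before being passed to the model, as in the following sketch. The function name, the indicator names, and the plain-text layout are hypothetical; only the base prompt text comes from the example above.

```python
BASE_PROMPT = (
    "Evaluate stress and well-being from the voice data and "
    "generate specific feedback for the caregiver."
)


def build_prompt(indicators: dict) -> str:
    """Append the analysis indicators to the base prompt, one per line."""
    lines = [BASE_PROMPT, "", "Voice analysis indicators:"]
    # Sort by name so the prompt is stable across runs.
    lines += [f"- {name}: {value}" for name, value in sorted(indicators.items())]
    return "\n".join(lines)


prompt = build_prompt({"speed_wpm": 150, "intonation_std_hz": 16.3})
```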

[0546] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0547] Step 1:

[0548] The device records conversations between the caregiver and the user. It acquires the audio data and sends it to the server in a digital format. In this case, the input is an audio signal, and the output is digital audio data.

[0549] Step 2:

[0550] The server analyzes the received audio data using a speech conversion device. It analyzes characteristics, speed, and intonation to generate initial emotional indicators. The input is digital audio data, and the output is analyzed data including emotional indicators.

[0551] Step 3:

[0552] The server uses machine intelligence to analyze the speech and evaluate the user's psychological state based on emotional indicators. An emotion engine is used to quantify stress levels and happiness. The input is emotional indicators, and the output is evaluation data indicating the psychological state.

[0553] Step 4:

[0554] The server generates feedback based on psychological state assessment data and analysis results using information provision methods. It constructs feedback that includes specific care methods and guidelines. The input is psychological state assessment data, and the output is a feedback message.

[0555] Step 5:

[0556] The terminal displays feedback messages received from the server to the caregiver. Based on the feedback, it suggests care methods and influences the caregiver's actions. The input is the feedback message, and the output is the caregiver's implementation of care.

[0557] Step 6:

[0558] The server uses continuous tracking to record and analyze long-term psychological health data of caregivers and users. It provides additional care suggestions as needed. The input is newly collected psychological data, and the output is updated care suggestions.

[0559] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0560] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0561] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0562] [Fourth Embodiment]

[0563] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0564] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0565] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0566] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0567] The microphone 238 receives instructions from the user 20 by picking up the voice of the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0568] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0569] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0570] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
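As a non-limiting illustration, the mapping from an emotion to the LED and motor states of the controlled object 443 could be sketched as a lookup table. The emotion labels, LED states, pose names, and all identifiers below are invented for this sketch; the disclosure does not define a command format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expression:
    eye_led: str   # illumination state of the eye LEDs
    arm_pose: str  # target pose for the arm motors


# Hypothetical emotion-to-expression table.
EXPRESSIONS = {
    "joy": Expression(eye_led="bright", arm_pose="raised"),
    "sadness": Expression(eye_led="dim", arm_pose="lowered"),
}
NEUTRAL = Expression(eye_led="steady", arm_pose="rest")


def expression_for(emotion: str) -> Expression:
    """Look up the expression for an emotion, falling back to neutral."""
    return EXPRESSIONS.get(emotion, NEUTRAL)


joy = expression_for("joy")
```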

[0571] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0572] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0573] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0574] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0575] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0576] This invention is a system for managing and providing care for the mental health of employees. The system consists of three elements: a server, a terminal, and a user.

[0577] The server plays a crucial role in analyzing and evaluating received audio data using voice analysis and generative artificial intelligence (AI) means. Audio data is transmitted from terminals, and the server processes it using a voice analysis engine. The voice analysis engine analyzes the tone, speed, and intonation of the voice, quantifying the employee's emotional state and stress level. Next, the generative AI means performs a mental health assessment based on these analysis results. The AI considers the employee's usage history and past data, enabling assessments tailored to individual circumstances.
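
The quantification of tone, speed, and intonation described above can be illustrated with a minimal sketch. It is not the voice analysis engine of this disclosure: the function names, the choice of features (RMS amplitude, zero-crossing rate, words per second), and the weights are all assumptions introduced for illustration, and the weights are uncalibrated.

```python
import math


def extract_voice_features(samples, sample_rate, word_count):
    """Compute crude prosodic features from mono PCM samples in [-1.0, 1.0]."""
    if not samples or sample_rate <= 0:
        raise ValueError("samples must be non-empty and sample_rate positive")
    duration_s = len(samples) / sample_rate
    # Loudness proxy: root-mean-square amplitude.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Rough tone proxy: fraction of adjacent sample pairs that change sign.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    zcr = crossings / max(len(samples) - 1, 1)
    # Speed: words per second, assuming a transcript word count is available.
    speech_rate = word_count / duration_s
    return {"rms": rms, "zcr": zcr, "speech_rate": speech_rate}


def stress_indicator(features):
    """Map features to a 0..1 score using hypothetical, uncalibrated weights."""
    raw = (0.4 * min(features["rms"] / 0.5, 1.0)
           + 0.3 * min(features["zcr"] / 0.2, 1.0)
           + 0.3 * min(features["speech_rate"] / 4.0, 1.0))
    return round(raw, 3)


# One second of a 220 Hz tone stands in for a recorded utterance.
tone = [0.3 * math.sin(2 * math.pi * 220 * n / 8000) for n in range(8000)]
features = extract_voice_features(tone, 8000, word_count=3)
print(features, stress_indicator(features))
```

A production engine would likely use pitch tracking and a trained model rather than fixed weights; the sketch only shows how raw audio can be reduced to numeric indicators.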

[0578] The terminal records the user's voice data and securely transmits it to the server. It also displays the evaluation results received from the server on its user interface, providing feedback to the user. This feedback includes care suggestions and actions tailored to the user's mental state. The terminal uses a simple, easy-to-understand display so that users can readily carry out the recommended self-care.

[0579] Users record their daily mental state by describing their emotions and daily circumstances to the terminal by voice; the recording is then sent to the server. This allows users to consciously monitor their own mental health. Based on the feedback received from the terminal, users then take appropriate self-care measures to maintain their mental well-being.

[0580] As a specific example, when a user feels stressed during the workday, they can speak their feelings into the terminal. The server analyzes the audio, and the generative AI means evaluates the stress level. After checking the evaluation results on the terminal, the user can regain peace of mind by performing the recommended relaxation exercises as instructed by the terminal.

[0581] Thus, this system utilizes voice analysis technology and artificial intelligence to provide a way for users to continuously manage their mental health and take necessary self-care actions in a timely manner.

[0582] The following describes the processing flow.

[0583] Step 1:

[0584] The user inputs their daily emotions and experiences into the terminal by voice. The terminal starts recording and acquires the user's speech as digital audio data.

[0585] Step 2:

[0586] The terminal converts the recorded audio data into an appropriate format and encrypts it to ensure security. It then sends the prepared audio data to the server.

[0587] Step 3:

[0588] The server decrypts the received audio data and passes it to the voice analysis engine. The voice analysis engine analyzes the tone, speed, intonation, etc., of the voice and generates indicators of emotion and stress.

[0589] Step 4:

[0590] The server's generative AI means evaluates the employee's mental health based on the results of the voice analysis. The generative AI means draws on psychological knowledge to quantify the stress level, happiness level, and other factors.

[0591] Step 5:

[0592] Based on the results of the mental health assessment, the server determines appropriate care and behavioral guidelines for the user. This includes recommendations for relaxation exercises and consultations with professionals.

[0593] Step 6:

[0594] The terminal receives feedback from the server and displays it in a format that is easy for the user to understand. The terminal also presents the user with any necessary guides or additional information.

[0595] Step 7:

[0596] The user follows the instructions from the terminal and performs the recommended self-care actions, such as stress-relieving exercises.

[0597] Step 8:

[0598] As part of ongoing follow-up, the server periodically analyzes the user's past data and, if necessary, proposes various care and follow-up options.
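
The periodic follow-up in Step 8 can be sketched as a simple trend check over stored stress scores. This is a minimal illustration under stated assumptions: the function names, the window of three entries, and the 0.7 threshold are hypothetical and do not come from this disclosure.

```python
from statistics import mean


def needs_follow_up(stress_history, window=3, threshold=0.7):
    """Return True when the mean of the latest `window` scores exceeds `threshold`."""
    if len(stress_history) < window:
        return False  # not enough data yet to judge a trend
    return mean(stress_history[-window:]) > threshold


def follow_up_proposal(stress_history):
    """Return a care proposal when follow-up is needed, otherwise None."""
    if needs_follow_up(stress_history):
        return "Consider a consultation with a professional."
    return None


print(follow_up_proposal([0.2, 0.8, 0.9, 0.8]))
```

Averaging over a window rather than reacting to a single score keeps one stressful day from triggering a proposal on its own.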

[0599] (Example 1)

[0600] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0601] In today's workplace, it is crucial to properly manage employees' mental health and provide effective care. However, traditional methods make it difficult to quickly and accurately assess individual employees' situations and emotional states, and thus hinder the provision of appropriate feedback and self-care recommendations.

[0602] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0603] In this invention, the server includes voice analysis means, generated data processing means, and health status evaluation means. This makes it possible to analyze the voice data of individual employees and quickly and accurately evaluate their mental health status.

[0604] "Voice analysis means" refers to elements used to extract characteristics from voice data and generate indicators of emotion and stress.

[0605] "Generated data processing means" refers to elements for performing individual evaluations based on analyzed data and generating appropriate feedback.

[0606] "Health status assessment means" refers to elements for evaluating the mental health status of employees, taking into account analyzed voice data and past history.

[0607] "Means of providing information based on analysis results" refers to elements that utilize the results obtained from voice analysis and health status evaluation to provide users with specific feedback and self-care recommendations.

[0608] "Continuous monitoring measures" are elements that support long-term mental health by analyzing the user's past data and suggesting further care methods as needed.

[0609] This invention is a system for managing the mental health of employees and providing appropriate care. Specifically, it consists of three elements: a server, a terminal, and a user.

[0610] The server processes the audio data received from the user using the voice analysis means. In this process, it analyzes the characteristics of the speech using a speech analysis engine such as Google Cloud Speech-to-Text, quantifying characteristics such as tone, speed, and intonation. The generated data processing means then applies a generative AI model (e.g., AI software with general natural language processing technology) to assess the user's mental health. This process also uses the user's past data history, allowing assessments tailored to individual circumstances.

[0611] The terminal provides an interface for recording the user's voice data and securely transmitting it to the server. The recorded voice data is encrypted and transmitted via a secure protocol such as HTTPS. When feedback is received from the server, the terminal displays the results on the user interface. Relaxation exercises or guidance videos are displayed here so that the user can easily understand and perform them.
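
As a rough illustration of the terminal-side packaging and transmission, the following sketch wraps 16-bit PCM samples in a WAV container and builds an HTTPS POST request without sending it. The endpoint URL, the function names, and the choice of WAV are assumptions for illustration; encryption in transit is left to TLS via the HTTPS scheme, and no additional payload encryption is shown.

```python
import io
import struct
import urllib.request
import wave


def pcm_to_wav_bytes(samples, sample_rate=16000):
    """Pack 16-bit mono PCM integers into an in-memory WAV file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buffer.getvalue()


def build_upload_request(wav_bytes, url="https://example.invalid/voice"):
    """Build (but do not send) an HTTPS POST carrying the recording.

    The URL is a placeholder, not an endpoint defined by this disclosure.
    """
    if not url.startswith("https://"):
        raise ValueError("refusing to send voice data over a non-HTTPS URL")
    return urllib.request.Request(
        url,
        data=wav_bytes,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )


request = build_upload_request(pcm_to_wav_bytes([0, 1000, -1000]))
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the actual transfer; that call is omitted so the sketch runs without a network.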

[0612] The user inputs their emotions and daily situations into the terminal by voice. For example, if the user says to the terminal, "I felt a little stressed at work today," the terminal presents the prompt "Please tell us how you are feeling right now so we can assess your stress level." The server then starts the analysis and sends the results to the terminal.

[0613] This system allows users to consciously manage their own mental health and receive appropriate self-care.

[0614] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0615] Step 1:

[0616] The user provides voice input to the terminal about their emotions and everyday situations. The terminal records the input: its microphone captures the voice and converts it into a digital audio file. This audio file becomes the input data for subsequent processing.

[0617] Step 2:

[0618] The terminal sends the recorded audio data to the server using a secure network protocol (e.g., HTTPS). The data is encrypted to prevent unauthorized access by third parties. The server receives the transmitted data and uses it as input for voice analysis.

[0619] Step 3:

[0620] The server analyzes the received audio data using the voice analysis means. This process analyzes the tone, speed, and intonation of the speech. Specifically, the speech analysis engine uses digital signal processing to quantify the audio data and generate indicators of emotion and stress. These indicators are output as the analysis result.

[0621] Step 4:

[0622] The server uses the generated data processing means, applying a generative AI model to evaluate mental health based on the analysis results. Past data history is also considered at this stage. The output is a mental health assessment for the individual employee, which serves as the basis for generating feedback.

[0623] Step 5:

[0624] The server uses the means of providing information based on analysis results to generate feedback tailored to each employee. Specifically, it uses a generative AI model to create feedback that includes self-care methods and relaxation actions suited to the user's condition. This feedback is then prepared as output to the terminal.

[0625] Step 6:

[0626] The terminal receives the feedback from the server and displays it in the user interface. Specifically, the feedback content is displayed on the terminal's screen, and a notification function informs the user of the arrival of feedback as needed. The user receives this output and confirms the actions to be taken.

[0627] Step 7:

[0628] Based on the feedback displayed on the terminal, the user performs the suggested self-care actions, for example relaxation exercises to maintain their mental health. At this stage, the feedback is the input and the user's actions are the output.
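
Step 4 states that past data history is considered in the assessment. One way this could work, shown here only as a sketch, is to compare the current stress indicator with the user's own baseline. The function name, the use of a standard-score comparison, and the cut-off of one standard deviation are assumptions for illustration, not the method of this disclosure.

```python
from statistics import mean, pstdev


def assess_against_history(current_stress, past_stress):
    """Compare today's stress indicator with the user's own baseline."""
    if len(past_stress) < 2:
        return {"deviation": 0.0, "level": "unknown (insufficient history)"}
    baseline = mean(past_stress)
    spread = pstdev(past_stress) or 1e-9  # avoid division by zero
    deviation = (current_stress - baseline) / spread
    if deviation > 1.0:
        level = "elevated"
    elif deviation < -1.0:
        level = "lower than usual"
    else:
        level = "typical"
    return {"deviation": round(deviation, 2), "level": level}


print(assess_against_history(0.6, [0.3, 0.4, 0.5]))
```

Scoring relative to a personal baseline means the same raw indicator can be unremarkable for one employee and notable for another, which is one reading of "tailored to individual circumstances".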

[0629] (Application Example 1)

[0630] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0631] There is a need for a system that can effectively manage and continuously follow up on the mental health of employees. In particular, there is a lack of convenient ways to assess the mental health of users in their daily lives and provide appropriate care. To address this problem, it is necessary to provide a system that uses voice analysis technology and artificial intelligence to evaluate users' emotional states and stress levels and provide individualized support.

[0632] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0633] In this invention, the server includes voice analysis means, generative artificial intelligence means, and mental health assessment means. This makes it possible to analyze received voice data and quantitatively evaluate emotional state and stress levels. Furthermore, the user device can visually or audibly present appropriate feedback and care suggestions to the user based on the analysis results. This enables the user to consciously manage their mental health in their daily life.

[0634] "Voice analysis means" refers to a device or method that receives voice data, analyzes it, and generates indicators such as emotional state and stress level.

[0635] "Generative artificial intelligence means" refers to artificial intelligence technology that evaluates the user's mental health based on the analysis results of received audio data, and extracts and presents the analysis results.

[0636] "Mental health assessment means" refers to a process or device that quantitatively evaluates a user's mental health status using data obtained from voice analysis.

[0637] "Feedback provision means" refers to a method or device for presenting improvement suggestions or care content to users based on analysis results and assessments of their mental health status.

[0638] "Continuous follow-up means" refers to functions or means that continuously analyze past mental health data, make adjustments as needed, and support the user's long-term mental health.

[0639] "User device" refers to a device or means that allows the user to receive analysis results and obtain instructions or suggestions visually or audibly.

[0640] This invention is a system for managing and providing care for the mental health of employees. The system mainly consists of a server, terminals, and users.

[0641] First, the process begins with the user voice-inputting their emotions and stress levels into the device. The device then records this voice data and transmits it to a server via a secure protocol. The hardware used includes a highly sensitive microphone and a communication module for data transmission.

[0642] The server processes the received audio data using a speech analysis engine (e.g., Google Cloud Speech-to-Text or Amazon Transcribe) to analyze the tone, speed, and intonation of the voice. The analysis results are then used with a generative AI model to quantify the user's emotional state and stress level. The generative AI model used here learns from past data, enabling more accurate assessments.

[0643] The server then uses the generative artificial intelligence means to perform a mental health assessment based on the analysis results. The assessment results are sent to the terminal, which provides visual feedback through a user interface. This feedback includes self-care suggestions tailored to the user's mental state and instructions for relaxation exercises. The terminal is equipped with a visual display device and an audio output device to help the user carry out the recommended care.

[0644] For example, if a user feels stressed at work, they might say to their device, "I'm feeling a little stressed today, can you tell me how to relax?" Based on this prompt, the server performs an analysis and conducts a mental health assessment. It then provides feedback, such as suggesting and playing appropriate relaxation music or displaying a guide to a series of yoga poses. In this way, users can easily manage their mental health in their daily lives.

[0645] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0646] Step 1:

[0647] The user provides voice input to the device regarding their emotions and stress levels. This voice input data is captured by the device's high-sensitivity microphone. The device converts the acquired voice data into a digital format and temporarily stores it for use in the next step.

[0648] Step 2:

[0649] The terminal transmits voice data to the server via a secure protocol. This process uses encrypted communication such as SSL / TLS to ensure data security. The transmitted voice data is then used as input for the voice analysis process on the server.

[0650] Step 3:

[0651] The server processes the received audio data using a speech analysis engine (e.g., Google Cloud Speech-to-Text). This analysis examines the tone, speed, and intonation of the speech. As part of the data processing, these parameters are quantified to generate indices indicating emotional state and stress levels. The output of this step is a quantified evaluation index.

[0652] Step 4:

[0653] The server uses generative artificial intelligence to evaluate the user's mental health based on evaluation metrics obtained from voice analysis. In this step, the generative AI model calculates the evaluation results while considering past health data. The output is a detailed evaluation result regarding the user's mental health.

[0654] Step 5:

[0655] The server sends the assessment results to the terminal. These results include feedback for the user. As part of providing feedback, it is presented as audio or visual data to help the user easily understand the content and take recommended self-care actions. The terminal displays the assessment results visually and provides audio instructions as needed.

[0656] Step 6:

[0657] Users perform relaxation exercises based on feedback received from their devices. Through these specific actions, users aim to reduce their stress and regain peace of mind. Examples of this feedback include playing videos guiding users through yoga poses.
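
Steps 5 and 6 describe feedback presented visually, audibly, or both. A minimal sketch of how an assessment score might be mapped to feedback and a presentation modality follows. The rule table, thresholds, and messages are hypothetical; the disclosure itself generates feedback with a generative AI model rather than fixed rules.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    message: str
    modality: str  # "visual", "audio", or "both"


# Hypothetical rule table: (minimum stress score, feedback to present).
FEEDBACK_RULES = [
    (0.7, Feedback("Play a guided yoga video and relaxation music.", "both")),
    (0.4, Feedback("Show a short breathing exercise.", "visual")),
    (0.0, Feedback("No action needed; keep up your routine.", "visual")),
]


def select_feedback(stress_score):
    """Return the first rule whose threshold the score meets."""
    for minimum, feedback in FEEDBACK_RULES:
        if stress_score >= minimum:
            return feedback
    raise ValueError("stress_score must be non-negative")


print(select_feedback(0.8))
```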

[0658] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0659] The present invention provides a system for precisely managing the mental health of employees, which includes voice analysis means, generative artificial intelligence means, mental health assessment means, feedback provision means based on analysis results, and continuous follow-up means, and further provides a function to recognize the user's emotions by combining it with an emotion engine.

[0660] In this system, the server performs the central processing. When a user inputs their emotions or daily events by voice into the terminal, the terminal records the audio and sends it to the server as digital data. The server analyzes the received audio data using the voice analysis means and generates quantitative indicators based on the tone, speed, and intonation of the voice. During this analysis, the emotion engine recognizes the emotional state in the voice with high accuracy. The emotion engine uses machine learning algorithms to analyze the user's emotions and reflects the recognition results in the mental health assessment.

[0661] The generative artificial intelligence system uses data from an emotion engine to comprehensively assess the user's mental state. This assessment includes indicators such as stress levels, happiness, and fatigue, clearly indicating the employee's current mental state. The assessment results are used to generate clear, actionable feedback, which the server then sends to the terminal.

[0662] The terminal displays the feedback received from the server to the user in an easy-to-understand format. The user can then review this feedback and carry out the recommended self-care actions. For example, if the emotion engine detects a high stress level, the terminal provides the user with guidance on relaxation exercises.

[0663] When users take action based on feedback, the results are also recorded in the system. The server uses continuous follow-up to track the user's mental health over the long term and, if necessary, offers further care suggestions to the user. In this way, by incorporating an emotion engine, the system provides a more accurate and effective way to manage the user's mental health.

[0664] The following describes the processing flow.

[0665] Step 1:

[0666] The user inputs voice information about their emotions and current state into the terminal. The terminal records this voice and saves it as digital data.

[0667] Step 2:

[0668] The terminal converts the recorded audio data into a predetermined format (e.g., WAV or MP3) and encrypts it for secure transmission to the server. It then executes the transmission procedure.

[0669] Step 3:

[0670] The server decrypts the received audio data and analyzes it using the voice analysis means. The analysis is based on factors such as tone, speed, and intonation, and extracts basic emotional information.

[0671] Step 4:

[0672] The emotion engine within the server recognizes emotions with high accuracy from the analyzed audio data. It uses machine learning algorithms and passes the resulting emotion data to subsequent processing.

[0673] Step 5:

[0674] The server uses generative artificial intelligence to comprehensively assess mental health. This process utilizes data from an emotion engine to generate indicators such as stress levels and well-being.

[0675] Step 6:

[0676] The server generates feedback for the user based on the evaluation results. This feedback includes recommended actions and self-care suggestions.

[0677] Step 7:

[0678] Feedback is sent from the server to the terminal, which then notifies and displays it to the user. The feedback is provided as specific guidelines that the user can immediately implement.

[0679] Step 8:

[0680] Users follow the instructions from the terminal and perform the suggested self-care activities, such as meditation or stretching, to maintain their mental health.

[0681] Step 9:

[0682] The server periodically analyzes the user's past health data and its results through continuous follow-up mechanisms. Additional feedback and support can be provided as needed.
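
The text says only that the emotion engine uses machine learning algorithms. As one simple stand-in, the sketch below classifies a feature vector by its nearest centroid. The technique, the feature order, the labels, and the centroid values are all assumptions chosen for illustration; in practice the centroids would be learned from labelled recordings and the features would be scaled.

```python
import math

# Hypothetical centroids in (loudness, zero-crossing rate, words per second)
# space. Real values would be learned from labelled recordings.
CENTROIDS = {
    "calm": (0.10, 0.05, 2.0),
    "stressed": (0.30, 0.12, 3.5),
    "fatigued": (0.05, 0.03, 1.2),
}


def recognize_emotion(features):
    """Return the label of the centroid closest to the feature vector."""
    return min(CENTROIDS, key=lambda label: math.dist(features, CENTROIDS[label]))


print(recognize_emotion((0.28, 0.11, 3.4)))
```

Because the features are unscaled here, speech rate dominates the distance; a real classifier would normalize each dimension first.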

[0683] (Example 2)

[0684] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0685] In modern society, accurately and effectively managing employees' mental health is a challenge. Traditional systems have struggled to quickly detect emotional changes and stress levels and provide appropriate feedback. Furthermore, long-term mental health follow-up has been insufficient, making it difficult to sustainably maintain employees' mental well-being.

[0686] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0687] In this invention, the server includes an input means for receiving voice data, an analysis means for recognizing emotional states using voice analysis means, an evaluation means for evaluating mental health using generative artificial intelligence, a providing means for generating and providing feedback based on the evaluation results, and a follow-up means for tracking user behavior and monitoring mental health over the long term. This enables high-precision analysis of employees' emotions and stress levels, prompt provision of appropriate feedback, and long-term mental health management.

[0688] "Input means" refers to an interface through which a user provides voice data to the system, and is a device that converts voice into digital data.

[0689] "Analysis means" refers to a device or program that processes audio data within the server and analyzes the tone, speed, and intonation of the voice in order to identify the emotional state.

[0690] "Evaluation means" refers to a function within the system that uses generative artificial intelligence to comprehensively evaluate the user's mental health based on the analysis results.

[0691] "Providing means" refers to a function or device that generates and provides appropriate feedback to the user based on the results from the evaluation means.

[0692] "Follow-up means" refers to a function or system that tracks changes in a user's behavior and mental health over the long term and proposes necessary care again.

[0693] This invention is a system for managing the mental health of employees, implemented through server, terminal, and user interaction. Specifically, it effectively grasps the user's mental state and provides appropriate feedback and care by analyzing voice data using an emotion engine and performing mental health assessments utilizing a generative AI model.

[0694] The user first speaks into the terminal. The terminal has a recording function that records the user's voice as digital data. The recorded data is then transmitted to the server via a secure communication protocol.

[0695] The server processes the received audio data using speech analysis software. Here, the emotion engine operates, analyzing the tone, speed, and intonation of the speech to quantify the user's emotional state. This emotional data is then analyzed within a generative AI model and reflected in an assessment of mental health. The AI model provides indicators such as stress levels, happiness, and fatigue to evaluate the user's current mental state.

[0696] Based on the evaluation results, the server generates specific feedback and sends it to the terminal. The terminal displays this feedback visually, presenting it in a user-friendly format. For example, if high stress levels are detected, the terminal provides the user with a guide to relaxation exercises.

[0697] Furthermore, the system records how users respond to feedback, and the server stores the results in a database as the follow-up means. This enables long-term monitoring of the user's mental health and allows necessary care to be suggested again. For example, a prompt such as "Please tell us about a stressful experience you've had recently and what actions you took as a result" is used.

[0698] This system makes it possible to accurately assess the mental health of employees and implement effective care.

[0699] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0700] Step 1:

[0701] The user inputs everyday events and emotions into the terminal by voice. The terminal records the input voice and converts it into digital data. This data is then sent to the server as input data.

[0702] Step 2:

[0703] The server processes the received audio data using audio analysis tools. Specifically, it decodes the audio and uses analysis software to extract tone, speed, and intonation. As a result, quantified emotion data is generated. This data is output as a model of the emotional state.

[0704] Step 3:

[0705] Using the emotional state model, the server evaluates the user's mental health with a generative AI model. The AI model takes the emotion data as input and calculates indicators of mental state such as stress level, happiness, and fatigue. As a result, a comprehensive mental state evaluation result is generated. This result is output as evaluation data.

[0706] Step 4:

[0707] Based on evaluation data, the server uses feedback provisioning mechanisms to generate specific feedback for the user. This feedback is generated as text or audio guidance and sent to the device. As a result, clear and actionable feedback is output.

[0708] Step 5:

[0709] The terminal visually displays the feedback received from the server and informs the user. The user reviews the feedback and takes the recommended action. During this process, the terminal records the user's reactions and behavioral data. As a result, an action log is output.

[0710] Step 6:

[0711] The server uses behavioral logs as a means of continuous follow-up to track the user's mental health over the long term. Based on this data, it suggests further care to the user as needed. As a result, follow-up data is output.
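
Steps 5 and 6 describe an action log stored by the server and used for long-term follow-up. The sketch below shows one possible shape for such a log using an in-memory SQLite database. The table name, column names, and the 0.7 threshold are hypothetical and are not specified by this disclosure.

```python
import sqlite3


def open_log(path=":memory:"):
    """Open the log database and create the (hypothetical) table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS action_log ("
        " user_id TEXT, day TEXT, stress REAL, action_taken INTEGER)"
    )
    return conn


def record_action(conn, user_id, day, stress, action_taken):
    """Store one feedback outcome: the stress score and whether the user acted."""
    conn.execute(
        "INSERT INTO action_log VALUES (?, ?, ?, ?)",
        (user_id, day, stress, int(action_taken)),
    )
    conn.commit()


def users_needing_follow_up(conn, threshold=0.7):
    """Return users whose average recorded stress exceeds the threshold."""
    rows = conn.execute(
        "SELECT user_id FROM action_log GROUP BY user_id"
        " HAVING AVG(stress) > ? ORDER BY user_id",
        (threshold,),
    )
    return [row[0] for row in rows]


conn = open_log()
record_action(conn, "u1", "2024-12-01", 0.9, False)
record_action(conn, "u1", "2024-12-02", 0.8, True)
record_action(conn, "u2", "2024-12-01", 0.2, True)
print(users_needing_follow_up(conn))
```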

[0712] (Application Example 2)

[0713] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0714] In modern society, particularly in caregiving settings, effectively managing the mental health of both caregivers and clients is crucial. However, traditional methods often overlook signs of emotions and stress, making appropriate responses difficult. Therefore, there is a need for a system that can more accurately and comprehensively assess psychological states and provide appropriate care methods.

[0715] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0716] In this invention, the server includes voice conversion means, machine intelligence means, and psychological health assessment means. This enables accurate understanding of the psychological state of caregivers and users, and allows for appropriate care and feedback based on that understanding.

[0717] "Voice conversion means" refers to a means for analyzing voice data and detecting its characteristics, speed, and intonation.

[0718] "Machine intelligence means" refers to a means for evaluating psychological states using artificial intelligence technology based on information obtained from voice data.

[0719] "Psychological health assessment means" refers to a means for quantifying and evaluating mental health based on data analyzed by the machine intelligence means.

[0720] "Information provision means based on analysis results" refers to a means of presenting appropriate care methods to users based on the results obtained from the psychological health assessment means.

[0721] "Continuous tracking means" refers to a means for recording and analyzing the user's psychological state over a long period and providing additional support as needed.

[0722] This system accurately manages the mental health of caregivers and users in care settings and provides appropriate care. The system primarily consists of voice conversion means, machine intelligence means, psychological health assessment means, information provision means based on analysis results, and continuous tracking means.

[0723] The server receives conversations between caregivers and users as digital audio data through the voice conversion means, which analyzes the received audio data for characteristics, speed, and intonation. Next, the machine intelligence means uses the analyzed data to evaluate the psychological state. This process uses software called an emotion engine to identify signs of emotion and stress in the audio with high accuracy.

[0724] The psychological health assessment means quantitatively evaluates the stress levels and well-being of caregivers and users based on the emotion analysis results obtained by the machine intelligence means. The server transmits the assessment results to the caregiver using the information provision means based on analysis results. This feedback is presented to the caregiver as specific care methods and relaxation guides.

[0725] The continuous tracking means records the psychological health data of users and caregivers over the long term, enabling follow-up support and new care suggestions as needed. The server continuously collects and analyzes data to provide care methods that adapt to changes in the user's psychological state.

[0726] As a concrete example, if the voice conversion means detects an increase in stress in a user's conversation, the information provision means provides feedback to the caregiver such as, "Please suggest activities that will help the user relax."

[0727] An example of a prompt to input into the generative AI model is: "Evaluate stress and well-being from the voice data and generate specific feedback for the caregiver."
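
A prompt of this kind could be assembled programmatically from the voice analysis output. The following sketch combines the instruction quoted above with numeric indicators. The function name, the indicator names, and the text layout are assumptions for illustration; the disclosure does not specify how the prompt and data are combined.

```python
def build_caregiver_prompt(indicators):
    """Combine the instruction from the text with numeric voice indicators."""
    instruction = (
        "Evaluate stress and well-being from the voice data and "
        "generate specific feedback for the caregiver."
    )
    # Sort by name so the prompt is stable for identical inputs.
    lines = [f"- {name}: {value:.2f}" for name, value in sorted(indicators.items())]
    return instruction + "\n\nVoice indicators:\n" + "\n".join(lines)


print(build_caregiver_prompt({"stress": 0.82, "speech_rate": 3.4}))
```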

[0728] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0729] Step 1:

[0730] The terminal records conversations between the caregiver and the user. It acquires the audio data and sends it to the server in a digital format. In this step, the input is an audio signal, and the output is digital audio data.

[0731] Step 2:

[0732] The server analyzes the received audio data using the voice conversion means. It analyzes characteristics, speed, and intonation to generate initial emotional indicators. The input is digital audio data, and the output is analyzed data including emotional indicators.

[0733] Step 3:

[0734] The server uses machine intelligence to analyze the speech and evaluate the user's psychological state based on emotional indicators. An emotion engine is used to quantify stress levels and happiness. The input is emotional indicators, and the output is evaluation data indicating the psychological state.

[0735] Step 4:

[0736] The server generates feedback based on psychological state assessment data and analysis results using information provision methods. It constructs feedback that includes specific care methods and guidelines. The input is psychological state assessment data, and the output is a feedback message.

[0737] Step 5:

[0738] The terminal displays feedback messages received from the server to the caregiver. Based on the feedback, it suggests care methods and influences the caregiver's actions. The input is the feedback message, and the output is the caregiver's implementation of care.

[0739] Step 6:

[0740] The server uses continuous tracking to record and analyze long-term psychological health data of caregivers and users. It provides additional care suggestions as needed. The input is newly collected psychological data, and the output is updated care suggestions.
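
The server-side portion of Steps 2 to 6 can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of this disclosure: the names CarePipeline, Assessment, stub_analyze, and stub_evaluate are hypothetical, the 0.7 stress threshold is arbitrary, and the stub functions stand in for the voice analysis and emotion engine, which are not specified at code level.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Assessment:
    stress: float     # assumed scale: 0.0 (low) to 1.0 (high)
    happiness: float  # assumed scale: 0.0 (low) to 1.0 (high)


@dataclass
class CarePipeline:
    analyze: Callable[[bytes], dict[str, float]]        # Step 2
    evaluate: Callable[[dict[str, float]], Assessment]  # Step 3
    history: list[Assessment] = field(default_factory=list)

    def run(self, audio: bytes) -> str:
        indicators = self.analyze(audio)        # Step 2: emotional indicators
        assessment = self.evaluate(indicators)  # Step 3: psychological state
        self.history.append(assessment)         # Step 6: keep for follow-up
        return self.feedback(assessment)        # Step 4: feedback message

    def feedback(self, a: Assessment) -> str:
        # Assumption: a single illustrative threshold selects the message.
        if a.stress >= 0.7:
            return "Please suggest activities that will help the user relax."
        return "No change to the current care plan is suggested."

    def stress_trend(self) -> float:
        """Step 6: change in stress from the first to the latest assessment."""
        if len(self.history) < 2:
            return 0.0
        return self.history[-1].stress - self.history[0].stress


def stub_analyze(audio: bytes) -> dict[str, float]:
    # Placeholder only: a real analyzer would extract speed and intonation.
    return {"speed": len(audio) / 10, "intonation": 0.5}


def stub_evaluate(indicators: dict[str, float]) -> Assessment:
    stress = min(1.0, indicators["speed"])
    return Assessment(stress=stress, happiness=1.0 - stress)


pipeline = CarePipeline(stub_analyze, stub_evaluate)
first = pipeline.run(b"abc")        # stub yields a low stress value
second = pipeline.run(b"abcdefgh")  # stub yields a high stress value
```

Passing the analysis and evaluation stages in as callables mirrors the way the steps are described as separate means, so that either stage could be replaced without changing the flow.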

[0741] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0742] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
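
The input and output relationship described above (a prompt plus inference data in, an inference result out) can be expressed as a small interface. The sketch below is illustrative only: InferenceRequest, DataGenerationModel, and FirstSentenceSummarizer are hypothetical names, and the stand-in class performs no actual inference and does not call any real generative AI service.

```python
from dataclasses import dataclass
from typing import Protocol, Union

# Assumption: text is carried as str; audio or image data as raw bytes.
InferenceData = Union[str, bytes]


@dataclass
class InferenceRequest:
    prompt: str          # instruction for the model
    data: InferenceData  # data to be analyzed, classified, summarized, etc.


class DataGenerationModel(Protocol):
    def infer(self, request: InferenceRequest) -> InferenceData: ...


class FirstSentenceSummarizer:
    """Stand-in model: 'summarizes' text by returning its first sentence."""

    def infer(self, request: InferenceRequest) -> InferenceData:
        if not isinstance(request.data, str):
            raise TypeError("this stand-in only handles text data")
        return request.data.split(".")[0].strip() + "."


model: DataGenerationModel = FirstSentenceSummarizer()
result = model.infer(
    InferenceRequest(
        prompt="Summarize the text.",
        data="The user sounded tense. Speech was fast.",
    )
)
```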

[0743] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0744] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0745] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions located there. Farther from the center, emotions representing states and actions arising from mental states are located. Here, emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. At the top and bottom of the concentric circles, emotions that are generally both generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0746] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0747] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the Emotion Map 400 an emotion is located, the more visible (expressed in actions) it becomes.
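
The layout of the emotion map 400 described above can be illustrated by classifying a point on the map by its position. This is a sketch under assumptions: the coordinate system (center at the origin, x to the right, y upward), the function name describe_position, and the inner_radius boundary of 0.5 are all hypothetical, and the actual map is a continuum rather than two discrete layers.

```python
import math


def describe_position(
    x: float, y: float, inner_radius: float = 0.5
) -> dict[str, str]:
    """Describe where a point lies on a map laid out like emotion map 400."""
    radius = math.hypot(x, y)
    if x > 0:
        side = "situational judgment"  # right side of the map
    elif x < 0:
        side = "reaction"              # left side of the map
    else:
        side = "both"                  # directly above or below the center
    if y > 0:
        valence = "pleasure"           # upper side
    elif y < 0:
        valence = "displeasure"        # lower side
    else:
        valence = "neutral"            # assumption: not covered by the text
    # Assumption: one radius threshold separates inner states from actions.
    layer = "inner state" if radius < inner_radius else "expressed in action"
    return {"side": side, "valence": valence, "layer": layer}


three_oclock = describe_position(1.0, 0.0)
near_center = describe_position(-0.1, 0.2)
```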

[0748] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "reaction," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
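
The relationship between a balance and pleasure or discomfort can be illustrated with a single balance value, such as a robot's battery level. The sketch below is an assumption-laden illustration: the function name classify_change, the use of absolute distance from the ideal, and the "neutral" result for an unchanged distance are not taken from this disclosure.

```python
def classify_change(previous: float, current: float, ideal: float) -> str:
    """Classify a change in one balance value relative to its ideal value."""
    before = abs(previous - ideal)
    after = abs(current - ideal)
    if after < before:
        return "pleasure"    # the balance moved toward the ideal
    if after > before:
        return "discomfort"  # the balance moved away from the ideal
    return "neutral"         # assumption: unchanged distance is not covered


# Hypothetical battery level with an ideal of 1.0 (fully charged).
draining = classify_change(previous=0.6, current=0.4, ideal=1.0)
charging = classify_change(previous=0.4, current=0.9, ideal=1.0)
```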

[0749] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0750] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
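
One way to obtain emotion values in which neighboring emotions are similar is to let each value decay with distance on the map from a labeled emotion. The following sketch shows that idea only; it is not the training procedure of this disclosure. The coordinates in POSITIONS, the Gaussian decay, the width parameter, and the inclusion of "anxious" as a contrasting example are all assumptions.

```python
import math

# Hypothetical (x, y) positions; the actual layout is given by the figures.
POSITIONS = {
    "reassured": (0.9, 0.3),
    "calm": (1.0, 0.2),
    "confident": (0.8, 0.4),
    "anxious": (0.9, -0.6),
}


def soft_targets(label: str, width: float = 0.3) -> dict[str, float]:
    """Emotion values that decay with map distance from the labeled emotion."""
    cx, cy = POSITIONS[label]
    return {
        name: math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * width**2))
        for name, (x, y) in POSITIONS.items()
    }


def dominant_emotion(values: dict[str, float]) -> str:
    """Determine the emotion with the largest emotion value."""
    return max(values, key=lambda name: values[name])


values = soft_targets("calm")
```

With these hypothetical positions, "reassured" and "confident" receive values close to that of "calm," while the more distant "anxious" receives a much smaller value.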

[0751] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0752] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and the specific process may be distributed across multiple computers, including computer 22. For example, the data generation model 58 may be provided in a device external to the data processing device 12, and the external device may generate data according to the input data.

[0753] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0754] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0755] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0756] The following types of processors can be used as hardware resources to perform specific processing. One example of a processor is a CPU, which is a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0757] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0758] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0759] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0760] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements added, or elements replaced in the descriptions and illustrations presented above, as long as doing so does not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0761] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted as being incorporated by reference.

[0762] The following is further disclosed regarding the embodiments described above.

[0763] (Claim 1)

[0764] Voice analysis means,

[0765] Generative artificial intelligence means,

[0766] Means for assessing mental health,

[0767] A means of providing feedback based on the analysis results,

[0768] A system for managing the mental health of employees, including means of continuous follow-up.

[0769] (Claim 2)

[0770] The system according to claim 1, wherein the voice analysis means analyzes the tone, speed, and intonation of a voice to generate an index of emotion or stress.

[0771] (Claim 3)

[0772] The system according to claim 1, wherein a continuous follow-up means analyzes past mental health data and, if necessary, proposes further care to the user.

[0773] "Example 1"

[0774] (Claim 1)

[0775] A means of speech analysis,

[0776] Data generation processing means,

[0777] Health status assessment means,

[0778] Means of providing information based on the analysis results,

[0779] A system that includes means of continuous monitoring.

[0780] (Claim 2)

[0781] The system according to claim 1, wherein the voice analysis means analyzes the characteristics of the voice and generates an index.

[0782] (Claim 3)

[0783] The system according to claim 1, wherein a continuous monitoring means analyzes past data and makes revisions as necessary.

[0784] "Application Example 1"

[0785] (Claim 1)

[0786] Voice analysis means,

[0787] Generative artificial intelligence means,

[0788] Means for assessing mental health,

[0789] A means of providing feedback based on the analysis results,

[0790] Continuous follow-up means,

[0791] A user device that provides visual or auditory instructions regarding mental health status based on voice analysis results,

[0792] A system that includes this.

[0793] (Claim 2)

[0794] The system according to claim 1, wherein the voice analysis means analyzes the tone, speed, and intonation of a voice to generate an index of emotion or stress.

[0795] (Claim 3)

[0796] The system according to claim 1, wherein the continuous follow-up means analyzes past mental health data and, if necessary, proposes further care to the user.

[0797] "Example 2 of combining an emotion engine"

[0798] (Claim 1)

[0799] An input means for receiving audio data,

[0800] An analysis means that recognizes an emotional state using a voice analysis means,

[0801] An evaluation means for assessing mental health status using generative artificial intelligence,

[0802] A means of generating and providing feedback based on evaluation results,

[0803] A system that tracks user behavior and includes follow-up measures to monitor their mental health over the long term.

[0804] (Claim 2)

[0805] The system according to claim 1, wherein the voice analysis means analyzes the tone, speed, and intonation of a voice to generate indicators related to emotion and stress.

[0806] (Claim 3)

[0807] The system according to claim 1, wherein the follow-up means analyzes past mental health data and proposes the most appropriate care for the user.

[0808] "Application example 2 when combining with an emotional engine"

[0809] (Claim 1)

[0810] A means of voice conversion,

[0811] Machine intelligence means,

[0812] Psychological health assessment means,

[0813] Means of providing information based on analysis results,

[0814] Continuous tracking means,

[0815] A system for managing the mental health of users and suggesting care methods.

[0816] (Claim 2)

[0817] The system according to claim 1, wherein the voice conversion means analyzes the characteristics, speed, and intonation of the voice to generate indicators of psychological state and pressure.

[0818] (Claim 3)

[0819] The system according to claim 1, wherein a continuous tracking means analyzes past mental health data and, if necessary, suggests support to the user.

[Explanation of symbols]

[0820]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. A system for managing the mental health of employees, comprising: a voice analysis means that analyzes voice data and generates indicators of emotion and stress from its tone, speed, intonation, etc.; a generative artificial intelligence means; a mental health assessment means that evaluates the user's mental health status, such as stress levels and happiness, based on the results of the voice analysis and the generative artificial intelligence; a means of providing feedback based on the analysis results; and a means of continuous follow-up.

2. The system according to claim 1, wherein the voice analysis means analyzes the tone, speed, and intonation of a voice to generate indicators of emotion and stress.

3. The system according to claim 1, wherein the continuous follow-up means analyzes past mental health data and, if necessary, proposes further care to the user.