system

The system addresses labor law violations and harassment in workplaces by converting audio to text, applying natural language processing to detect issues, and integrating with project management tools for timely alerts and improvements.

JP2026100753A · Pending · Publication Date: 2026-06-19 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-12-09
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Modern workplaces struggle to monitor communication effectively. As a result, labor law violations and harassment are overlooked, which harms mental health and work efficiency, and project risks cannot be anticipated.

Method used

A system that uses speech recognition technology to convert workplace audio into text, applies natural language processing to detect violations and harassment, and generates alerts, while integrating with project management tools to monitor risks and suggest improvements.

Benefits of technology

Enables real-time detection and response to labor law violations and harassment, improving workplace environment and project efficiency by providing immediate alerts and corrective measures.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure 2026100753000001_ABST
Patent Text Reader

Abstract

[Problem] To provide a system. [Solution] A system including: means for collecting audio data in real time and converting it to text; means for analyzing the transcribed data to detect signs of labor law violations or harassment; means for generating alerts based on the detected information and issuing notifications; means for monitoring communication and schedule risks in conjunction with project management tools; and means for proposing improvement measures based on those risks.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, the method including the steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a character of the chatbot, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] In modern workplaces, due to the lack of proper monitoring of communication, there are situations where labor law violations and harassing remarks are easily overlooked. As a result, employees' dissatisfaction accumulates, leading to deterioration of mental health and a high turnover rate. Also, the inability to anticipate potential risks during project progress is a factor contributing to a decline in work efficiency and a deterioration of the working environment. There is a need to solve these problems and realize an efficient and healthy workplace environment.

Means for Solving the Problems

[0005] This invention provides a means of accurately recording workplace communication by using speech recognition technology that collects voice data in real time and converts it into text. Furthermore, it constructs a system that applies natural language processing to this text data to analyze and detect signs of labor law violations and harassment. Based on the detected information, it generates alerts and notifies the responsible person. In addition, by linking with project management tools, it monitors communication and schedule risks in ongoing projects and proposes corrective measures for detected risks, thereby enabling rapid response and improvement of the workplace environment.

[0006] "Audio data" refers to acoustic signals collected as speeches or conversations of individuals or groups, and serves as the basic data for processing and converting them into text.

[0007] "Text conversion" is the process of analyzing audio signals and converting them into textual information, and is carried out using speech recognition technology.

[0008] "Violation of labor laws" refers to actions or situations that violate the Labor Standards Act and other related laws and regulations, and infringe upon the rights of employees.

[0009] "Harassment" refers to inappropriate words or actions directed at a specific individual or group, based on one's position or interpersonal relationships, that cause psychological or physical distress.

[0010] An "alert" refers to a warning system that immediately notifies the responsible person when it detects a specific anomaly or event requiring attention.

[0011] A "project management tool" refers to software used to manage multiple tasks and schedules, and to efficiently execute a project.

[0012] "Natural language processing" is a technology that uses computers to analyze, understand, and generate human language, and is a method used for analyzing text data.

[0013] "Risk" refers to problems or obstacles that may arise during the progress of a project or in workplace activities, which could negatively impact the efficiency or results of work.

Brief Description of the Drawings

[0014]
[Figure 1] This is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] This is a conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] This is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] This is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] This is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] This is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] This is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] This is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] This shows an emotion map where multiple emotions are mapped.
[Figure 10] This shows an emotion map where multiple emotions are mapped.
[Figure 11] This is a sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] This is a sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] This is a sequence diagram showing the processing flow of the data processing system in Example 2, which incorporates an emotion engine.
[Figure 14] This is a sequence diagram showing the processing flow of the data processing system in Application Example 2, which incorporates an emotion engine.

Embodiments for Carrying Out the Invention

[0015] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0016] First, the terms used in the following description will be explained.

[0017] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).

[0018] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which the processor uses as work memory.

[0019] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0020] In the following embodiments, a communication interface (I/F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0021] In the following embodiments, "A and / or B" is synonymous with "at least one of A and B." That is, "A and / or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and / or B" applies when expressing three or more things linked by "and / or."

[0022] [First Embodiment]

[0023] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0024] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0025] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0026] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0027] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0028] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0029] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0030] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0031] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0032] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0033] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0034] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0035] This invention provides a system for monitoring workplace communication and preventing labor law violations and harassment. This system is implemented in the following manner:

[0036] First, the server acquires audio data from devices that collect workplace conversations. This data is collected in real time through microphones installed in the office and video conferencing systems. The server then uses a speech recognition engine to transcribe this audio data into text. This transcription process provides the foundational data needed to analyze potentially problematic statements.

[0037] Next, the server analyzes the transcribed data using natural language processing capabilities. During this analysis, algorithms are used to detect keywords that may violate labor laws and expressions that could constitute harassment. This ensures that any problematic statements are immediately recognized.
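The keyword and expression detection described above can be illustrated with a minimal sketch. The specification does not define the algorithm or the keyword lists, so the regular-expression approach, the `PATTERNS` table, and the names `Finding` and `detect` are all assumptions made for illustration only.

```python
import re
from dataclasses import dataclass

# Hypothetical phrase lists (not from the specification); real criteria
# would need to be defined and reviewed by legal and HR specialists.
PATTERNS = {
    "labor_law": [r"unpaid overtime", r"no days? off"],
    "harassment": [r"\buseless\b", r"you(?:'ll| will) be fired"],
}


@dataclass(frozen=True)
class Finding:
    category: str  # "labor_law" or "harassment"
    phrase: str    # the matched text
    start: int     # character offset in the transcript


def detect(text: str) -> list[Finding]:
    """Return every watch-list match in the transcript, in reading order."""
    findings = []
    for category, patterns in PATTERNS.items():
        for pattern in patterns:
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                findings.append(Finding(category, match.group(0), match.start()))
    return sorted(findings, key=lambda f: f.start)


sample = "Finish this tonight as unpaid overtime, or you will be fired."
findings = detect(sample)
for f in findings:
    print(f.category, repr(f.phrase), f.start)
```

Plain pattern matching cannot judge context, so a sketch like this would only serve as a first filter ahead of the context-aware analysis the specification mentions.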

[0038] Based on the detected information, the server generates an appropriate alert. This alert is sent to the terminals of the relevant administrators and personnel, providing real-time notification. This enables a rapid response.

[0039] Furthermore, the server communicates with project management tools used within the workplace to monitor communication and scheduling risks related to project progress. For detected risks, alerts and specific improvement suggestions are sent to the user's device. Based on these suggestions, users can readjust project member assignments and schedules, supporting efficient work operations.

[0040] As a concrete example, let's say a project team is having an ongoing meeting. The server transcribes the comments made during the meeting into text, and if the analysis detects any "unreasonable demands" or "harassing language," it immediately notifies the team leader's terminal with a message saying "attention required." Through this process, it is possible to minimize the impact of comments on work and improve the workplace environment.

[0041] The following describes the processing flow.

[0042] Step 1:

[0043] The server acquires audio data in real time from audio collection devices within the office. This includes microphones in conference rooms and digital conversation recording systems.

[0044] Step 2:

[0045] The server converts the acquired audio data into text data using a speech recognition engine. In this process, the audio signal is analyzed, and highly accurate text conversion is performed.

[0046] Step 3:

[0047] The server analyzes the transcribed data using natural language processing algorithms. Here, it identifies specific keywords and phrases and detects contexts that may be related to labor law violations or harassment.

[0048] Step 4:

[0049] If the server detects a problem based on the analysis results, it will immediately generate an alert message. This alert will include specific information about what the problem is and which statements require attention.

[0050] Step 5:

[0051] The server sends the generated alerts to the administrator's or designated person's terminal. Notifications are provided in real time via push notifications or a dedicated dashboard.

[0052] Step 6:

[0053] The server simultaneously accesses project management tools to check the schedule and resource status of ongoing projects. If necessary, it identifies relevant risk factors.

[0054] Step 7:

[0055] The server generates project improvement suggestions based on the detected risks. These suggestions may include adding new members, revising processes, and adjusting the schedule.

[0056] Step 8:

[0057] Users can review alerts and suggestions received on their devices and select and implement appropriate countermeasures. This enables early problem resolution and efficient project management.
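Steps 1 to 7 above can be sketched as a single function, under stated assumptions. The speech recognition engine is passed in as a callable because the specification does not fix one. The `WATCH` table, the `days_behind` field, and the names `Alert`, `Report`, and `run_pipeline` are hypothetical. Delivery to terminals (Step 5) and the user's review (Step 8) are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Alert:
    issue: str      # what the problem is
    statement: str  # which statement requires attention


@dataclass
class Report:
    alerts: list[Alert] = field(default_factory=list)
    suggestions: list[str] = field(default_factory=list)


# Hypothetical watch list standing in for the NLP analysis of Step 3.
WATCH = {
    "unpaid overtime": "possible labor law violation",
    "useless": "possible harassment",
}


def run_pipeline(
    audio_chunks: Iterable[bytes],
    transcribe: Callable[[bytes], str],
    project_status: dict,
) -> Report:
    report = Report()
    for chunk in audio_chunks:              # Step 1: acquire audio
        statement = transcribe(chunk)       # Step 2: convert to text
        lowered = statement.lower()
        for phrase, issue in WATCH.items():  # Step 3: analyze the text
            if phrase in lowered:
                report.alerts.append(Alert(issue, statement))  # Step 4
    # Steps 6 and 7: check project status and propose improvements.
    if project_status.get("days_behind", 0) > 0:
        report.suggestions.append("Adjust the schedule or add new members")
    return report


def fake_transcribe(chunk: bytes) -> str:
    """Stand-in for a speech recognition engine: decodes UTF-8 bytes."""
    return chunk.decode("utf-8")


report = run_pipeline(
    [b"Please log this as unpaid overtime", b"Nice work today"],
    fake_transcribe,
    {"days_behind": 3},
)
print(report)
```

Injecting the transcriber keeps the sketch independent of any particular speech recognition service and makes the flow testable without audio hardware.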

[0058] (Example 1)

[0059] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0060] In workplace communication, labor law violations and harassment are extremely important issues, and early detection and countermeasures are required. However, traditional monitoring methods have the challenge of not being able to detect problems in real time. Furthermore, there are difficulties in appropriately monitoring and immediately responding to communication and scheduling risks that arise as projects progress.

[0061] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0062] In this invention, the server includes means for acquiring and converting voice information into text in real time; means for analyzing the converted information and detecting signs of legal violations or disruptive behavior; means for creating and notifying warnings based on the detected content; means for monitoring communication and planning risks in cooperation with a business management system; and means for suggesting improvements based on the risks. This makes it possible to detect signs of potential legal violations and harassment occurring in the workplace in real time and notify managers with appropriate alerts. Furthermore, by coordinating with project management tools, it becomes possible to immediately discover communication and scheduling risks and support efficient business operations.

[0063] "Audio information" refers to data, including conversations and sounds, collected in the workplace.

[0064] "Real-time acquisition" refers to the process of collecting audio information almost simultaneously with its generation.

[0065] "String conversion" is the process of converting audio information into text data.

[0066] "Analysis" is the process of understanding meaning and intent based on information that has been converted into a string.

[0067] "Signs of legal violation" refers to any conduct that may violate labor laws or related regulations.

[0068] "Disruptive behavior" refers to problematic actions in the workplace, including harassment and inappropriate remarks.

[0069] A "warning" is a notification generated to address detected anomalies or problems.

[0070] "Notification" is the act of conveying warnings or information to relevant parties.

[0071] A "business management system" is a digital tool used to manage the progress of a project and related tasks.

[0072] "Communication risks" refer to the risks inherent in business communication.

[0073] "Planning risks" refer to risks related to the project's schedule and progress.

[0074] A "suggestion for improvement" is a specific solution to the identified risks or problems.

[0075] This invention provides a system for efficiently monitoring workplace communication, preventing labor law violations and disruptive behavior, and effectively managing projects.

[0076] The server works in conjunction with multiple audio collection devices to collect audio information in real time, whether within the office or through digital conferencing systems. Specific examples of such devices include network-enabled microphones and video conferencing systems. The audio information is transmitted to the server and converted into text using audio conversion technology. For example, speech recognition technologies such as Google® Cloud Speech-to-Text API can be utilized.

[0077] Next, the server applies natural language processing algorithms to the stringified information to detect potential legal violations or disruptive behavior. This utilizes generative AI models to extract specific keywords and phrases. If a problem is detected as a result of this analysis, the server immediately generates a warning and notifies the relevant parties' terminals.

[0078] Furthermore, the server connects to the business management systems used by the company (e.g., Asana or Jira) to monitor project communications and planning risks. This allows users to detect potential problems in project progress early and respond quickly based on improvement suggestions provided along with alerts.

[0079] As a concrete example, during a meeting for an ongoing project, the server transcribes the user's statements in real time and sends a "warning" to the team leader's terminal if it contains "inappropriate requests" or "inappropriate expressions." This allows the user to immediately review their statements and prevent workplace problems.

[0080] An example of a prompt message might be, "Explain how to retrieve workplace conversation data, analyze inappropriate remarks, and generate alerts."

[0081] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0082] Step 1:

[0083] The server acquires audio information in real time from various audio acquisition devices within the workplace. This input includes network-enabled microphones and video conferencing systems. The server receives the audio information from these devices as streaming data and temporarily stores it in a compressed format. During this process, the audio signals are properly captured and prepared for processing.

[0084] Step 2:

[0085] The server converts the acquired audio information into text using acoustic conversion technology. This conversion employs a model based on speech recognition technology. The input is an audio signal, and the output as a result of the processing is text data. For this text conversion process, the server utilizes the Google Cloud Speech-to-Text API and other tools to convert audio to text in real time and store it in a database.

[0086] Step 3:

[0087] The server analyzes the stringified text data to detect keywords and phrases that may indicate illegal or disruptive behavior. The input is the text data obtained in step 2, and the output is the analysis results indicating the problem. The server performs this analysis based on natural language processing algorithms and generative AI models, and compares it against a keyword list. By identifying specific patterns, it detects abnormal statements.

[0088] Step 4:

[0089] The server generates warnings based on detected anomalies and problems and notifies the relevant parties' terminals. The input is the analysis results from step 3, and the output is the warning message. The server responds immediately to problematic statements and sends warnings to relevant parties via email or messaging applications. In this process, it can also provide specific action guidelines to support rapid decision-making.

[0090] Step 5:

[0091] The server works in conjunction with the project management system to monitor project communication and planning risks. Its inputs include up-to-date data from the project management tool, and its outputs are risk assessments and improvement suggestions. The server retrieves information from the project management system via an API and analyzes it to identify potential risks. Based on these risks, specific improvement suggestions are generated and provided to the user.
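The risk assessment in Step 5 can be sketched as follows. The specification does not describe the data returned by the project management system or the rules applied to it, so the `Task` record, the overdue and overload rules, and the `max_open_per_person` threshold are assumptions. Fetching data over the tool's API is not shown.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Task:
    """Hypothetical task record, as it might be retrieved from a project tool."""
    name: str
    assignee: str
    due: date
    progress: float  # 0.0 (not started) to 1.0 (done)


def assess_schedule_risks(
    tasks: list[Task], today: date, max_open_per_person: int = 2
) -> list[str]:
    """Return improvement suggestions for overdue tasks and overloaded members."""
    suggestions = []
    open_tasks = [t for t in tasks if t.progress < 1.0]

    # Planning risk: an unfinished task whose due date has passed.
    for task in open_tasks:
        if task.due < today:
            suggestions.append(f"'{task.name}' is overdue: revise the schedule")

    # Resource risk: too many open tasks assigned to one person.
    load: dict[str, int] = {}
    for task in open_tasks:
        load[task.assignee] = load.get(task.assignee, 0) + 1
    for person, count in load.items():
        if count > max_open_per_person:
            suggestions.append(
                f"{person} has {count} open tasks: consider adding members"
            )
    return suggestions


tasks = [
    Task("A", "kim", date(2025, 1, 5), 0.5),
    Task("B", "kim", date(2025, 1, 20), 0.0),
    Task("C", "kim", date(2025, 1, 25), 0.2),
    Task("D", "lee", date(2025, 1, 8), 1.0),
]
for line in assess_schedule_risks(tasks, today=date(2025, 1, 10)):
    print(line)
```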

[0092] (Application Example 1)

[0093] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0094] In offices and workplaces, unintentional statements that violate laws or ethics can occur during communication. Such statements can worsen the work environment and damage the organization's credibility, thus requiring real-time problem detection and rapid response. Furthermore, accurately preserving communication history is crucial to prevent future disputes and to serve as evidence if problems arise. Effective technology is needed to address these challenges.

[0095] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0096] In this invention, the server includes means for acquiring voice information in real time and converting it into text information, means for analyzing the converted text information and detecting signs of human rights or ethical issues, and means for generating and notifying warnings based on the detected information. This makes it possible to quickly detect legal or ethical issues that may occur in workplace communication and immediately notify administrators.

[0097] "Audio information" refers to all data acquired through audio, including linguistic content collected in real time.

[0098] "Textual information" refers to data that represents audio information as text, and includes strings of language extracted using speech recognition technology.

[0099] "Analysis" refers to the process of analyzing acquired textual information and identifying problems in light of specific legal or ethical standards.

[0100] A "warning" refers to an alert or notification generated based on a detected problem, intended to quickly inform administrators and relevant individuals.

[0101] A "project management tool" refers to an entire system used to monitor the progress of a project or task, and to manage related information and schedules.

[0102] "Risk" refers to the risk arising from statements or actions that violate laws or ethics, and means the potential for the working environment or the health of the organization to be compromised as a result.

[0103] "Terminal" refers to a device used by a user to receive or manipulate information, and includes personal computers and smartphones.

[0104] The system implementing this invention aims to utilize voice information in an office environment to perform real-time monitoring and analysis in order to maintain healthy communication within the workplace.

[0105] The server first acquires audio information through speech recognition devices installed in the office or workplace. This audio information is then converted into text using Google Cloud Speech-to-Text or similar speech recognition technology. This digitizes the content of the conversation, making it available for subsequent analysis.

[0106] Next, this textual information is analyzed using a natural language processing engine (such as spaCy or NLTK) on the server. Here, specific keywords and phrases are detected against pre-defined legal and ethical standards, and potential problems are identified. These processes make it possible to evaluate whether the content of workplace communication is appropriate.

[0107] When a problem is detected, the server immediately generates an alert and notifies the device. This notification is sent to the devices of administrators and relevant personnel using Firebase Cloud Messaging or similar push notification technology. This allows administrators to respond to workplace issues in real time.
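As an illustration of the notification step, the sketch below builds a request body in the JSON shape used by the Firebase Cloud Messaging HTTP v1 API (`message`, `token`, `notification`, `data`). It only constructs the payload. Authentication and the HTTP request are omitted, and the function name, title text, and field contents are assumptions rather than part of the specification.

```python
import json


def build_fcm_message(device_token: str, issue: str, statement: str) -> str:
    """Build a JSON body for a push alert to one administrator device."""
    message = {
        "message": {
            "token": device_token,  # registration token of the target device
            "notification": {
                "title": "Attention required",  # hypothetical title text
                "body": issue,
            },
            # FCM "data" values must be strings.
            "data": {"statement": statement},
        }
    }
    return json.dumps(message, ensure_ascii=False)


body = build_fcm_message(
    "DEVICE_TOKEN_PLACEHOLDER",
    "The content of the statement may be inappropriate.",
    "Finish this tonight as unpaid overtime.",
)
print(body)
```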

[0108] Furthermore, the terminal can receive suggestions from the server and, if necessary, present measures to improve communication. This allows users to improve their work environment and support effective business operations.

[0109] For example, if an inappropriate request is made during a meeting, the server detects it and sends a warning to the administrator's terminal stating that "the content of the statement may be inappropriate." At this time, the user is presented with a prompt such as "Please suggest a better way to communicate to resolve this issue," and specific improvement suggestions are provided by a generative AI model.
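A prompt of the kind quoted above could be assembled as in the following sketch. Only the final sentence of the prompt comes from the specification. The surrounding template and the function name are assumptions, and the call to a generative AI model is intentionally left out because the specification does not name one.

```python
def build_improvement_prompt(statement: str, issue: str) -> str:
    """Combine the flagged statement and the detected issue into one prompt."""
    return (
        "The following statement was flagged during a workplace meeting.\n"
        f"Statement: {statement}\n"
        f"Detected issue: {issue}\n"
        "Please suggest a better way to communicate to resolve this issue."
    )


prompt = build_improvement_prompt(
    "Finish this tonight, no excuses.",
    "The content of the statement may be inappropriate.",
)
print(prompt)
```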

[0110] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0111] Step 1:

[0112] The server acquires audio information from speech recognition devices installed within the workplace. This input audio information covers the entirety of conversations within the workplace. Based on this, speech recognition technology is used to process it in real time and convert it into text information. The converted text information becomes the input for the next step.

[0113] Step 2:

[0114] The server receives the text information converted in Step 1 as input and analyzes it using a natural language processing engine. Here, the data is processed to detect keywords and phrases that correspond to pre-defined legal and ethical standards. The output is an analysis result indicating whether a potential problem has been detected.

[0115] Step 3:

[0116] Based on the analysis results from Step 2, the server generates an alert if a problem is detected. The specific nature and severity of the problem are taken into consideration when generating this alert. The generated alert is sent to the device using a push notification service. The device receives the alert and notifies the administrator in real time. The alert message is then displayed on the administrator's device as output.

[0117] Step 4:

[0118] The terminal receives a notification sent from the server and informs the user. After the user acknowledges this warning, they wait for instructions regarding the improvement suggestions provided by the server. These improvement suggestions include specific actions the user should take to improve the work environment. The system uses a generative AI model to provide specific improvement suggestions based on the prompt. This output includes improvement actions that the user can actually implement.

[0119] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0120] This invention is a system that not only detects potential problems in workplace communication but also incorporates an emotion engine that recognizes the user's emotional state. In addition to conventional problem detection functions, this system aims to identify psychological risks in the workplace and provide comprehensive improvement measures.

[0121] First, the server collects voice and text data from within the workplace in real time. This includes conversations in meeting rooms and offices, which are then converted into text data using speech recognition technology. This transcribed data is then subjected to subsequent analysis processing.

[0122] Next, the server applies natural language processing to the text data to detect signs of labor law violations and harassment. In addition, it utilizes an emotion engine to analyze the emotional state associated with each user's statements. The emotion engine identifies emotions such as joy, anger, and sadness from word choices and context, and adds this information to the analysis results.
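The idea of identifying emotions from word choices can be illustrated with a small lexicon-based sketch. The lexicon, the function names, and the "neutral" fallback are assumptions for illustration; the emotion engine described here would more plausibly rely on a trained model, and this sketch does not capture context.

```python
import re
from collections import Counter

# Hypothetical emotion lexicon covering the three emotions named in the text.
EMOTION_LEXICON = {
    "joy": {"glad", "great", "thanks", "happy"},
    "anger": {"furious", "unacceptable", "angry"},
    "sadness": {"sorry", "exhausted", "hopeless", "sad"},
}


def estimate_emotions(statement: str) -> Counter:
    """Count lexicon words per emotion in a single statement."""
    words = re.findall(r"[a-z']+", statement.lower())
    counts = Counter()
    for emotion, vocabulary in EMOTION_LEXICON.items():
        counts[emotion] = sum(1 for w in words if w in vocabulary)
    return counts


def dominant_emotion(statement: str) -> str:
    """Return the emotion with the most lexicon hits, or 'neutral' if none."""
    emotion, score = estimate_emotions(statement).most_common(1)[0]
    return emotion if score > 0 else "neutral"
```

The per-statement result could then be attached to the analysis results alongside any detected violation.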

[0123] If the analysis detects a problem, the server generates an alert and notifies the relevant administrators and personnel on their terminals. The alert includes details of the identified problem and the detected emotional tendencies. This allows for a multifaceted assessment of the situation and enables a quick and appropriate response.

[0124] Furthermore, the server integrates with project management tools to monitor emotional and communication risks that could impact project progress. Based on this information, it proposes improvements to the project. For example, if a team member is identified as being under high stress, suggestions may be made to re-evaluate the project member's role or adjust their workload.

[0125] Based on the information displayed on the terminal, users can make appropriate decisions tailored to their individual circumstances and take necessary measures. In this way, the main objective of the present invention is to support improvements not only to the physical but also to the psychological environment of the workplace, thereby improving employee health and corporate efficiency.

[0126] The following describes the processing flow.

[0127] Step 1:

[0128] The server acquires audio data in real time from voice collection devices installed within the workplace. This data, including meetings and everyday conversations, is securely stored for recording purposes.

[0129] Step 2:

[0130] The server converts the acquired audio data into text using a speech recognition engine. Here, it prioritizes transcribing the clearest parts of the audio and performs filtering to minimize transcription errors.
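One possible reading of this filtering is that transcribed segments with low recognition confidence are discarded. The sketch below assumes that the speech recognition engine reports a confidence score per segment; the `Segment` type and the threshold value of 0.8 are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    confidence: float  # 0.0 to 1.0, assumed to be reported by the recognizer


def filter_segments(segments, min_confidence=0.8):
    """Keep segments at or above the threshold, preserving their order."""
    return [s for s in segments if s.confidence >= min_confidence]


def transcript(segments, min_confidence=0.8):
    """Join the text of the retained segments into one transcript string."""
    return " ".join(s.text for s in filter_segments(segments, min_confidence))


segments = [
    Segment("please stay late", 0.93),
    Segment("mumble", 0.41),
    Segment("tonight", 0.80),
]
print(transcript(segments))
```

A higher threshold reduces transcription errors at the cost of dropping more of the conversation, so the value would need tuning for each environment.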

[0131] Step 3:

[0132] The server analyzes the transcribed data using natural language processing algorithms to detect signs of labor law violations and harassment. It also uses an emotion engine to identify the emotions contained in the statements and generate an emotion profile.

[0133] Step 4:

[0134] The server generates alerts based on detected issues and emotional states. These alerts include the type of issue identified, the emotional tendency, and related contextual information.

[0135] Step 5:

[0136] The server notifies relevant parties or administrators of the generated alerts in real time. The notifications are delivered via push notifications, prompting immediate action based on the level of importance.

[0137] Step 6:

[0138] The server integrates with project management tools to monitor emotional and communication risks in project progress and resource allocation. This includes regular data updates and real-time insights.

[0139] Step 7:

[0140] Based on the analyzed sentiment and risk data, the server proposes improvements and adjustments. These proposals may include recommendations for reviewing team composition or adjusting schedules.

[0141] Step 8:

[0142] Users review received alerts and suggestions and determine the best course of action based on them. Users can provide appropriate feedback and mental support in accordance with organizational policies.

[0143] (Example 2)

[0144] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0145] In today's workplace, it is essential to detect labor law violations and signs of harassment early and respond appropriately. However, these problems often do not manifest themselves overtly, and in many cases they have complex underlying emotional causes. Traditional methods have made it difficult to accurately identify these psychological risks and respond quickly. In addition, understanding of, and countermeasures for, communication risks that affect project progress are insufficient, which can ultimately reduce workplace efficiency.

[0146] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0147] In this invention, the server includes means for collecting acoustic information in real time and converting it into text data; means for analyzing the text data and detecting signs of regulatory violations or misconduct; means for analyzing an individual's emotional state and identifying psychological risks; and means for monitoring communication and progress risks in conjunction with a work management tool. This makes it possible to analyze potential workplace problems from multiple perspectives and propose appropriate improvement measures.

[0148] "Acoustic information" refers to data related to speech and other sounds that is collected and processed by a system.

[0149] "Text data" refers to data in text format converted from acoustic information, and is the information that will be analyzed.

[0150] "Analysis" refers to the process of data processing carried out to detect signs of regulatory violations or misconduct based on collected data.

[0151] "Indicators of regulatory violations or misconduct" refers to indications of actions or statements that may violate labor laws or other norms.

[0152] "Warning information" refers to alerts generated to notify about detected problems.

[0153] A "work management tool" refers to a software system used to manage projects and their progress.

[0154] "Communication and progress risks" refer to communication problems and risks that may affect the progress of the project.

[0155] "Emotional state" refers to an individual's psychological or emotional state, which is analyzed from their words and actions.

[0156] "Psychological risk" refers to problems in the workplace that may affect an individual's mental health.

[0157] This invention is a system that effectively collects and analyzes acoustic information in the workplace environment, enabling early detection of signs of regulatory violations and misconduct, as well as the evaluation of individuals' emotional states and the identification of psychological risks. At the core of this system is the comprehensive data processing capability achieved through the cooperation of servers and terminals.

[0158] The server first collects real-time acoustic information from the workplace through acoustic sensors and microphones. This function is achieved using microphone devices connected via the network and dedicated speech recognition devices. Specifically, speech recognition software such as Google Speech-to-Text API and IBM Watson® Speech to Text is used to instantly convert the collected acoustic information into text data. This text data is stored on the server as the basis for analysis.

[0159] Next, the server runs a program to perform natural language analysis on the stored text data. This analysis uses natural language processing libraries for Python such as NLTK and spaCy. During the analysis, the server detects signs of labor law violations and harassment and generates warning information as needed. The analysis also incorporates an emotion engine that estimates an individual's emotional state from the context of the conversation and word choice, and identifies psychological risks.

[0160] Furthermore, the server monitors project progress in conjunction with work management tools. Specifically, it uses APIs to exchange data with project management software such as Jira and Trello in order to detect and analyze communication and progress risks. Based on this data, the system proposes appropriate improvement measures to administrators. For example, if a particular team member is detected to be under high stress, it can suggest redistributing tasks.
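The task redistribution suggestion mentioned above could, for example, be derived as follows. This is a sketch under stated assumptions: stress is represented as a score between 0 and 1 per member, open tasks are plain lists of task identifiers, and the threshold of 0.7 is arbitrary. None of these representations come from the specification.

```python
def suggest_redistribution(stress, open_tasks, threshold=0.7):
    """Suggest moving one task from each high-stress member to a calmer one.

    stress: member -> stress score in [0, 1] (hypothetical representation)
    open_tasks: member -> list of open task identifiers
    """
    # Members below the threshold are candidates to receive work.
    calm = [m for m in open_tasks if stress.get(m, 0.0) < threshold]
    suggestions = []
    for member, score in stress.items():
        if score < threshold or not open_tasks.get(member) or not calm:
            continue
        # Pick the calm member who currently has the fewest open tasks.
        target = min(calm, key=lambda m: len(open_tasks[m]))
        task = open_tasks[member][-1]
        suggestions.append(f"Move {task} from {member} to {target}")
    return suggestions


stress = {"A": 0.9, "B": 0.2, "C": 0.4}
tasks = {"A": ["T1", "T2", "T3"], "B": ["T4", "T5"], "C": ["T6"]}
print(suggest_redistribution(stress, tasks))
```

The function only produces suggestions and does not modify the task lists, which leaves the final decision to the administrator.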

[0161] The user receives information provided by the server through the terminal and takes appropriate action based on it. The terminal is equipped with a user interface for visualizing data analysis results and suggested improvement measures, allowing the user to quickly implement countermeasures.

[0162] For example, the effectiveness of the system can be further enhanced by utilizing a generative AI model with prompts such as, "How can I detect emotions and problems from workplace audio data?" In this way, the system aims to support improvements in the physical and psychological environment of the workplace, thereby improving employee health and corporate efficiency.

[0163] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0164] Step 1:

[0165] The server collects acoustic information in real time from acoustic sensors and microphones. The input is audio data from conference rooms or offices, and the output is stored in a database as acoustic information. In this process, noise cancellation technology is used to remove ambient noise and capture conversations more clearly.

[0166] Step 2:

[0167] The server converts the collected acoustic information into text data using speech recognition software. The input is acoustic information, and the output is text data. Specifically, when converting speech to text using the Google Speech-to-Text API, the server utilizes speaker identification to classify each utterance.
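Classifying utterances by speaker can be sketched as follows, assuming the speech recognition software returns a speaker tag for each recognized word in time order. The input format and the function name are assumptions for illustration, and no call to the actual speech recognition API is shown.

```python
from itertools import groupby


def group_utterances(words):
    """Merge consecutive words with the same speaker tag into utterances.

    words: iterable of (speaker_tag, word) pairs in time order.
    Returns a list of (speaker_tag, utterance_text) pairs.
    """
    utterances = []
    for speaker, run in groupby(words, key=lambda pair: pair[0]):
        utterances.append((speaker, " ".join(word for _, word in run)))
    return utterances


words = [(1, "please"), (1, "stay"), (2, "I"), (2, "cannot"), (1, "why")]
print(group_utterances(words))
```

Because only consecutive words are merged, a speaker who talks twice yields two separate utterances, which keeps the order of the conversation intact.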

[0168] Step 3:

[0169] The server applies natural language processing to the converted text data to detect signs of regulatory violations or misconduct. The input is text data, and the output is the analysis result regarding the presence or absence of problems. It also utilizes an emotion engine to analyze the emotional state contained in each statement, identifying emotions such as joy, anger, and sadness. The data processing performed at this stage involves analyzing the text sentence by sentence and measuring the frequency of keywords and phrases.
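The sentence-by-sentence processing and keyword frequency measurement described in this step can be sketched as follows. The keyword list and the simple punctuation-based sentence splitter are assumptions for illustration; a library tokenizer would likely be more robust.

```python
import re
from collections import Counter

KEYWORDS = {"overtime", "deadline", "stupid"}  # hypothetical keyword list


def split_sentences(text):
    """Split text after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def keyword_frequency(text):
    """Return per-sentence keyword counts and the overall totals."""
    total = Counter()
    per_sentence = []
    for sentence in split_sentences(text):
        words = re.findall(r"[a-z']+", sentence.lower())
        counts = Counter(w for w in words if w in KEYWORDS)
        per_sentence.append((sentence, counts))
        total.update(counts)
    return per_sentence, total
```

Keeping the per-sentence counts makes it possible to point to the specific statement in which a keyword appeared, while the totals indicate how often it occurs overall.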

[0170] Step 4:

[0171] The server generates warning information based on the analysis results and notifies the terminals of relevant administrators and personnel. The input is the analysis results, and the output is the alert notification. Here, notifications are sent via email or a dedicated app using a notification system, and delivered to relevant parties in different ways depending on the urgency.
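Delivering notifications in different ways depending on urgency can be sketched as a routing table. The two urgency levels, the channel functions, and the in-memory `sent` list that stands in for real delivery are all hypothetical.

```python
from enum import Enum


class Urgency(Enum):
    LOW = "low"
    HIGH = "high"


sent = []  # Stand-in for real delivery, so the sketch runs without a network.


def send_email(recipient, body):
    sent.append(("email", recipient, body))


def send_app_push(recipient, body):
    sent.append(("app", recipient, body))


# Hypothetical policy: urgent alerts go to the dedicated app first, then email.
ROUTES = {
    Urgency.LOW: [send_email],
    Urgency.HIGH: [send_app_push, send_email],
}


def notify(recipients, body, urgency):
    """Deliver the alert to every recipient over the channels for this urgency."""
    for recipient in recipients:
        for channel in ROUTES[urgency]:
            channel(recipient, body)
```

Keeping the policy in a table means the delivery rules can be changed without touching the notification logic.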

[0172] Step 5:

[0173] The server works in conjunction with the work management tool to continuously monitor the risks in the project's progress and propose improvement measures. The input is the latest progress information obtained from the project management tool, and the output is a risk assessment and improvement suggestions. Specifically, it periodically synchronizes the project status through an API such as that of Jira, visualizes the data, and presents it in a form that users can easily understand and use.
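A risk assessment over synchronized progress information might look like the following sketch. It assumes that the issues have already been fetched from the project management tool and reduced to plain dictionaries; the field names, the overdue-ratio metric, and the thresholds are hypothetical, and no real project management API is called.

```python
from datetime import date


def assess_risk(issues, today):
    """Rate schedule risk from the share of open issues that are overdue.

    issues: list of dicts with 'key' (str), 'due' (date) and 'done' (bool).
    """
    open_issues = [i for i in issues if not i["done"]]
    overdue = [i["key"] for i in open_issues if i["due"] < today]
    ratio = len(overdue) / len(open_issues) if open_issues else 0.0
    # Hypothetical thresholds for the risk level.
    if ratio >= 0.5:
        level = "high"
    elif ratio > 0:
        level = "medium"
    else:
        level = "low"
    return {"level": level, "overdue": overdue, "overdue_ratio": ratio}


issues = [
    {"key": "PRJ-1", "due": date(2025, 1, 10), "done": False},
    {"key": "PRJ-2", "due": date(2025, 2, 1), "done": False},
    {"key": "PRJ-3", "due": date(2025, 1, 5), "done": True},
]
print(assess_risk(issues, today=date(2025, 1, 15)))
```

Passing `today` explicitly keeps the assessment reproducible when the synchronization runs on a schedule.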

[0174] Step 6:

[0175] The user takes appropriate action based on the warning information and improvement suggestions displayed on the device. The input is notification information received from the server, and the output is the user's implementation of the corrective action. In this process, the user interface receives data feedback, presents recommended actions, and supports specific action procedures tailored to the situation.

[0176] (Application Example 2)

[0177] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0178] Traditional workplace management systems can detect labor law violations and signs of harassment, but they struggle to recognize employees' emotional states in real time and to use that information to provide comprehensive environmental improvement measures. Furthermore, they lack the ability to comprehensively assess and quickly respond to communication risks and emotional aspects related to project progress, resulting in insufficient improvement of the psychological workplace environment.

[0179] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0180] In this invention, the server includes means for collecting voice information in real time and converting it into text data, means for analyzing the text data and identifying signs of legal violations or harassment, and means for recognizing the user's emotional state using an emotion analysis engine. This makes it possible to comprehensively monitor psychological and legal risks in the workplace and provide appropriate countermeasures.

[0181] "Voice information" refers to information conveyed through sound, and is typically recorded as data that includes human speech.

[0182] "Text data" refers to digital data that represents audio information as text, conveying information in a format that is easy for people to understand.

[0183] "Analysis" is the process of examining data in detail, identifying features and patterns, and deriving specific results.

[0184] "Violation of laws and regulations" refers to actions that violate the provisions of laws or regulations, and such actions may result in legal sanctions.

[0185] "Signs of harassment" refer to early signs that bullying or harassment may be occurring, and situations that should be addressed based on these signs.

[0186] An "emotion analysis engine" is a software process that identifies a user's emotions from text or audio, and has the function of automatically classifying their emotional state.

[0187] "Users" refers to individuals or groups who use the system or service, and in this context specifically refers to employees of a workplace.

[0188] A "warning" is a notification issued to alert people to potential dangers or inappropriate behavior, and is intended to encourage a prompt response.

[0189] "Notification" refers to the act of sending messages or alerts used to inform users of specific information.

[0190] A "project management system" is a digital tool or platform for tracking and managing the planning, progress, and outcomes of a project.

[0191] "Communication risk" refers to a situation where misunderstandings or communication errors may occur during information exchange, potentially hindering the progress of the project.

[0192] "Risks in the project progress" refers to situations where the project may not proceed as planned, and the results may not be achieved as intended.

[0193] An "improvement proposal" is a suggestion that identifies a problem and then outlines specific methods or measures to solve it.

[0194] The system for implementing this invention collects voice information in the workplace in real time, recognizes the emotional state of employees using an emotion analysis engine, and analyzes signs of legal violations or harassment. Specifically, these functions are realized by a server-centered system.

[0195] The server collects audio information from within the workplace in real time via microphones and audio devices. This audio information is converted into text data using speech recognition technologies such as Google Cloud Speech-to-Text and IBM Watson Speech to Text.

[0196] Next, the server performs natural language processing on this text data. Using natural language processing tools such as AWS® Comprehend and Google Cloud Natural Language, it identifies signs of legal violations and harassment within the text and analyzes the user's emotional state. This includes automatically categorizing emotions such as joy, anger, and sadness.

[0197] In addition to notifying users of necessary information on their devices, the system notifies administrators, on their own devices, of specific problems or changes in emotional state. This allows administrators to monitor employees' psychological health and take appropriate action.

[0198] As a concrete example, consider a scenario in which, during a weekly meeting, an emotional analysis of project team members reveals that a particular member is experiencing extremely high stress levels. The system then alerts the administrator, who can quickly address the issue by reviewing that member's workload.

[0199] An example of a prompt using a generative AI model is, "Analyze the emotional state of the participants during the meeting and report the stress index." Using this prompt helps to facilitate the smooth progress of the emotion recognition process.
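How such a prompt might be combined with the meeting transcript before being passed to a generative AI model is sketched below. The template layout and the `build_prompt` function are assumptions for illustration; the call to the model itself is omitted because the specification does not name an interface for it.

```python
# The instruction text is the example prompt quoted above; the surrounding
# layout ("Transcript:" section) is a hypothetical choice.
PROMPT_TEMPLATE = (
    "Analyze the emotional state of the participants during the meeting "
    "and report the stress index.\n\n"
    "Transcript:\n{transcript}"
)


def build_prompt(utterances):
    """Format (speaker, text) pairs as a transcript and embed it in the prompt."""
    lines = [f"{speaker}: {text}" for speaker, text in utterances]
    return PROMPT_TEMPLATE.format(transcript="\n".join(lines))


print(build_prompt([("Member A", "I cannot finish this by Friday."),
                    ("Leader", "Then work over the weekend.")]))
```

Labeling each line with its speaker gives the model the information it needs to report results per participant.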

[0200] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0201] Step 1:

[0202] The server collects audio information in real time via audio devices within the workplace. The input for this step is the audio signal acquired from the microphone, and the output is digitized audio data. This data is stored on the server for further processing.

[0203] Step 2:

[0204] The server converts the collected audio data into text data using speech recognition technology such as Google Cloud Speech-to-Text. The input for this step is digitized audio data, and the output is text data. The speech recognition algorithm records the content of the audio as text information.

[0205] Step 3:

[0206] The server analyzes the text data using natural language processing tools such as AWS Comprehend. The input for this step is data expressed in text format, and the output is the analyzed information. Specifically, the analysis is performed to identify indicators of emotional state and signs of legal violations.

[0207] Step 4:

[0208] The server generates and sends an alert to the administrator's terminal based on the analyzed information. The input for this step is the information obtained as a result of the analysis, and the output is the alert message sent to the administrator's terminal. This allows the administrator to immediately grasp any abnormalities in the work environment.

[0209] Step 5:

[0210] The user reviews the alerts and analysis information received via the terminal and takes necessary actions. The input for this step is the notification information displayed on the terminal, and the output is the user's response and feedback. Based on the suggested improvements, the user reviews their work processes and adjusts their work environment.
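The input and output of Steps 2 to 4 can be sketched as a chain of functions in which the output of each step is the input of the next. The function names and the injected callables are hypothetical; the stand-in implementations in the usage example exist only to make the sketch runnable and do not represent actual speech recognition or natural language processing services.

```python
from typing import Callable, List


def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    analyze: Callable[[str], List[str]],
    send_alert: Callable[[List[str]], None],
) -> List[str]:
    """Run Steps 2 to 4 on digitized audio data obtained in Step 1."""
    text = transcribe(audio)    # Step 2: audio data in, text data out
    findings = analyze(text)    # Step 3: text data in, analyzed information out
    if findings:                # Step 4: alert only when something was found
        send_alert(findings)
    return findings


# Usage with stand-in implementations (assumptions for illustration only).
delivered = []
findings = run_pipeline(
    b"\x00\x01",
    transcribe=lambda audio: "Just shut up and do it.",
    analyze=lambda text: ["harassment"] if "shut up" in text.lower() else [],
    send_alert=delivered.append,
)
print(findings, delivered)
```

Injecting each stage as a callable allows the speech recognition or analysis service to be swapped without changing the overall flow.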

[0211] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0212] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0213] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0214] [Second Embodiment]

[0215] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0216] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0217] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0218] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0219] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0220] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0221] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0222] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0223] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0224] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0225] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0226] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0227] This invention provides a system for monitoring workplace communication and preventing labor law violations and harassment. This system is implemented in the following manner:

[0228] First, the server acquires audio data from devices that collect workplace conversations. This data is collected in real time through microphones installed in the office and video conferencing systems. The server then uses a speech recognition engine to transcribe this audio data into text. This transcription process provides the foundational data needed to analyze potentially problematic statements.

[0229] Next, the server analyzes the transcribed data using natural language processing capabilities. During this analysis, algorithms are used to detect keywords that may violate labor laws and expressions that could constitute harassment. This ensures that any problematic statements are immediately recognized.

[0230] Based on the detected information, the server generates an appropriate alert. This alert is sent to the terminals of the relevant administrators and personnel, providing real-time notification. This enables a rapid response.

[0231] Furthermore, the server communicates with project management tools used within the workplace to monitor communication and scheduling risks related to project progress. For detected risks, alerts and specific improvement suggestions are sent to the user's device. Based on these suggestions, users can readjust project member assignments and schedules, supporting efficient work operations.

[0232] As a concrete example, suppose a project team is holding a meeting. The server transcribes the comments made during the meeting into text, and if the analysis detects any "unreasonable demands" or "harassing language," it immediately sends a notification to the team leader's terminal with the message "attention required." Through this process, it is possible to minimize the impact of problematic comments on work and improve the workplace environment.

[0233] The following describes the processing flow.

[0234] Step 1:

[0235] The server acquires audio data in real time from audio collection devices within the office. This includes microphones in conference rooms and digital conversation recording systems.

[0236] Step 2:

[0237] The server converts the acquired audio data into text data using a speech recognition engine. In this process, the audio signal is analyzed, and highly accurate text conversion is performed.

[0238] Step 3:

[0239] The server analyzes the transcribed data using natural language processing algorithms. Here, it identifies specific keywords and phrases and detects contexts that may be related to labor law violations or harassment.

[0240] Step 4:

[0241] If the server detects a problem based on the analysis results, it will immediately generate an alert message. This alert will include specific information about what the problem is and which statements require attention.

[0242] Step 5:

[0243] The server sends the generated alerts to the administrator's or designated person's terminal. Notifications are provided in real time via push notifications or a dedicated dashboard.

[0244] Step 6:

[0245] The server simultaneously accesses project management tools to check the schedule and resource status of ongoing projects. If necessary, it identifies relevant risk factors.

[0246] Step 7:

[0247] The server generates project improvement suggestions based on the detected risks. These suggestions may include adding new members, revising processes, and adjusting the schedule.

[0248] Step 8:

[0249] Users can review alerts and suggestions received on their devices and select and implement appropriate countermeasures. This enables early problem resolution and efficient project management.

[0250] (Example 1)

[0251] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0252] In workplace communication, labor law violations and harassment are extremely important issues, and early detection and countermeasures are required. However, traditional monitoring methods have the challenge of not being able to detect problems in real time. Furthermore, there are difficulties in appropriately monitoring and immediately responding to communication and scheduling risks that arise as projects progress.

[0253] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0254] In this invention, the server includes means for acquiring and converting voice information into text in real time; means for analyzing the converted information and detecting signs of legal violations or disruptive behavior; means for creating and notifying warnings based on the detected content; means for monitoring communication and planning risks in cooperation with a business management system; and means for suggesting improvements based on the risks. This makes it possible to detect signs of potential legal violations and harassment occurring in the workplace in real time and notify managers with appropriate alerts. Furthermore, by coordinating with project management tools, it becomes possible to immediately discover communication and scheduling risks and support efficient business operations.

[0255] "Audio information" refers to data, including conversations and sounds, collected in the workplace.

[0256] "Real-time acquisition" refers to the process of collecting audio information almost simultaneously with its generation.

[0257] "String conversion" is the process of converting audio information into text data.

[0258] "Analysis" is the process of understanding meaning and intent based on information that has been converted into a string.

[0259] "Signs of legal violation" refers to any conduct that may violate labor laws or related regulations.

[0260] "Disruptive behavior" refers to problematic actions in the workplace, including harassment and inappropriate remarks.

[0261] A "warning" is a notification generated to address detected anomalies or problems.

[0262] "Notification" is the act of conveying warnings or information to relevant parties.

[0263] A "business management system" is a digital tool used to manage the progress of a project and related tasks.

[0264] "Communication risks" refer to the risks inherent in business communication.

[0265] "Planning risks" refer to risks related to the project's schedule and progress.

[0266] A "suggestion for improvement" is a specific solution to the identified risks or problems.

[0267] This invention provides a system for efficiently monitoring workplace communication, preventing labor law violations and disruptive behavior, and effectively managing projects.

[0268] The server works in conjunction with multiple audio collection devices to collect audio information in real time, both within the office and through digital conferencing systems. Specific examples of such devices include network-enabled microphones and video conferencing systems. The audio information is transmitted to the server and converted into text using audio conversion technology. For example, speech recognition technologies such as the Google Cloud Speech-to-Text API can be utilized.

[0269] Next, the server applies natural language processing algorithms to the text-converted information to detect potential legal violations or disruptive behavior. This step uses generative AI models to extract specific keywords and phrases. If a problem is detected as a result of this analysis, the server immediately generates a warning and notifies the relevant parties' terminals.

[0270] Furthermore, the server connects to the business management systems used by the company (e.g., Asana or Jira) to monitor project communications and planning risks. This allows users to detect potential problems in project progress early and respond quickly based on improvement suggestions provided along with alerts.

[0271] As a concrete example, during a meeting for an ongoing project, the server transcribes the user's statements in real time and sends a "warning" to the team leader's terminal if it contains "inappropriate requests" or "inappropriate expressions." This allows the user to immediately review their statements and prevent workplace problems.

[0272] An example of a prompt message might be, "Explain how to retrieve workplace conversation data, analyze inappropriate remarks, and generate alerts."

[0273] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0274] Step 1:

[0275] The server acquires audio information in real time from various audio acquisition devices within the workplace. This input includes network-enabled microphones and video conferencing systems. The server receives the audio information from these devices as streaming data and temporarily stores it in a compressed format. During this process, the audio signals are properly captured and prepared for processing.
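The temporary compressed storage described in this step can be sketched as follows. This is a minimal illustration only: the source does not name a codec or buffer design, so the use of lossless `zlib` compression, the class name `CompressedAudioBuffer`, and the `max_chunks` limit are all assumptions.

```python
import zlib
from collections import deque


class CompressedAudioBuffer:
    """Temporarily holds streamed audio chunks in compressed form (hypothetical helper)."""

    def __init__(self, max_chunks: int = 100) -> None:
        # Assumption: once the buffer is full, the oldest chunk is discarded.
        self._chunks = deque(maxlen=max_chunks)

    def push(self, pcm_bytes: bytes) -> None:
        # Compress each incoming chunk before storing it.
        self._chunks.append(zlib.compress(pcm_bytes))

    def drain(self) -> bytes:
        """Return all buffered audio, decompressed and concatenated, then clear the buffer."""
        audio = b"".join(zlib.decompress(chunk) for chunk in self._chunks)
        self._chunks.clear()
        return audio
```

A bounded buffer keeps memory use predictable while the audio waits for the conversion in the next step; a production system would more likely use an audio-specific codec.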

[0276] Step 2:

[0277] The server converts the acquired audio information into text using audio conversion technology, which employs a model based on speech recognition. The input is an audio signal, and the output is text data. For this conversion, the server uses the Google Cloud Speech-to-Text API or a similar tool to convert the audio to text in real time, and stores the resulting text in a database.

[0278] Step 3:

[0279] The server analyzes the transcribed text data to detect keywords and phrases that may indicate illegal or disruptive behavior. The input is the text data obtained in step 2, and the output is the analysis results indicating the problem. The server performs this analysis using natural language processing algorithms and generative AI models, and compares the text against a keyword list. By identifying specific patterns, it detects abnormal statements.
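The comparison against a keyword list could look roughly like the sketch below. The entries in `KEYWORDS`, the category labels, and the `Finding` structure are hypothetical; the source does not enumerate the actual list, and the generative AI part of the analysis is not modeled here.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword list; the real entries are not given in the source.
KEYWORDS = {
    "unpaid overtime": "labor_law",
    "no breaks": "labor_law",
    "you are useless": "harassment",
}


@dataclass
class Finding:
    phrase: str
    category: str
    position: int  # character offset of the match in the text


def detect_findings(text: str) -> list[Finding]:
    """Return keyword matches in order of appearance."""
    findings = []
    lowered = text.lower()
    for phrase, category in KEYWORDS.items():
        # Word boundaries prevent matches inside longer words.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, lowered):
            findings.append(Finding(phrase, category, match.start()))
    return sorted(findings, key=lambda f: f.position)
```

An empty result means no listed phrase occurred; it does not by itself mean the statement is unproblematic, which is why the text pairs the list with model-based analysis.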

[0280] Step 4:

[0281] The server generates warnings based on detected anomalies and problems and notifies the relevant parties' terminals. The input is the analysis results from step 3, and the output is the warning message. The server responds immediately to problematic statements and sends warnings to relevant parties via email or messaging applications. In this process, it can also provide specific action guidelines to support rapid decision-making.

[0282] Step 5:

[0283] The server monitors the communication and planning risks of a project in cooperation with the business management system. This input includes the latest data of the project management tool, and the output is a risk assessment and improvement suggestions. The server obtains information from the business management system through the API and performs analysis to identify potential risks. Based on this risk, specific improvement suggestions are generated and provided to the user.
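As a rough illustration of this step, the sketch below derives a planning risk level and improvement suggestions from task data. It assumes the task list has already been fetched from the business management system; the `Task` fields, the 0.5 threshold, and the wording of the suggestions are illustrative assumptions, not details from the source.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Task:
    title: str
    due: date
    done: bool


def assess_planning_risk(tasks: list[Task], today: date) -> dict:
    """Rate schedule risk from the share of open tasks that are overdue."""
    overdue = [t.title for t in tasks if not t.done and t.due < today]
    open_count = sum(1 for t in tasks if not t.done)
    ratio = len(overdue) / open_count if open_count else 0.0
    # Assumed thresholds: half or more of open tasks overdue counts as high risk.
    if ratio >= 0.5:
        level = "high"
    elif overdue:
        level = "medium"
    else:
        level = "low"
    suggestions = [f"Reschedule or reassign: {title}" for title in overdue]
    return {"level": level, "overdue": overdue, "suggestions": suggestions}
```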

[0284] (Application Example 1)

[0285] Next, Application Example 1 will be described. In the following description, the data processing device 12 is referred to as the "server", and the smart glasses 214 are referred to as the "terminal".

[0286] In offices and workplaces, there is a possibility that statements violating laws or ethics may be made unintentionally during communication. Since these statements may deteriorate the working environment and damage the organization's reliability, real-time problem detection and prompt response are required. Also, accurately storing the communication history is important for preventing future disputes and using it as evidence when problems occur. An effective technology for solving these problems is needed.

[0287] The specific processing by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0288] In this invention, the server includes means for acquiring voice information in real time and converting it into character information, means for analyzing the converted character information and detecting signs of problems related to human rights and ethics, and means for generating and notifying a warning based on the detected information. Thereby, it becomes possible to quickly detect problems violating laws or ethics that may occur during communication in the workplace and immediately notify the administrator.

[0289] "Voice information" refers to the entire data acquired through voice and includes the linguistic content collected in real time.

[0290] "Textual information" refers to data that represents audio information as text, and includes strings of language extracted using speech recognition technology.

[0291] "Analysis" refers to the process of analyzing acquired textual information and identifying problems in light of specific legal or ethical standards.

[0292] A "warning" refers to an alert or notification generated based on a detected problem, intended to quickly inform administrators and relevant individuals.

[0293] A "project management tool" refers to an entire system used to monitor the progress of a project or task, and to manage related information and schedules.

[0294] "Risk" refers to the possibility that statements or actions that violate laws or ethics will compromise the working environment or the soundness of the organization.

[0295] "Terminal" refers to a device used by a user to receive or manipulate information, and includes personal computers and smartphones.

[0296] The system implementing this invention aims to utilize voice information in an office environment to perform real-time monitoring and analysis in order to maintain healthy communication within the workplace.

[0297] The server first acquires audio information through speech recognition devices installed in the office or workplace. This audio information is then converted into text using Google Cloud Speech-to-Text or similar speech recognition technology. This digitizes the content of the conversation, making it available for subsequent analysis.

[0298] Next, this textual information is analyzed using a natural language processing engine (such as spaCy or NLTK) on the server. Here, specific keywords and phrases are detected against pre-defined legal and ethical standards, and potential problems are identified. These processes make it possible to evaluate whether the content of workplace communication is appropriate.

[0299] When a problem is detected, the server immediately generates an alert and notifies the device. This notification is sent to the devices of administrators and relevant personnel using Firebase Cloud Messaging or similar push notification technology. This allows administrators to respond to workplace issues in real time.

[0300] Furthermore, the terminal can receive suggestions from the server and, if necessary, present measures to improve communication. This allows users to improve their work environment and support effective business operations.

[0301] For example, if an inappropriate request is made during a meeting, the server detects it and sends a warning to the administrator's terminal stating that "the content of the statement may be inappropriate." At this time, the user is presented with a prompt such as "Please suggest a better way to communicate to resolve this issue," and specific improvement suggestions are provided by a generative AI model.

[0302] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0303] Step 1:

[0304] The server acquires audio information from speech recognition devices installed within the workplace. This input audio information covers the entirety of conversations within the workplace. Based on this, speech recognition technology is used to process it in real time and convert it into text information. The converted text information becomes the input for the next step.

[0305] Step 2:

[0306] The server acquires the character information converted in step 1 as input and performs analysis using a natural language processing engine. Here, data processing is performed to detect keywords and phrases corresponding to pre-set laws and ethical standards. As output, a result for determining whether a problem has been detected is obtained. This analysis result indicates the potential presence or absence of a problem.

[0307] Step 3:

[0308] Based on the analysis result of step 2, the server generates a warning if a problem is detected. When generating this warning, the specific content and importance of the problem are considered. The generated warning is sent to the terminal using a push notification service. At this time, the terminal receives the warning and notifies the administrator in real time. As output, a warning message is displayed on the administrator's device.

[0309] Step 4:

[0310] The terminal receives the notification sent from the server and notifies the user. After the user confirms this warning, the user waits for instructions regarding the improvement proposal presented by the server. This improvement proposal includes specific actions that the user should take for the purpose of improving the workplace environment. The system uses a generative AI model to provide a specific improvement plan based on the prompt text. This output includes improvement actions that the user can actually implement.
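The prompt text passed to the generative AI model might be assembled as in the sketch below. Only the final sentence of the prompt is taken from the example given earlier in this application example; the function name `build_improvement_prompt` and the surrounding wording are assumptions, and the call to the model itself is omitted.

```python
def build_improvement_prompt(statement: str, issue: str) -> str:
    """Compose a prompt asking the model for an improvement proposal."""
    return (
        f"The following statement was flagged as a possible {issue}:\n"
        f'"{statement}"\n'
        "Please suggest a better way to communicate to resolve this issue."
    )
```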

[0311] Furthermore, an emotion engine for estimating the user's emotion may be combined. That is, the specific processing unit 290 may estimate the user's emotion using the emotion identification model 59 and perform specific processing using the user's emotion.

[0312] This invention is a system that combines an emotion engine that not only detects potential problems in workplace communication but also recognizes the user's emotional state. In addition to conventional problem detection functions, this system aims to identify psychological risks in the workplace and provide comprehensive improvement measures.

[0313] First, the server collects voice and text data from within the workplace in real time. This includes conversations in meeting rooms and offices, which are then converted into text data using speech recognition technology. This transcribed data is then subjected to subsequent analysis processing.

[0314] Next, the server applies natural language processing to the text data to detect signs of labor law violations and harassment. In addition, it utilizes an emotion engine to analyze the emotional state associated with each user's statements. The emotion engine identifies emotions such as joy, anger, and sadness from word choices and context, and adds this information to the analysis results.

[0315] If the analysis detects a problem, the server generates an alert and notifies the relevant administrators and personnel on their terminals. The alert includes details of the identified problem and the detected emotional tendencies. This allows for a multifaceted assessment of the situation and enables a quick and appropriate response.

[0316] Furthermore, the server integrates with project management tools to monitor emotional and communication risks that could impact project progress. Based on this information, it proposes improvements to the project. For example, if a team member is identified as being under high stress, suggestions may be made to re-evaluate the project member's role or adjust their workload.

[0317] Based on the information displayed on the terminal, users can make appropriate decisions tailored to their individual circumstances and take necessary measures. In this way, the main objective of the present invention is to support improvements not only to the physical but also to the psychological environment of the workplace, thereby improving employee health and corporate efficiency.

[0318] The following describes the processing flow.

[0319] Step 1:

[0320] The server acquires audio data in real time from voice collection devices installed within the workplace. This data, including meetings and everyday conversations, is securely stored for recording purposes.

[0321] Step 2:

[0322] The server converts the acquired audio data into text using a speech recognition engine. Here, it prioritizes transcribing the clearest parts of the audio and performs filtering to minimize transcription errors.
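One way to read the filtering in this step is to discard segments that the recognizer reports with low confidence. The sketch below assumes each segment carries a confidence score between 0 and 1; the `Segment` structure and the 0.8 threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    confidence: float  # assumed range 0.0 to 1.0, as reported by the recognizer


def filter_segments(segments: list[Segment], min_confidence: float = 0.8) -> str:
    """Keep only clearly recognized segments, preserving their original order."""
    kept = [s.text for s in segments if s.confidence >= min_confidence]
    return " ".join(kept)
```

Dropping unclear segments reduces false detections caused by misrecognized words, at the cost of possibly missing statements made in noisy conditions.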

[0323] Step 3:

[0324] The server analyzes the transcribed data using natural language processing algorithms to detect signs of labor law violations and harassment. It also uses an emotion engine to identify the emotions contained in the statements and generate an emotion profile.
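An emotion profile can be approximated, in the simplest case, by counting emotion-bearing words. The sketch below is a lexicon-based stand-in for the emotion engine, not the emotion identification model itself; the words in `EMOTION_LEXICON` are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical lexicon mapping words to the emotions named in the text.
EMOTION_LEXICON = {
    "glad": "joy",
    "great": "joy",
    "furious": "anger",
    "unacceptable": "anger",
    "sorry": "sadness",
    "exhausted": "sadness",
}
EMOTIONS = ("joy", "anger", "sadness")


def emotion_profile(statement: str) -> dict[str, float]:
    """Return the share of each emotion among matched words, or {} if none match."""
    words = re.findall(r"[a-z']+", statement.lower())
    counts = Counter(EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {emotion: counts[emotion] / total for emotion in EMOTIONS}
```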

[0325] Step 4:

[0326] The server generates alerts based on detected issues and emotional states. These alerts include the type of issue identified, the emotional tendency, and related contextual information.
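An alert carrying the three items listed in this step could be represented as below. The field names, the `importance` field, and the rule for setting it are assumptions added for illustration; the emotional tendency is taken here as the highest-scoring emotion in the profile.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Alert:
    issue_type: str
    emotional_tendency: str
    context: str
    importance: str


def build_alert(issue_type: str, profile: dict[str, float], context: str) -> Alert:
    """Combine a detected issue with the dominant emotion and its context."""
    tendency = max(profile, key=profile.get) if profile else "unknown"
    # Assumed rule: harassment or dominant anger raises the importance.
    high = issue_type == "harassment" or tendency == "anger"
    return Alert(issue_type, tendency, context, "high" if high else "normal")


def to_payload(alert: Alert) -> str:
    """Serialize the alert as JSON for delivery to a terminal."""
    return json.dumps(asdict(alert))
```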

[0327] Step 5:

[0328] The server notifies relevant parties or administrators of the generated alerts in real time. The notifications are delivered via push notifications, prompting immediate action based on the level of importance.

[0329] Step 6:

[0330] The server integrates with project management tools to monitor emotional and communication risks in project progress and resource allocation. This includes regular data updates and real-time insights.

[0331] Step 7:

[0332] Based on the analyzed sentiment and risk data, the server proposes improvements and adjustments. These proposals may include recommendations for reviewing team composition or adjusting schedules.

[0333] Step 8:

[0334] Users review received alerts and suggestions and determine the best course of action based on them. Users can provide appropriate feedback and mental support in accordance with organizational policies.

[0335] (Example 2)

[0336] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0337] In today's workplace, it is essential to detect labor law violations and signs of harassment early and respond appropriately. However, these problems often do not manifest themselves overtly, and in many cases they have complex emotional roots. Traditional methods have made it difficult to accurately identify these psychological risks and respond quickly. In addition, communication risks that affect project progress are insufficiently understood and insufficiently countered, which can ultimately reduce workplace efficiency.

[0338] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0339] In this invention, the server includes means for collecting acoustic information in real time and converting it into text data; means for analyzing the text data and detecting signs of regulatory violations or misconduct; means for analyzing an individual's emotional state and identifying psychological risks; and means for monitoring communication and progress risks in conjunction with a work management tool. This makes it possible to analyze potential workplace problems from multiple perspectives and propose appropriate improvement measures.

[0340] "Acoustic information" refers to data related to speech and other sounds that is collected and processed by a system.

[0341] "Text data" refers to data in text format converted from acoustic information, and is the information that will be analyzed.

[0342] "Analysis" refers to the process of data processing carried out to detect signs of regulatory violations or misconduct based on collected data.

[0343] "Indicators of regulatory violations or misconduct" refers to indications of actions or statements that may violate labor laws or other norms.

[0344] "Warning information" refers to alerts generated to notify about detected problems.

[0345] A "work management tool" refers to a software system used to manage projects and their progress.

[0346] "Communication and progress risks" refer to communication problems and risks that may affect the progress of the project.

[0347] "Emotional state" refers to an individual's psychological or emotional state, which is analyzed from their words and actions.

[0348] "Psychological risk" refers to problems in the workplace that may affect an individual's mental health.

[0349] This invention is a system that effectively collects and analyzes acoustic information in the workplace environment, enabling early detection of signs of regulatory violations and misconduct, as well as the evaluation of individuals' emotional states and the identification of psychological risks. At the core of this system is the comprehensive data processing capability achieved through the cooperation of servers and terminals.

[0350] The server first collects real-time acoustic information from the workplace through acoustic sensors and microphones. This function is achieved using microphone devices connected via the network and dedicated speech recognition devices. Specifically, speech recognition software such as the Google Cloud Speech-to-Text API or IBM Watson Speech to Text is used to instantly convert the collected acoustic information into text data. This text data is stored on the server as the basis for analysis.

[0351] Next, the server runs a program to perform natural language analysis based on the stored text data. This analysis uses advanced natural language processing techniques such as Python's NLTK library and spaCy. During the analysis, the server detects signs of labor law violations and harassment and generates warning information as needed. The analysis also incorporates an emotion engine that calculates an individual's emotional state from the context of the conversation and word choice, and identifies psychological risks.

[0352] Furthermore, the server monitors project progress in conjunction with work management tools. Specifically, it uses APIs to exchange data with project management software such as Jira and Trello to detect and analyze communication and progress risks. Based on this data, the system proposes appropriate improvement measures to administrators. For example, if a particular team member is detected to be under high stress, it can suggest redistributing tasks.

[0353] The user receives information provided by the server through the terminal and takes appropriate action based on it. The terminal is equipped with a user interface for visualizing data analysis results and suggested improvement measures, allowing the user to quickly implement countermeasures.

[0354] For example, the effectiveness of the system can be further enhanced by utilizing a generative AI model with prompts such as, "How can I detect emotions and problems from workplace audio data?" In this way, the system aims to support improvements in the physical and psychological environment of the workplace, thereby improving employee health and corporate efficiency.

[0355] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0356] Step 1:

[0357] The server collects acoustic information in real time from acoustic sensors and microphones. The input is audio data from conference rooms or offices, and the output is stored in a database as acoustic information. In this process, noise cancellation technology is used to remove ambient noise and capture conversations more clearly.

[0358] Step 2:

[0359] The server converts the collected acoustic information into text data using speech recognition software. The input is acoustic information, and the output is text data. Specifically, when converting speech to text using the Google Cloud Speech-to-Text API, the server utilizes speaker identification to classify each utterance by speaker.

[0360] Step 3:

[0361] The server applies natural language processing to the converted text data to detect signs of regulatory violations or misconduct. The input is text data, and the output is the analysis result regarding the presence or absence of problems. It also utilizes an emotion engine to analyze the emotional state contained in each statement, identifying emotions such as joy, anger, and sadness. The data processing performed at this stage involves analyzing the text sentence by sentence and measuring the frequency of keywords and phrases.
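The sentence-by-sentence analysis and keyword frequency measurement described in this step can be sketched as follows. The sentence splitter is a simple punctuation-based rule chosen for illustration (a library such as NLTK or spaCy would be more robust), and the function name is hypothetical.

```python
import re


def keyword_frequency_by_sentence(
    text: str, keywords: list[str]
) -> list[tuple[str, dict[str, int]]]:
    """Split text into sentences and count keyword occurrences in each one."""
    # Split after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    results = []
    for sentence in sentences:
        lowered = sentence.lower()
        hits = {}
        for keyword in keywords:
            pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
            count = len(re.findall(pattern, lowered))
            if count:
                hits[keyword] = count
        results.append((sentence, hits))
    return results
```

Keeping the counts per sentence lets a later step point to the exact statement that triggered a detection rather than to the conversation as a whole.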

[0362] Step 4:

[0363] The server generates warning information based on the analysis results and notifies the terminals of relevant administrators and personnel. The input is the analysis results, and the output is the alert notification. Here, notifications are sent via email or a dedicated app using a notification system, and delivered to relevant parties in different ways depending on the urgency.
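Delivery that varies with urgency could be expressed as a simple routing table. The three urgency levels and the channel names below are assumptions; the source states only that email or a dedicated app is used and that the delivery method depends on urgency.

```python
from enum import Enum


class Urgency(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def select_channels(urgency: Urgency) -> list[str]:
    """Choose delivery channels for an alert (hypothetical routing rule)."""
    if urgency is Urgency.HIGH:
        # Urgent alerts go out on both channels to reduce the chance of a miss.
        return ["app_push", "email"]
    if urgency is Urgency.MEDIUM:
        return ["app_push"]
    return ["email"]
```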

[0364] Step 5:

[0365] The server works in conjunction with the work management tool to continuously monitor the risks in the project's progress and propose improvement measures. The input is the latest progress information obtained from the work management tool, and the output is a risk assessment and improvement suggestions. Specifically, the server periodically synchronizes the project status through the API of a tool such as Jira, visualizes the data, and provides it as information that users can easily understand and use.

[0366] Step 6:

[0367] The user takes appropriate action based on the warning information and improvement suggestions displayed on the device. The input is notification information received from the server, and the output is the user's implementation of the corrective action. In this process, the user interface receives data feedback, presents recommended actions, and supports specific action procedures tailored to the situation.

[0368] (Application Example 2)

[0369] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0370] Traditional workplace management systems can detect labor law violations and signs of harassment, but they struggle to recognize employees' emotional states in real time and to use that information to provide comprehensive environmental improvement measures. Furthermore, they lack the ability to comprehensively assess and respond quickly to communication risks and emotional aspects related to project progress, resulting in insufficient improvement of the psychological workplace environment.

[0371] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0372] In this invention, the server includes means for collecting voice information in real time and converting it into text data, means for analyzing the text data and identifying signs of legal violations or harassment, and means for recognizing the user's emotional state using an emotion analysis engine. This makes it possible to comprehensively monitor psychological and legal risks in the workplace and provide appropriate countermeasures.

[0373] "Voice information" refers to information conveyed through sound, and is typically recorded as data that includes human speech.

[0374] "Text data" refers to digital data that represents audio information as text, conveying information in a format that is easy for people to understand.

[0375] "Analysis" is the process of examining data in detail, identifying features and patterns, and deriving specific results.

[0376] "Violation of laws and regulations" refers to actions that violate the provisions of laws or regulations, and such actions may result in legal sanctions.

[0377] "Signs of harassment" refer to early signs that bullying or harassment may be occurring, and situations that should be addressed based on these signs.

[0378] An "emotion analysis engine" is a software process that identifies a user's emotions from text or audio, and has the function of automatically classifying their emotional state.

[0379] "Users" refers to individuals or groups who use the system or service, and in this context specifically refers to employees of a workplace.

[0380] A "warning" is a notification issued to alert people to potential dangers or inappropriate behavior, and is intended to encourage a prompt response.

[0381] "Notification" refers to the act of sending messages or alerts used to inform users of specific information.

[0382] A "project management system" is a digital tool or platform for tracking and managing the planning, progress, and outcomes of a project.

[0383] "Communication risk" refers to a situation where misunderstandings or communication errors may occur during information exchange, potentially hindering the progress of the project.

[0384] "Risks in the project progress" refers to situations where the project may not proceed as planned, and the results may not be achieved as intended.

[0385] An "improvement proposal" is a suggestion that identifies a problem and then outlines specific methods or measures to solve it.

[0386] The system for implementing this invention collects voice information in the workplace in real time, recognizes the emotional state of employees using an emotion analysis engine, and analyzes signs of legal violations or harassment. Specifically, these functions are realized by a server-centered system.

[0387] The server collects audio information from within the workplace in real time via microphones and audio devices. This audio information is converted into text data using speech recognition technologies such as Google Cloud Speech-to-Text and IBM Watson Speech-to-Text.

[0388] Next, the server performs natural language processing on this text data. Using natural language processing tools such as Amazon Comprehend and Google Cloud Natural Language, it identifies signs of legal violations and harassment within the text and analyzes the user's emotional state. This includes automatically categorizing emotions such as joy, anger, and sadness.

[0389] In addition to notifying users' terminals of necessary information, the system notifies administrators' terminals of specific problems or changes in emotional state. This allows administrators to monitor employees' psychological health and take appropriate action.

[0390] As a concrete example, consider a scenario where, during a weekly meeting, an emotional analysis of project team members reveals that a particular member is experiencing extremely high stress levels. The system then alerts the administrator, who can quickly address the issue by reviewing that member's workload.

[0391] An example of a prompt using a generative AI model is, "Analyze the emotional state of the participants during the meeting and report the stress index." Using this prompt helps the emotion recognition process run smoothly.

[0392] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0393] Step 1:

[0394] The server collects audio information in real time via audio devices within the workplace. The input for this step is the audio signal acquired from the microphone, and the output is digitized audio data. This data is stored on the server for further processing.

[0395] Step 2:

[0396] The server converts the collected audio data into text data using speech recognition technology such as Google Cloud Speech-to-Text. The input for this step is digitized audio data, and the output is text data. The speech recognition algorithm records the content of the audio as text information.

[0397] Step 3:

[0398] The server analyzes the text data using natural language processing tools such as Amazon Comprehend. The input for this step is data expressed in text format, and the output is the analyzed information. Specifically, the analysis is performed to identify indicators of emotional state and signs of legal violations.

[0399] Step 4:

[0400] The server generates and sends an alert to the administrator's terminal based on the analyzed information. The input for this step is the information obtained as a result of the analysis, and the output is the alert message sent to the administrator's terminal. This allows the administrator to immediately grasp any abnormalities in the work environment.

[0401] Step 5:

[0402] The user reviews the alerts and analysis information received via the terminal and takes necessary actions. The input for this step is the notification information displayed on the terminal, and the output is the user's response and feedback. Based on the suggested improvements, the user reviews their work processes and adjusts their work environment.

[0403] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0404] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0405] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0406] [Third Embodiment]

[0407] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0408] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0409] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0410] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0411] The microphone 238 receives instructions from the user 20 by capturing the voice of the user 20. The microphone 238 converts the captured voice into audio data and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0412] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0413] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0414] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0415] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0416] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0417] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0418] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0419] This invention provides a system for monitoring workplace communication and preventing labor law violations and harassment. This system is implemented in the following manner:

[0420] First, the server acquires audio data from devices that collect workplace conversations. This data is collected in real time through microphones installed in the office and video conferencing systems. The server then uses a speech recognition engine to transcribe this audio data into text. This transcription process provides the foundational data needed to analyze potentially problematic statements.

[0421] Next, the server analyzes the transcribed data using natural language processing capabilities. During this analysis, algorithms are used to detect keywords that may violate labor laws and expressions that could constitute harassment. This ensures that any problematic statements are immediately recognized.

[0422] Based on the detected information, the server generates an appropriate alert. This alert is sent to the terminals of the relevant administrators and personnel, providing real-time notification. This enables a rapid response.

[0423] Furthermore, the server communicates with project management tools used within the workplace to monitor communication and scheduling risks related to project progress. For detected risks, alerts and specific improvement suggestions are sent to the user's device. Based on these suggestions, users can readjust project member assignments and schedules, supporting efficient work operations.

[0424] As a concrete example, suppose a project team is holding a meeting. The server transcribes the comments made during the meeting into text, and if the analysis detects "unreasonable demands" or "harassing language," it immediately notifies the team leader's terminal with a message saying "attention required." This process makes it possible to minimize the impact of problematic comments on work and improve the workplace environment.

[0425] The following describes the processing flow.

[0426] Step 1:

[0427] The server acquires audio data in real time from audio collection devices within the office. This includes microphones in conference rooms and digital conversation recording systems.

[0428] Step 2:

[0429] The server converts the acquired audio data into text data using a speech recognition engine. In this process, the audio signal is analyzed, and highly accurate text conversion is performed.

[0430] Step 3:

[0431] The server analyzes the transcribed data using natural language processing algorithms. Here, it identifies specific keywords and phrases and detects contexts that may be related to labor law violations or harassment.

[0432] Step 4:

[0433] If the server detects a problem based on the analysis results, it will immediately generate an alert message. This alert will include specific information about what the problem is and which statements require attention.

[0434] Step 5:

[0435] The server sends the generated alerts to the administrator's or designated person's terminal. Notifications are provided in real time via push notifications or a dedicated dashboard.

[0436] Step 6:

[0437] The server simultaneously accesses project management tools to check the schedule and resource status of ongoing projects. If necessary, it identifies relevant risk factors.

[0438] Step 7:

[0439] The server generates project improvement suggestions based on the detected risks. These suggestions may include adding new members, revising processes, and adjusting the schedule.

[0440] Step 8:

[0441] Users can review alerts and suggestions received on their devices and select and implement appropriate countermeasures. This enables early problem resolution and efficient project management.
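
The eight steps above can be sketched as a single pipeline. The following is a minimal illustration under stated assumptions, not the disclosed implementation: the names `run_pipeline` and `detect_issues`, the phrase list, the stub transcriber, and the list of overdue tasks are all hypothetical, and a real system would substitute a speech recognition engine, a natural language processing model, and a project management API.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical phrase list; a real deployment would use reviewed legal/HR criteria.
FLAGGED_PHRASES = {
    "unpaid overtime": "possible labor law violation",
    "you are useless": "possible harassment",
}


@dataclass
class Report:
    transcript: str
    alerts: list[str] = field(default_factory=list)
    suggestions: list[str] = field(default_factory=list)


def detect_issues(transcript: str) -> list[str]:
    """Step 3: naive phrase matching standing in for NLP analysis."""
    lowered = transcript.lower()
    return [f"{label}: '{phrase}'"
            for phrase, label in FLAGGED_PHRASES.items() if phrase in lowered]


def run_pipeline(audio: bytes,
                 transcribe: Callable[[bytes], str],
                 overdue_tasks: list[str]) -> Report:
    transcript = transcribe(audio)             # Steps 1-2: acquire and transcribe
    report = Report(transcript)
    report.alerts = detect_issues(transcript)  # Steps 3-4: analyze and build alerts
    # Steps 6-7: turn schedule risks into improvement suggestions
    report.suggestions = [f"Reschedule or reassign overdue task '{task}'"
                          for task in overdue_tasks]
    return report                              # Steps 5 and 8: sent to the terminal


def fake_transcribe(audio: bytes) -> str:
    """Stub transcriber so the sketch runs without a speech recognition engine."""
    return audio.decode("utf-8")


report = run_pipeline(b"Finish this tonight as unpaid overtime.",
                      fake_transcribe, ["API design"])
```

Passing the transcriber in as a callable keeps the flow independent of any particular speech recognition service.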

[0442] (Example 1)

[0443] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0444] In workplace communication, labor law violations and harassment are extremely important issues, and early detection and countermeasures are required. However, traditional monitoring methods have the challenge of not being able to detect problems in real time. Furthermore, there are difficulties in appropriately monitoring and immediately responding to communication and scheduling risks that arise as projects progress.

[0445] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0446] In this invention, the server includes means for acquiring and converting voice information into text in real time; means for analyzing the converted information and detecting signs of legal violations or disruptive behavior; means for creating and notifying warnings based on the detected content; means for monitoring communication and planning risks in cooperation with a business management system; and means for suggesting improvements based on the risks. This makes it possible to detect signs of potential legal violations and harassment occurring in the workplace in real time and notify managers with appropriate alerts. Furthermore, by coordinating with project management tools, it becomes possible to immediately discover communication and scheduling risks and support efficient business operations.

[0447] "Audio information" refers to data, including conversations and sounds, collected in the workplace.

[0448] "Real-time acquisition" refers to the process of collecting audio information almost simultaneously with its generation.

[0449] "String conversion" is the process of converting audio information into text data.

[0450] "Analysis" is the process of understanding meaning and intent based on information that has been converted into a string.

[0451] "Signs of legal violation" refers to any conduct that may violate labor laws or related regulations.

[0452] "Disruptive behavior" refers to problematic actions in the workplace, including harassment and inappropriate remarks.

[0453] A "warning" is a notification generated to address detected anomalies or problems.

[0454] "Notification" is the act of conveying warnings or information to relevant parties.

[0455] A "business management system" is a digital tool used to manage the progress of a project and related tasks.

[0456] "Communication risks" refer to the risks inherent in business communication.

[0457] "Planning risks" refer to risks related to the project's schedule and progress.

[0458] A "suggestion for improvement" is a specific solution to the identified risks or problems.

[0459] This invention provides a system for efficiently monitoring workplace communication, preventing labor law violations and disruptive behavior, and effectively managing projects.

[0460] The server works in conjunction with multiple audio collection devices to collect audio information in real time, both within the office and through digital conferencing systems. Specific examples of such devices include network-enabled microphones and video conferencing systems. The audio information is transmitted to the server and converted into text using audio conversion technology. For example, speech recognition technologies such as the Google Cloud Speech-to-Text API can be utilized.

[0461] Next, the server applies natural language processing algorithms to the converted text to detect potential legal violations or disruptive behavior. In this analysis, a generative AI model is used to extract specific keywords and phrases. If the analysis detects a problem, the server immediately generates a warning and notifies the relevant parties' terminals.

[0462] Furthermore, the server connects to the business management systems used by the company (e.g., Asana or Jira) to monitor project communications and planning risks. This allows users to detect potential problems in project progress early and respond quickly based on improvement suggestions provided along with alerts.
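
As a rough illustration of how planning risks might be derived from task data, the sketch below flags overdue tasks and overloaded assignees. The `Task` record, the `planning_risks` function, and the `max_open` limit are hypothetical simplifications; actual Asana or Jira responses have a different and richer structure, and no API call is made here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Task:
    # Hypothetical, simplified record; real Asana or Jira payloads differ.
    name: str
    assignee: str
    due: date
    done: bool


def planning_risks(tasks: list[Task], today: date, max_open: int = 3) -> list[str]:
    """Return improvement suggestions for overdue tasks and overloaded assignees."""
    suggestions = []
    open_count: dict[str, int] = {}
    for task in tasks:
        if task.done:
            continue
        open_count[task.assignee] = open_count.get(task.assignee, 0) + 1
        if task.due < today:
            late = (today - task.due).days
            suggestions.append(
                f"'{task.name}' is {late} day(s) overdue: revise the schedule")
    for assignee, count in open_count.items():
        if count > max_open:
            suggestions.append(
                f"{assignee} has {count} open tasks: consider reassigning some")
    return suggestions


tasks = [
    Task("Spec review", "Sato", date(2025, 1, 10), done=False),
    Task("Test plan", "Sato", date(2025, 1, 20), done=False),
    Task("Kickoff", "Ito", date(2025, 1, 5), done=True),
]
result = planning_risks(tasks, today=date(2025, 1, 13), max_open=1)
```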

[0463] As a concrete example, during a meeting for an ongoing project, the server transcribes the user's statements in real time and sends a "warning" to the team leader's terminal if it contains "inappropriate requests" or "inappropriate expressions." This allows the user to immediately review their statements and prevent workplace problems.

[0464] An example of a prompt message might be, "Explain how to retrieve workplace conversation data, analyze inappropriate remarks, and generate alerts."

[0465] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0466] Step 1:

[0467] The server acquires audio information in real time from various audio acquisition devices within the workplace. These devices include network-enabled microphones and video conferencing systems. The server receives the audio information from these devices as streaming data and temporarily stores it in a compressed format. During this process, the audio signals are captured and prepared for subsequent processing.

[0468] Step 2:

[0469] The server converts the acquired audio information into text using speech-to-text conversion technology. This conversion employs a model based on speech recognition technology: the input is an audio signal, and the output is text data. For this conversion, the server uses tools such as the Google Cloud Speech-to-Text API to convert audio to text in real time and stores the result in a database.

[0470] Step 3:

[0471] The server analyzes the converted text data to detect keywords and phrases that may indicate illegal or disruptive behavior. The input is the text data obtained in Step 2, and the output is the analysis result indicating any problem found. The server performs this analysis using natural language processing algorithms and generative AI models, and compares the text against a keyword list. By identifying specific patterns, it detects abnormal statements.

[0472] Step 4:

[0473] The server generates warnings based on detected anomalies and problems and notifies the relevant parties' terminals. The input is the analysis results from step 3, and the output is the warning message. The server responds immediately to problematic statements and sends warnings to relevant parties via email or messaging applications. In this process, it can also provide specific action guidelines to support rapid decision-making.

[0474] Step 5:

[0475] The server works in conjunction with the project management system to monitor project communication and planning risks. Its inputs include up-to-date data from the project management tool, and its outputs are risk assessments and improvement suggestions. The server retrieves information from the project management system via an API and analyzes it to identify potential risks. Based on these risks, specific improvement suggestions are generated and provided to the user.
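
Step 1 of this flow stores streamed audio temporarily in a compressed format. One way this could look is sketched below using the `zlib` module from the Python standard library. The class name `CompressedAudioBuffer` and the frame size are hypothetical, and general-purpose lossless compression is used only for illustration; a production system would more likely use an audio codec.

```python
import zlib


class CompressedAudioBuffer:
    """Temporarily holds streamed audio chunks in compressed form."""

    def __init__(self) -> None:
        self._compressor = zlib.compressobj()
        self._parts: list[bytes] = []
        self.raw_bytes = 0

    def append(self, chunk: bytes) -> None:
        # Compress each chunk as it arrives instead of keeping the raw stream.
        self.raw_bytes += len(chunk)
        self._parts.append(self._compressor.compress(chunk))

    def close(self) -> bytes:
        # flush() emits any data still buffered inside the compressor.
        self._parts.append(self._compressor.flush())
        return b"".join(self._parts)


buffer = CompressedAudioBuffer()
for _ in range(50):
    buffer.append(b"\x00\x01" * 160)  # stand-in for one 320-byte audio frame
stored = buffer.close()
restored = zlib.decompress(stored)    # what the transcription step would read back
```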

[0476] (Application Example 1)

[0477] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0478] In offices and workplaces, unintentional statements that violate laws or ethics can occur during communication. Such statements can worsen the work environment and damage the organization's credibility, thus requiring real-time problem detection and rapid response. Furthermore, accurately preserving communication history is crucial to prevent future disputes and to serve as evidence if problems arise. Effective technology is needed to address these challenges.

[0479] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0480] In this invention, the server includes means for acquiring voice information in real time and converting it into text information, means for analyzing the converted text information and detecting signs of human rights or ethical issues, and means for generating and notifying warnings based on the detected information. This makes it possible to quickly detect legal or ethical issues that may occur in workplace communication and immediately notify administrators.

[0481] "Audio information" refers to all data acquired through audio, including linguistic content collected in real time.

[0482] "Textual information" refers to data that represents audio information as text, and includes strings of language extracted using speech recognition technology.

[0483] "Analysis" refers to the process of analyzing acquired textual information and identifying problems in light of specific legal or ethical standards.

[0484] A "warning" refers to an alert or notification generated based on a detected problem, intended to quickly inform administrators and relevant individuals.

[0485] A "project management tool" refers to an entire system used to monitor the progress of a project or task, and to manage related information and schedules.

[0486] "Risk" refers to the possibility that statements or actions that violate laws or ethics will compromise the working environment or the health of the organization.

[0487] "Terminal" refers to a device used by a user to receive or manipulate information, and includes personal computers and smartphones.

[0488] The system implementing this invention aims to utilize voice information in an office environment to perform real-time monitoring and analysis in order to maintain healthy communication within the workplace.

[0489] The server first acquires audio information through speech recognition devices installed in the office or workplace. This audio information is then converted into text using Google Cloud Speech-to-Text or similar speech recognition technology. This digitizes the content of the conversation, making it available for subsequent analysis.

[0490] Next, the server analyzes this textual information using a natural language processing engine (such as spaCy or NLTK). Here, specific keywords and phrases are detected against pre-defined legal and ethical standards, and potential problems are identified. These processes make it possible to evaluate whether the content of workplace communication is appropriate.
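
The check against pre-defined standards might be organized as a table of patterns per category. The sketch below uses the standard `re` module as a stand-in for a full natural language processing engine; the `STANDARDS` table, the `Finding` record, and the patterns themselves are invented for illustration and are not the criteria used by the disclosed system.

```python
import re
from typing import NamedTuple


class Finding(NamedTuple):
    category: str
    phrase: str
    sentence: str


# Hypothetical standards table: category -> regular expression patterns.
STANDARDS = {
    "legal": [r"\bunpaid overtime\b", r"\bno days? off\b"],
    "ethical": [r"\bidiot\b", r"\bshut up\b"],
}
COMPILED = {category: [re.compile(p, re.IGNORECASE) for p in patterns]
            for category, patterns in STANDARDS.items()}


def check_against_standards(text: str) -> list[Finding]:
    findings = []
    # Split after sentence-ending punctuation so each finding carries its context.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for category, patterns in COMPILED.items():
            for pattern in patterns:
                match = pattern.search(sentence)
                if match:
                    findings.append(Finding(category, match.group(0), sentence))
    return findings


findings = check_against_standards(
    "Shut up and listen. You will get no day off until this ships. Thanks everyone."
)
```

Keeping the category with each finding lets a later alert state which standard was matched and quote the sentence that triggered it.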

[0491] When a problem is detected, the server immediately generates an alert and notifies the device. This notification is sent to the devices of administrators and relevant personnel using Firebase Cloud Messaging or similar push notification technology. This allows administrators to respond to workplace issues in real time.
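
The notification body could be shaped as follows. This sketch only builds a JSON payload in the general form accepted by the Firebase Cloud Messaging HTTP v1 API (a `message` object with `token`, `notification`, and string-valued `data`); it does not authenticate or send anything. The helper `build_alert_message`, the device token, and the keys under `data` are placeholders chosen for this example.

```python
import json


def build_alert_message(device_token: str, category: str, excerpt: str) -> str:
    """Build a push payload; authentication and sending are out of scope here."""
    message = {
        "message": {
            "token": device_token,
            "notification": {
                "title": "Attention required",
                "body": f"A statement may be inappropriate ({category}).",
            },
            # Values under "data" must all be strings.
            "data": {"category": category, "excerpt": excerpt[:200]},
        }
    }
    return json.dumps(message, ensure_ascii=False)


payload = build_alert_message("DEVICE_TOKEN_PLACEHOLDER", "harassment",
                              "Shut up and listen.")
```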

[0492] Furthermore, the terminal can receive suggestions from the server and, if necessary, present measures to improve communication. This allows users to improve their work environment and support effective business operations.

[0493] For example, if an inappropriate request is made during a meeting, the server detects it and sends a warning to the administrator's terminal stating that "the content of the statement may be inappropriate." At this time, the user is presented with a prompt such as "Please suggest a better way to communicate to resolve this issue," and specific improvement suggestions are provided by a generative AI model.

[0494] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0495] Step 1:

[0496] The server acquires audio information from speech recognition devices installed within the workplace. This input audio information covers the entirety of conversations within the workplace. Based on this, speech recognition technology is used to process it in real time and convert it into text information. The converted text information becomes the input for the next step.

[0497] Step 2:

[0498] The server receives the text information converted in Step 1 as input and performs analysis using a natural language processing engine. Here, the data is processed to detect keywords and phrases that correspond to pre-defined legal and ethical standards. The output is the analysis result, which indicates whether a potential problem has been detected.

[0499] Step 3:

[0500] Based on the analysis results from Step 2, the server generates an alert if a problem is detected. The specific nature and severity of the problem are taken into consideration when generating this alert. The generated alert is sent to the device using a push notification service. The device receives the alert and notifies the administrator in real time. The alert message is then displayed on the administrator's device as output.

[0501] Step 4:

[0502] The terminal receives a notification sent from the server and informs the user. After the user acknowledges this warning, they wait for instructions regarding the improvement suggestions provided by the server. These improvement suggestions include specific actions the user should take to improve the work environment. The system uses a generative AI model to provide specific improvement suggestions based on the prompt. This output includes improvement actions that the user can actually implement.
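
The prompt-based suggestion in Step 4 can be illustrated with a small wrapper that composes the prompt and hands it to a model. In this sketch the generative model is represented by an arbitrary text-in, text-out callable; `build_improvement_prompt`, `suggest_improvement`, and the stand-in `echo_model` are hypothetical names, and the prompt layout around the quoted request is an assumption.

```python
from typing import Callable


def build_improvement_prompt(statement: str, issue: str) -> str:
    """Compose the prompt sent to the generative model after a warning."""
    return (
        "The following statement was flagged in a workplace meeting.\n"
        f"Issue: {issue}\n"
        f"Statement: \"{statement}\"\n"
        "Please suggest a better way to communicate to resolve this issue."
    )


def suggest_improvement(statement: str, issue: str,
                        generate: Callable[[str], str]) -> str:
    # `generate` is any text-in, text-out model call supplied by the caller.
    return generate(build_improvement_prompt(statement, issue))


def echo_model(prompt: str) -> str:
    """Stand-in model so the example runs without a network call."""
    return f"[model reply to {len(prompt.splitlines())}-line prompt]"


reply = suggest_improvement("Redo all of it by tomorrow morning.",
                            "inappropriate request", echo_model)
```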

[0503] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0504] This invention is a system that not only detects potential problems in workplace communication but also incorporates an emotion engine that recognizes the user's emotional state. In addition to conventional problem detection functions, this system aims to identify psychological risks in the workplace and provide comprehensive improvement measures.

[0505] First, the server collects voice and text data from within the workplace in real time. This includes conversations in meeting rooms and offices, which are then converted into text data using speech recognition technology. This transcribed data is then subjected to subsequent analysis processing.

[0506] Next, the server applies natural language processing to the text data to detect signs of labor law violations and harassment. In addition, it utilizes an emotion engine to analyze the emotional state associated with each user's statements. The emotion engine identifies emotions such as joy, anger, and sadness from word choices and context, and adds this information to the analysis results.
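
To make the idea of identifying emotions from word choice concrete, the sketch below scores a statement against a tiny word list. This is a deliberately simple lexicon lookup substituted for the emotion identification model 59, which the text does not specify; the lexicon contents and the function names `emotion_profile` and `dominant_emotion` are invented for illustration.

```python
from collections import Counter

# Toy lexicon, invented for illustration only.
EMOTION_LEXICON = {
    "joy": {"great", "thanks", "glad", "happy"},
    "anger": {"unacceptable", "ridiculous", "furious"},
    "sadness": {"sorry", "tired", "exhausted", "hopeless"},
}


def emotion_profile(statement: str) -> dict[str, float]:
    """Return the share of lexicon hits per emotion (all zeros if none match)."""
    words = [w.strip(".,!?\"'").lower() for w in statement.split()]
    hits = Counter()
    for emotion, vocabulary in EMOTION_LEXICON.items():
        hits[emotion] = sum(1 for w in words if w in vocabulary)
    total = sum(hits.values())
    return {e: (hits[e] / total if total else 0.0) for e in EMOTION_LEXICON}


def dominant_emotion(profile: dict[str, float]) -> str:
    best = max(profile, key=profile.get)
    return best if profile[best] > 0 else "neutral"


profile = emotion_profile("This is unacceptable and ridiculous, I am exhausted.")
label = dominant_emotion(profile)
```

A lexicon lookup ignores context and negation, so it only indicates the kind of output an emotion engine would attach to each statement.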

[0507] If the analysis detects a problem, the server generates an alert and notifies the relevant administrators and personnel on their terminals. The alert includes details of the identified problem and the detected emotional tendencies. This allows for a multifaceted assessment of the situation and enables a quick and appropriate response.

[0508] Furthermore, the server integrates with project management tools to monitor emotional and communication risks that could impact project progress. Based on this information, it proposes improvements to the project. For example, if a team member is identified as being under high stress, suggestions may be made to re-evaluate the project member's role or adjust their workload.

[0509] Based on the information displayed on the terminal, users can make appropriate decisions tailored to their individual circumstances and take necessary measures. In this way, the main objective of the present invention is to support improvements not only to the physical but also to the psychological environment of the workplace, thereby improving employee health and corporate efficiency.

[0510] The following describes the processing flow.

[0511] Step 1:

[0512] The server acquires audio data in real time from voice collection devices installed within the workplace. This data, including meetings and everyday conversations, is securely stored for recording purposes.

[0513] Step 2:

[0514] The server converts the acquired audio data into text using a speech recognition engine. Here, it prioritizes transcribing the clearest parts of the audio and performs filtering to minimize transcription errors.

[0515] Step 3:

[0516] The server analyzes the transcribed data using natural language processing algorithms to detect signs of labor law violations and harassment. It also uses an emotion engine to identify the emotions contained in the statements and generate an emotion profile.

[0517] Step 4:

[0518] The server generates alerts based on detected issues and emotional states. These alerts include the type of issue identified, the emotional tendency, and related contextual information.

[0519] Step 5:

[0520] The server notifies relevant parties or administrators of the generated alerts in real time. The notifications are delivered via push notifications, prompting immediate action based on the level of importance.

[0521] Step 6:

[0522] The server integrates with project management tools to monitor emotional and communication risks in project progress and resource allocation. This includes regular data updates and real-time insights.

[0523] Step 7:

[0524] Based on the analyzed sentiment and risk data, the server proposes improvements and adjustments. These proposals may include recommendations for reviewing team composition or adjusting schedules.

[0525] Step 8:

[0526] Users review received alerts and suggestions and determine the best course of action based on them. Users can provide appropriate feedback and mental support in accordance with organizational policies.
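
Steps 4 and 5 describe alerts that carry an issue type, an emotional tendency, and context, and that are acted on according to their level of importance. One possible way to order such alerts is a priority queue, sketched below with the standard `heapq` module. The `IMPORTANCE` scale, the `Alert` fields, and the `AlertQueue` class are assumptions made for this example.

```python
import heapq
from dataclasses import dataclass

# Hypothetical importance scale: a lower number is delivered first.
IMPORTANCE = {"harassment": 0, "labor_law": 1, "schedule": 2}


@dataclass
class Alert:
    issue_type: str
    emotion: str
    context: str


class AlertQueue:
    def __init__(self) -> None:
        self._heap: list[tuple[int, int, Alert]] = []
        self._counter = 0  # tie-breaker keeps equal-priority alerts in arrival order

    def push(self, alert: Alert) -> None:
        priority = IMPORTANCE.get(alert.issue_type, 9)  # unknown types go last
        heapq.heappush(self._heap, (priority, self._counter, alert))
        self._counter += 1

    def pop(self) -> Alert:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = AlertQueue()
queue.push(Alert("schedule", "neutral", "Release date slipped twice."))
queue.push(Alert("harassment", "anger", "Shut up and listen."))
queue.push(Alert("labor_law", "sadness", "No day off until this ships."))
order = [queue.pop().issue_type for _ in range(len(queue))]
```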

[0527] (Example 2)

[0528] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0529] In today's workplace, it is essential to detect labor law violations and signs of harassment early and respond appropriately. However, these problems often do not manifest themselves overtly, and in many cases, they have underlying, complex emotional roots. Traditional methods have made it difficult to accurately identify these psychological risks and respond quickly. In addition, there is insufficient understanding and countermeasures for communication risks that affect project progress, which can ultimately reduce workplace efficiency.

[0530] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0531] In this invention, the server includes means for collecting acoustic information in real time and converting it into text data; means for analyzing the text data and detecting signs of regulatory violations or misconduct; means for analyzing an individual's emotional state and identifying psychological risks; and means for monitoring communication and progress risks in conjunction with a work management tool. This makes it possible to analyze potential workplace problems from multiple perspectives and propose appropriate improvement measures.

[0532] "Acoustic information" refers to data related to speech and other sounds that is collected and processed by a system.

[0533] "Text data" refers to data in text format converted from acoustic information, and is the information that will be analyzed.

[0534] "Analysis" refers to the process of data processing carried out to detect signs of regulatory violations or misconduct based on collected data.

[0535] "Signs of regulatory violations or misconduct" refers to indications of actions or statements that may violate labor laws or other norms.

[0536] "Warning information" refers to alerts generated to notify about detected problems.

[0537] A "work management tool" refers to a software system used to manage projects and their progress.

[0538] "Communication and progress risks" refer to communication problems and risks that may affect the progress of the project.

[0539] "Emotional state" refers to an individual's psychological or emotional state, which is analyzed from their words and actions.

[0540] "Psychological risk" refers to problems in the workplace that may affect an individual's mental health.

[0541] This invention is a system that effectively collects and analyzes acoustic information in the workplace environment, enabling early detection of signs of regulatory violations and misconduct, as well as the evaluation of individuals' emotional states and the identification of psychological risks. At the core of this system is the comprehensive data processing capability achieved through the cooperation of servers and terminals.

[0542] The server first collects real-time acoustic information from the workplace through acoustic sensors and microphones. This function is achieved using microphone devices connected via the network and dedicated speech recognition devices. Specifically, speech recognition software such as the Google Cloud Speech-to-Text API or IBM Watson Speech to Text is used to convert the collected acoustic information into text data immediately. This text data is stored on the server as the basis for analysis.

[0543] Next, the server runs a program to perform natural language analysis on the stored text data. This analysis uses natural language processing libraries for Python such as NLTK and spaCy. During the analysis, the server detects signs of labor law violations and harassment and generates warning information as needed. The analysis also incorporates an emotion engine that estimates an individual's emotional state from the context of the conversation and word choice, and identifies psychological risks.

[0544] Furthermore, the server monitors project progress in conjunction with work management tools. Specifically, it exchanges data through APIs with project management software such as Jira and Trello to detect and analyze communication and progress risks. Based on this data, the system proposes appropriate improvement measures to administrators. For example, if a particular team member is detected to be under high stress, it can suggest redistributing tasks.
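
The task redistribution suggestion could be derived along the following lines. This is a sketch under assumptions: stress is represented as a score between 0 and 1 produced by some upstream emotion analysis, the 0.7 threshold is arbitrary, and `suggest_redistribution` is a hypothetical helper rather than a function of Jira, Trello, or the disclosed system.

```python
def suggest_redistribution(stress: dict[str, float],
                           open_tasks: dict[str, list[str]],
                           threshold: float = 0.7) -> list[str]:
    """Propose moving one task from each high-stress member to the least-stressed one.

    `stress` maps member names to scores in [0, 1]; the scale and the default
    threshold are assumptions made for this sketch.
    """
    proposals = []
    # Visit the most stressed members first.
    for member, score in sorted(stress.items(), key=lambda item: -item[1]):
        if score < threshold or not open_tasks.get(member):
            continue
        candidates = [m for m in stress if m != member and stress[m] < threshold]
        if not candidates:
            continue  # nobody has capacity, so no proposal is made
        receiver = min(candidates, key=stress.get)
        task = open_tasks[member][-1]  # hand over the last task in the member's list
        proposals.append(f"Move '{task}' from {member} to {receiver}")
    return proposals


proposals = suggest_redistribution(
    stress={"Sato": 0.9, "Ito": 0.2, "Mori": 0.5},
    open_tasks={"Sato": ["Spec review", "Load test"], "Ito": ["Docs"], "Mori": []},
)
```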

[0545] The user receives information provided by the server through the terminal and takes appropriate action based on it. The terminal is equipped with a user interface for visualizing data analysis results and suggested improvement measures, allowing the user to quickly implement countermeasures.

[0546] For example, the effectiveness of the system can be further enhanced by utilizing a generative AI model with prompts such as, "How can I detect emotions and problems from workplace audio data?" In this way, the system aims to support improvements in the physical and psychological environment of the workplace, thereby improving employee health and corporate efficiency.

[0547] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0548] Step 1:

[0549] The server collects acoustic information in real time from acoustic sensors and microphones. The input is audio data from conference rooms or offices, and the output is stored in a database as acoustic information. In this process, noise cancellation technology is used to remove ambient noise and capture conversations more clearly.

[0550] Step 2:

[0551] The server converts the collected acoustic information into text data using speech recognition software. The input is acoustic information, and the output is text data. Specifically, when converting speech to text using the Google Cloud Speech-to-Text API, the server uses speaker identification to classify each utterance by speaker.

[0552] Step 3:

[0553] The server applies natural language processing to the converted text data to detect signs of regulatory violations or misconduct. The input is text data, and the output is the analysis result regarding the presence or absence of problems. It also utilizes an emotion engine to analyze the emotional state contained in each statement, identifying emotions such as joy, anger, and sadness. The data processing performed at this stage involves analyzing the text sentence by sentence and measuring the frequency of keywords and phrases.

[0554] Step 4:

[0555] The server generates warning information based on the analysis results and notifies the terminals of relevant administrators and personnel. The input is the analysis results, and the output is the alert notification. Here, notifications are sent via email or a dedicated app using a notification system, and delivered to relevant parties in different ways depending on the urgency.

[0556] Step 5:

[0557] The server works in conjunction with the work management tool to continuously monitor risks in the project's progress and propose improvement measures. The input is the latest progress information obtained from the work management tool, and the output is a risk assessment and improvement suggestions. Specifically, the server periodically synchronizes the project status through the API of a tool such as Jira, visualizes the data, and presents it in a form that users can easily understand and use.

[0558] Step 6:

[0559] The user takes appropriate action based on the warning information and improvement suggestions displayed on the device. The input is notification information received from the server, and the output is the user's implementation of the corrective action. In this process, the user interface receives data feedback, presents recommended actions, and supports specific action procedures tailored to the situation.
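
Steps 2 and 3 of this flow mention classifying utterances by speaker and measuring keyword frequency sentence by sentence. A minimal sketch of that counting is shown below. The input format of (speaker label, text) pairs, the `WATCH_TERMS` list, and the function `keyword_frequency` are assumptions; the actual output format of a speech recognition service is not reproduced here.

```python
import re
from collections import Counter, defaultdict

# Hypothetical watch list; entries are matched case-insensitively as whole words.
WATCH_TERMS = ["overtime", "deadline", "useless"]
TERM_RE = re.compile(r"\b(" + "|".join(map(re.escape, WATCH_TERMS)) + r")\b",
                     re.IGNORECASE)


def keyword_frequency(utterances: list[tuple[str, str]]) -> dict[str, Counter]:
    """Count watch-list terms per speaker, scanning sentence by sentence."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for speaker, text in utterances:
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            for term in TERM_RE.findall(sentence):
                counts[speaker][term.lower()] += 1
    return dict(counts)


freq = keyword_frequency([
    ("speaker_1", "The deadline moved again. Overtime tonight, overtime tomorrow."),
    ("speaker_2", "That deadline is not realistic."),
    ("speaker_1", "Then you are useless."),
])
```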

[0560] (Application Example 2)

[0561] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0562] Traditional workplace management systems can detect labor law violations and signs of harassment, but they struggle to recognize employees' emotional states in real time and to use that information to provide comprehensive environmental improvement measures. Furthermore, they lack the ability to comprehensively assess and quickly respond to communication risks and emotional aspects related to project progress, so improvement of the psychological workplace environment has been insufficient.

[0563] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0564] In this invention, the server includes means for collecting voice information in real time and converting it into text data, means for analyzing the text data and identifying signs of legal violations or harassment, and means for recognizing the user's emotional state using an emotion analysis engine. This makes it possible to comprehensively monitor psychological and legal risks in the workplace and provide appropriate countermeasures.

[0565] "Voice information" refers to information conveyed through sound, and is typically recorded as data that includes human speech.

[0566] "Text data" refers to digital data that represents audio information as text, conveying information in a format that is easy for people to understand.

[0567] "Analysis" is the process of examining data in detail, identifying features and patterns, and deriving specific results.

[0568] "Violation of laws and regulations" refers to actions that violate the provisions of laws or regulations, and such actions may result in legal sanctions.

[0569] "Signs of harassment" refer to early signs that bullying or harassment may be occurring, and situations that should be addressed based on these signs.

[0570] An "emotion analysis engine" is a software process that identifies a user's emotions from text or audio, and has the function of automatically classifying their emotional state.

[0571] "Users" refers to individuals or groups who use the system or service, and in this context specifically refers to employees of a workplace.

[0572] A "warning" is a notification issued to alert people to potential dangers or inappropriate behavior, and is intended to encourage a prompt response.

[0573] "Notification" refers to the act of sending messages or alerts used to inform users of specific information.

[0574] A "project management system" is a digital tool or platform for tracking and managing the planning, progress, and outcomes of a project.

[0575] "Communication risk" refers to a situation where misunderstandings or communication errors may occur during information exchange, potentially hindering the progress of the project.

[0576] "Risks in the project progress" refers to situations where the project may not proceed as planned, and the results may not be achieved as intended.

[0577] An "improvement proposal" is a suggestion that identifies a problem and then outlines specific methods or measures to solve it.

[0578] The system for implementing this invention collects voice information in the workplace in real time, recognizes the emotional state of employees using an emotion analysis engine, and analyzes signs of legal violations or harassment. Specifically, these functions are realized by a server-centered system.

[0579] The server collects audio information from within the workplace in real time via microphones and audio devices. This audio information is converted into text data using speech recognition technologies such as Google Cloud Speech-to-Text and IBM Watson Speech-to-Text.

[0580] Next, the server performs natural language processing on this text data. Using natural language processing tools such as AWS Comprehend and Google Cloud Natural Language, it identifies signs of legal violations and harassment within the text and analyzes the user's emotional state. This includes automatically categorizing emotions such as joy, anger, and sadness.

[0581] In addition to notifying users of necessary information on their devices, the system notifies administrators, on their devices, of specific problems or changes in emotional state. This allows administrators to monitor employees' psychological health and take appropriate action.

[0582] As a concrete example, consider a scenario where, during a weekly meeting, an emotional analysis of project team members reveals that a particular member is experiencing extremely high stress levels. The system then alerts the administrator, who can quickly address the issue by reviewing their workload.
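
The scenario above could be sketched as a per-member aggregation of stress scores. The sketch assumes that the emotion analysis engine has already assigned each utterance a stress score between 0 and 1; that score, the threshold value, and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean

STRESS_THRESHOLD = 0.7  # assumed cut-off for "extremely high" stress

def high_stress_members(scored_utterances: list[tuple[str, float]],
                        threshold: float = STRESS_THRESHOLD) -> dict[str, float]:
    """Average the stress scores per member and keep members at or above the threshold."""
    by_member: dict[str, list[float]] = defaultdict(list)
    for member, score in scored_utterances:
        by_member[member].append(score)
    averages = {m: mean(scores) for m, scores in by_member.items()}
    return {m: avg for m, avg in averages.items() if avg >= threshold}
```

The members returned here are the ones for whom an alert would be sent to the administrator.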

[0583] An example of a prompt using a generative AI model is, "Analyze the emotional state of the participants during the meeting and report the stress index." Using this prompt helps to facilitate the smooth progress of the emotion recognition process.

[0584] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0585] Step 1:

[0586] The server collects audio information in real time via audio devices within the workplace. The input for this step is the audio signal acquired from the microphone, and the output is digitized audio data. This data is stored on the server for further processing.

[0587] Step 2:

[0588] The server converts the collected audio data into text data using speech recognition technology such as Google Cloud Speech-to-Text. The input for this step is digitized audio data, and the output is text data. The speech recognition algorithm records the content of the audio as text information.

[0589] Step 3:

[0590] The server analyzes the text data using natural language processing tools such as AWS Comprehend. The input for this step is data expressed in text format, and the output is the analyzed information. Specifically, the analysis is performed to identify indicators of emotional state and signs of legal violations.

[0591] Step 4:

[0592] The server generates and sends an alert to the administrator's terminal based on the analyzed information. The input for this step is the information obtained as a result of the analysis, and the output is the alert message sent to the administrator's terminal. This allows the administrator to immediately grasp any abnormalities in the work environment.

[0593] Step 5:

[0594] The user reviews the alerts and analysis information received via the terminal and takes necessary actions. The input for this step is the notification information displayed on the terminal, and the output is the user's response and feedback. Based on the suggested improvements, the user reviews their work processes and adjusts their work environment.

[0595] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0596] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives, as input, prompts containing instructions and inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0597] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0598] [Fourth Embodiment]

[0599] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0600] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0601] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0602] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0603] The microphone 238 receives instructions from the user 20 by receiving the voice of the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0604] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0605] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0606] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0607] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0608] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0609] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0610] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0611] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0612] This invention provides a system for monitoring workplace communication and preventing labor law violations and harassment. This system is implemented in the following manner:

[0613] First, the server acquires audio data from devices that collect workplace conversations. This data is collected in real time through microphones installed in the office and video conferencing systems. The server then uses a speech recognition engine to transcribe this audio data into text. This transcription process provides the foundational data needed to analyze potentially problematic statements.

[0614] Next, the server analyzes the transcribed data using natural language processing capabilities. During this analysis, algorithms are used to detect keywords that may violate labor laws and expressions that could constitute harassment. This ensures that any problematic statements are immediately recognized.

[0615] Based on the detected information, the server generates an appropriate alert. This alert is sent to the terminals of the relevant administrators and personnel, providing real-time notification. This enables a rapid response.

[0616] Furthermore, the server communicates with project management tools used within the workplace to monitor communication and scheduling risks related to project progress. For detected risks, alerts and specific improvement suggestions are sent to the user's device. Based on these suggestions, users can readjust project member assignments and schedules, supporting efficient work operations.

[0617] As a concrete example, let's say a project team is having an ongoing meeting. The server transcribes the comments made during the meeting into text, and if the analysis detects any "unreasonable demands" or "harassing language," it immediately notifies the team leader's terminal with a message saying "attention required." Through this process, it is possible to minimize the impact of comments on work and improve the workplace environment.

[0618] The following describes the processing flow.

[0619] Step 1:

[0620] The server acquires audio data in real time from audio collection devices within the office. This includes microphones in conference rooms and digital conversation recording systems.

[0621] Step 2:

[0622] The server converts the acquired audio data into text data using a speech recognition engine. In this process, the audio signal is analyzed, and highly accurate text conversion is performed.

[0623] Step 3:

[0624] The server analyzes the transcribed data using natural language processing algorithms. Here, it identifies specific keywords and phrases and detects contexts that may be related to labor law violations or harassment.

[0625] Step 4:

[0626] If the server detects a problem based on the analysis results, it will immediately generate an alert message. This alert will include specific information about what the problem is and which statements require attention.

[0627] Step 5:

[0628] The server sends the generated alerts to the administrator's or designated person's terminal. Notifications are provided in real time via push notifications or a dedicated dashboard.

[0629] Step 6:

[0630] The server simultaneously accesses project management tools to check the schedule and resource status of ongoing projects. If necessary, it identifies relevant risk factors.

[0631] Step 7:

[0632] The server generates project improvement suggestions based on the detected risks. These suggestions may include adding new members, revising processes, and adjusting the schedule.

[0633] Step 8:

[0634] Users can review alerts and suggestions received on their devices and select and implement appropriate countermeasures. This enables early problem resolution and efficient project management.
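
The flow of Steps 1 to 8 can be sketched as a single processing cycle in which each stage is supplied as a function. All names below (`run_cycle`, `CycleResult`, and the stage parameters) are hypothetical; the stages stand in for the speech recognition engine, the natural language processing, the project management tool, and the notification route, none of which are implemented here.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class CycleResult:
    transcript: str
    findings: list[str]
    alerts: list[str] = field(default_factory=list)
    suggestions: list[str] = field(default_factory=list)

def run_cycle(audio: bytes,
              transcribe: Callable[[bytes], str],
              detect: Callable[[str], list[str]],
              fetch_risks: Callable[[], list[str]],
              suggest: Callable[[list[str]], list[str]],
              notify: Callable[[str], None]) -> CycleResult:
    """Run one pass over a chunk of acquired audio."""
    transcript = transcribe(audio)                            # Steps 1-2
    findings = detect(transcript)                             # Step 3
    alerts = [f"Attention required: {f}" for f in findings]   # Step 4
    for alert in alerts:                                      # Step 5
        notify(alert)
    risks = fetch_risks()                                     # Step 6
    suggestions = suggest(risks)                              # Step 7
    for suggestion in suggestions:                            # shown to users for Step 8
        notify(suggestion)
    return CycleResult(transcript, findings, alerts, suggestions)
```

Passing the stages as functions keeps the sketch independent of any particular speech recognition or project management product.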

[0635] (Example 1)

[0636] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0637] In workplace communication, labor law violations and harassment are extremely important issues, and early detection and countermeasures are required. However, traditional monitoring methods have the challenge of not being able to detect problems in real time. Furthermore, there are difficulties in appropriately monitoring and immediately responding to communication and scheduling risks that arise as projects progress.

[0638] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0639] In this invention, the server includes means for acquiring and converting voice information into text in real time; means for analyzing the converted information and detecting signs of legal violations or disruptive behavior; means for creating and notifying warnings based on the detected content; means for monitoring communication and planning risks in cooperation with a business management system; and means for suggesting improvements based on the risks. This makes it possible to detect signs of potential legal violations and harassment occurring in the workplace in real time and notify managers with appropriate alerts. Furthermore, by coordinating with project management tools, it becomes possible to immediately discover communication and scheduling risks and support efficient business operations.

[0640] "Audio information" refers to data, including conversations and sounds, collected in the workplace.

[0641] "Real-time acquisition" refers to the process of collecting audio information almost simultaneously with its generation.

[0642] "String conversion" is the process of converting audio information into text data.

[0643] "Analysis" is the process of understanding meaning and intent based on information that has been converted into a string.

[0644] "Signs of legal violation" refers to any conduct that may violate labor laws or related regulations.

[0645] "Disruptive behavior" refers to problematic actions in the workplace, including harassment and inappropriate remarks.

[0646] A "warning" is a notification generated to address detected anomalies or problems.

[0647] "Notification" is the act of conveying warnings or information to relevant parties.

[0648] A "business management system" is a digital tool used to manage the progress of a project and related tasks.

[0649] "Communication risks" refer to the risks inherent in business communication.

[0650] "Planning risks" refer to risks related to the project's schedule and progress.

[0651] A "suggestion for improvement" is a specific solution to the identified risks or problems.

[0652] This invention provides a system for efficiently monitoring workplace communication, preventing labor law violations and disruptive behavior, and effectively managing projects.

[0653] The server works in conjunction with multiple audio collection devices to collect audio information in real time, both within the office and through digital conferencing systems. Specific examples of such devices include network-enabled microphones and video conferencing systems. The audio information is transmitted to the server and converted into text using audio conversion technology. For example, speech recognition technologies such as the Google Cloud Speech-to-Text API can be utilized.

[0654] Next, the server applies natural language processing algorithms to the stringified information to detect potential legal violations or disruptive behavior. This utilizes generative AI models to extract specific keywords and phrases. If a problem is detected as a result of this analysis, the server immediately generates a warning and notifies the relevant parties' terminals.

[0655] Furthermore, the server connects to the business management systems used by the company (e.g., Asana or Jira) to monitor project communications and planning risks. This allows users to detect potential problems in project progress early and respond quickly based on improvement suggestions provided along with alerts.

[0656] As a concrete example, during a meeting for an ongoing project, the server transcribes the user's statements in real time and sends a "warning" to the team leader's terminal if it contains "inappropriate requests" or "inappropriate expressions." This allows the user to immediately review their statements and prevent workplace problems.

[0657] An example of a prompt message might be, "Explain how to retrieve workplace conversation data, analyze inappropriate remarks, and generate alerts."

[0658] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0659] Step 1:

[0660] The server acquires audio information in real time from various audio acquisition devices within the workplace. This input includes network-enabled microphones and video conferencing systems. The server receives the audio information from these devices as streaming data and temporarily stores it in a compressed format. During this process, the audio signals are properly captured and prepared for processing.
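
Receiving streaming data and temporarily storing it in a compressed format could be sketched as follows. The sketch uses general-purpose lossless compression from Python's standard `zlib` module as a stand-in; the source does not specify a compression method, and an actual system would more likely use an audio codec. The class name `CompressedAudioBuffer` is hypothetical.

```python
import zlib

class CompressedAudioBuffer:
    """Accumulates streamed audio chunks in compressed form."""

    def __init__(self) -> None:
        self._compressor = zlib.compressobj()
        self._parts: list[bytes] = []
        self._closed = False

    def append(self, chunk: bytes) -> None:
        """Compress one incoming chunk and keep the compressed bytes."""
        if self._closed:
            raise ValueError("buffer is already closed")
        self._parts.append(self._compressor.compress(chunk))

    def close(self) -> bytes:
        """Finish the stream and return the complete compressed blob."""
        if not self._closed:
            self._parts.append(self._compressor.flush())
            self._closed = True
        return b"".join(self._parts)

def restore(blob: bytes) -> bytes:
    """Recover the original audio bytes for the speech recognition step."""
    return zlib.decompress(blob)
```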

[0661] Step 2:

[0662] The server converts the acquired audio information into text using acoustic conversion technology. This conversion employs a model based on speech recognition technology. The input is an audio signal, and the output as a result of the processing is text data. For this text conversion process, the server utilizes the Google Cloud Speech-to-Text API and other tools to convert audio to text in real time and store it in a database.

[0663] Step 3:

[0664] The server analyzes the converted text data to detect keywords and phrases that may indicate illegal or disruptive behavior. The input is the text data obtained in Step 2, and the output is the analysis results indicating the problem. The server performs this analysis using natural language processing algorithms and generative AI models, and compares the text against a keyword list. By identifying specific patterns, it detects abnormal statements.

[0665] Step 4:

[0666] The server generates warnings based on detected anomalies and problems and notifies the relevant parties' terminals. The input is the analysis results from step 3, and the output is the warning message. The server responds immediately to problematic statements and sends warnings to relevant parties via email or messaging applications. In this process, it can also provide specific action guidelines to support rapid decision-making.

[0667] Step 5:

[0668] The server works in conjunction with the project management system to monitor project communication and planning risks. Its inputs include up-to-date data from the project management tool, and its outputs are risk assessments and improvement suggestions. The server retrieves information from the project management system via an API and analyzes it to identify potential risks. Based on these risks, specific improvement suggestions are generated and provided to the user.

[0669] (Application Example 1)

[0670] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0671] In offices and workplaces, unintentional statements that violate laws or ethics can occur during communication. Such statements can worsen the work environment and damage the organization's credibility, thus requiring real-time problem detection and rapid response. Furthermore, accurately preserving communication history is crucial to prevent future disputes and to serve as evidence if problems arise. Effective technology is needed to address these challenges.

[0672] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0673] In this invention, the server includes means for acquiring voice information in real time and converting it into text information, means for analyzing the converted text information and detecting signs of human rights or ethical issues, and means for generating and notifying warnings based on the detected information. This makes it possible to quickly detect legal or ethical issues that may occur in workplace communication and immediately notify administrators.

[0674] "Audio information" refers to all data acquired through audio, including linguistic content collected in real time.

[0675] "Textual information" refers to data that represents audio information as text, and includes strings of language extracted using speech recognition technology.

[0676] "Analysis" refers to the process of analyzing acquired textual information and identifying problems in light of specific legal or ethical standards.

[0677] A "warning" refers to an alert or notification generated based on a detected problem, intended to quickly inform administrators and relevant individuals.

[0678] A "project management tool" refers to an entire system used to monitor the progress of a project or task, and to manage related information and schedules.

[0679] "Risk" refers to the potential for the working environment or the health of the organization to be compromised by statements or actions that violate laws or ethics.

[0680] "Terminal" refers to a device used by a user to receive or manipulate information, and includes personal computers and smartphones.

[0681] The system implementing this invention aims to utilize voice information in an office environment to perform real-time monitoring and analysis in order to maintain healthy communication within the workplace.

[0682] The server first acquires audio information through speech recognition devices installed in the office or workplace. This audio information is then converted into text using Google Cloud Speech-to-Text or similar speech recognition technology. This digitizes the content of the conversation, making it available for subsequent analysis.

[0683] Next, this textual information is analyzed using a natural language processing engine (such as spaCy or NLTK) on the server. Here, specific keywords and phrases are detected against pre-defined legal and ethical standards, and potential problems are identified. These processes make it possible to evaluate whether the content of workplace communication is appropriate.

[0684] When a problem is detected, the server immediately generates an alert and notifies the device. This notification is sent to the devices of administrators and relevant personnel using Firebase Cloud Messaging or similar push notification technology. This allows administrators to respond to workplace issues in real time.

[0685] Furthermore, the terminal can receive suggestions from the server and, if necessary, present measures to improve communication. This allows users to improve their work environment and support effective business operations.

[0686] For example, if an inappropriate request is made during a meeting, the server detects it and sends a warning to the administrator's terminal stating that "the content of the statement may be inappropriate." At this time, the user is presented with a prompt such as "Please suggest a better way to communicate to resolve this issue," and specific improvement suggestions are provided by a generative AI model.
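
As one possible illustration of how such a prompt could be assembled and handed to a generative AI model, the following sketch embeds the flagged statement in a template based on the prompt quoted above. The template wording around the quoted prompt, the function names, and the `generate` parameter (any function that maps a prompt string to a response string) are assumptions; no specific model API is implied.

```python
from typing import Callable

# Template built around the prompt quoted in the text; the first two lines are assumed.
PROMPT_TEMPLATE = (
    "The following statement was flagged as possibly inappropriate:\n"
    "\"{statement}\"\n"
    "Please suggest a better way to communicate to resolve this issue."
)

def build_prompt(statement: str) -> str:
    """Insert the flagged statement into the prompt template."""
    return PROMPT_TEMPLATE.format(statement=statement.strip())

def request_suggestion(statement: str, generate: Callable[[str], str]) -> str:
    """Send the prompt to a caller-supplied generative model function."""
    return generate(build_prompt(statement))
```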

[0687] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0688] Step 1:

[0689] The server acquires audio information from speech recognition devices installed within the workplace. This input audio information covers the entirety of conversations within the workplace. Based on this, speech recognition technology is used to process it in real time and convert it into text information. The converted text information becomes the input for the next step.

[0690] Step 2:

[0691] The server receives the text information converted in Step 1 as input and performs analysis using a natural language processing engine. Here, the data is processed to detect keywords and phrases against pre-defined legal and ethical standards. The output is an analysis result indicating whether a potential problem has been detected.

[0692] Step 3:

[0693] Based on the analysis results from Step 2, the server generates an alert if a problem is detected. The specific nature and severity of the problem are taken into consideration when generating this alert. The generated alert is sent to the device using a push notification service. The device receives the alert and notifies the administrator in real time. The alert message is then displayed on the administrator's device as output.

[0694] Step 4:

[0695] The terminal receives a notification sent from the server and informs the user. After the user acknowledges this warning, they wait for instructions regarding the improvement suggestions provided by the server. These improvement suggestions include specific actions the user should take to improve the work environment. The system uses a generative AI model to provide specific improvement suggestions based on the prompt. This output includes improvement actions that the user can actually implement.

[0696] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0697] This invention is a system that combines an emotion engine that not only detects potential problems in workplace communication but also recognizes the user's emotional state. In addition to conventional problem detection functions, this system aims to identify psychological risks in the workplace and provide comprehensive improvement measures.

[0698] First, the server collects voice and text data from within the workplace in real time. This includes conversations in meeting rooms and offices, which are then converted into text data using speech recognition technology. This transcribed data is then subjected to subsequent analysis processing.

[0699] Next, the server applies natural language processing to the text data to detect signs of labor law violations and harassment. In addition, it utilizes an emotion engine to analyze the emotional state associated with each user's statements. The emotion engine identifies emotions such as joy, anger, and sadness from word choices and context, and adds this information to the analysis results.
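
Identification of emotions from word choices could, in its simplest form, be sketched with a word list per emotion. This lexicon-based approach is a stand-in chosen for illustration and is not stated to be the method of the emotion engine; the word lists and the "neutral" fallback are assumptions, and the sketch does not use context.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny lexicon for illustration only.
EMOTION_LEXICON = {
    "joy": {"great", "thanks", "glad", "happy"},
    "anger": {"unacceptable", "furious", "stupid", "useless"},
    "sadness": {"sorry", "tired", "hopeless", "exhausted"},
}

def identify_emotion(statement: str) -> str:
    """Return the emotion whose lexicon matches the most words, or "neutral"."""
    tokens = re.findall(r"[a-z']+", statement.lower())
    scores: Counter = Counter()
    for emotion, words in EMOTION_LEXICON.items():
        scores[emotion] = sum(1 for t in tokens if t in words)
    best, hits = max(scores.items(), key=lambda kv: kv[1])
    return best if hits > 0 else "neutral"
```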

[0700] If the analysis detects a problem, the server generates an alert and notifies the relevant administrators and personnel on their terminals. The alert includes details of the identified problem and the detected emotional tendencies. This allows for a multifaceted assessment of the situation and enables a quick and appropriate response.

[0701] Furthermore, the server integrates with project management tools to monitor emotional and communication risks that could impact project progress. Based on this information, it proposes improvements to the project. For example, if a team member is identified as being under high stress, suggestions may be made to re-evaluate the project member's role or adjust their workload.

[0702] Based on the information displayed on the terminal, users can make appropriate decisions tailored to their individual circumstances and take necessary measures. In this way, the main objective of the present invention is to support improvements not only to the physical but also to the psychological environment of the workplace, thereby improving employee health and corporate efficiency.

[0703] The following describes the processing flow.

[0704] Step 1:

[0705] The server acquires audio data in real time from voice collection devices installed within the workplace. This data, including meetings and everyday conversations, is securely stored for recording purposes.

[0706] Step 2:

[0707] The server converts the acquired audio data into text using a speech recognition engine. Here, it prioritizes transcribing the clearest parts of the audio and performs filtering to minimize transcription errors.
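
One possible reading of the filtering in Step 2 is to keep only transcript segments that the recognizer reports with high confidence. The sketch below assumes each segment carries a `confidence` score between 0.0 and 1.0 and a `start` time; the `Segment` structure, the function name, and the 0.8 threshold are hypothetical and do not correspond to any particular speech recognition engine.

```python
from typing import TypedDict


class Segment(TypedDict):
    start: float       # seconds from the beginning of the recording
    text: str          # recognized text for this segment
    confidence: float  # 0.0 to 1.0, assumed to be reported by the recognizer


def keep_clear_segments(segments: list[Segment],
                        min_confidence: float = 0.8) -> list[Segment]:
    """Drop low-confidence segments and return the rest in time order."""
    kept = [s for s in segments if s["confidence"] >= min_confidence]
    return sorted(kept, key=lambda s: s["start"])
```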

[0708] Step 3:

[0709] The server analyzes the transcribed data using natural language processing algorithms to detect signs of labor law violations and harassment. It also uses an emotion engine to identify the emotions contained in the statements and generate an emotion profile.

[0710] Step 4:

[0711] The server generates alerts based on detected issues and emotional states. These alerts include the type of issue identified, the emotional tendency, and related contextual information.
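
For illustration, an alert carrying the three items named in Step 4 (type of issue, emotional tendency, and contextual information) might be represented as a simple record. The class name `Alert`, its field names, and the added timestamp are assumptions made for this sketch.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class Alert:
    issue_type: str          # e.g. "harassment" or "labor_law_violation"
    emotional_tendency: str  # e.g. "anger"
    context: str             # where or when the statement occurred
    # Creation time in UTC (an assumption; the text does not require it).
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def build_alert(issue_type: str, emotional_tendency: str, context: str) -> dict:
    """Build an alert and return it as a plain dict ready for serialization."""
    return asdict(Alert(issue_type, emotional_tendency, context))
```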

[0712] Step 5:

[0713] The server notifies relevant parties or administrators of the generated alerts in real time. The notifications are delivered via push notifications, prompting immediate action based on the level of importance.

[0714] Step 6:

[0715] The server integrates with project management tools to monitor emotional and communication risks in project progress and resource allocation. This includes regular data updates and real-time insights.

[0716] Step 7:

[0717] Based on the analyzed sentiment and risk data, the server proposes improvements and adjustments. These proposals may include recommendations for reviewing team composition or adjusting schedules.

[0718] Step 8:

[0719] Users review received alerts and suggestions and determine the best course of action based on them. Users can provide appropriate feedback and mental support in accordance with organizational policies.

[0720] (Example 2)

[0721] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0722] In today's workplace, it is essential to detect labor law violations and signs of harassment early and respond appropriately. However, these problems often do not manifest themselves overtly, and in many cases they have complex underlying emotional roots. Traditional methods have made it difficult to accurately identify these psychological risks and respond quickly. In addition, communication risks that affect project progress are insufficiently understood and addressed, which can ultimately reduce workplace efficiency.

[0723] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0724] In this invention, the server includes means for collecting acoustic information in real time and converting it into text data; means for analyzing the text data and detecting signs of regulatory violations or misconduct; means for analyzing an individual's emotional state and identifying psychological risks; and means for monitoring communication and progress risks in conjunction with a work management tool. This makes it possible to analyze potential workplace problems from multiple perspectives and propose appropriate improvement measures.

[0725] "Acoustic information" refers to data related to speech and other sounds that is collected and processed by a system.

[0726] "Text data" refers to data in text format converted from acoustic information, and is the information that will be analyzed.

[0727] "Analysis" refers to the process of data processing carried out to detect signs of regulatory violations or misconduct based on collected data.

[0728] "Signs of regulatory violations or misconduct" refers to indications of actions or statements that may violate labor laws or other norms.

[0729] "Warning information" refers to alerts generated to notify about detected problems.

[0730] A "work management tool" refers to a software system used to manage projects and their progress.

[0731] "Communication and progress risks" refer to communication problems and risks that may affect the progress of the project.

[0732] "Emotional state" refers to an individual's psychological or emotional state, which is analyzed from their words and actions.

[0733] "Psychological risk" refers to problems in the workplace that may affect an individual's mental health.

[0734] This invention is a system that effectively collects and analyzes acoustic information in the workplace environment, enabling early detection of signs of regulatory violations and misconduct, as well as the evaluation of individuals' emotional states and the identification of psychological risks. At the core of this system is the comprehensive data processing capability achieved through the cooperation of servers and terminals.

[0735] The server first collects real-time acoustic information from the workplace through acoustic sensors and microphones. This function is achieved using microphone devices connected via the network and dedicated speech recognition devices. Specifically, speech recognition software such as Google Speech-to-Text API and IBM Watson Speech to Text is used to instantly convert the collected acoustic information into text data. This text data is stored on the server as the basis for analysis.

[0736] Next, the server runs a program to perform natural language analysis based on the stored text data. This analysis uses advanced natural language processing techniques such as Python's NLTK library and spaCy. During the analysis, the server detects signs of labor law violations and harassment and generates warning information as needed. The analysis also incorporates an emotion engine that calculates an individual's emotional state from the context of the conversation and word choice, and identifies psychological risks.

[0737] Furthermore, the server monitors project progress in conjunction with work management tools. Specifically, it uses APIs to connect data with project management software such as JIRA and Trello to detect and analyze communication and progress risks. Based on this data, the system proposes appropriate improvement measures to administrators. For example, if a particular team member is detected to be under high stress, it can suggest redistributing tasks.
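
The task redistribution example above can be sketched as a simple rule. The sketch assumes that per-member open-task counts have already been obtained from the work management tool and that the emotion engine yields a stress score between 0.0 and 1.0; the function name, the 0.8 threshold, and the wording of the suggestion are hypothetical, and no actual JIRA or Trello API call is shown.

```python
def suggest_redistribution(open_tasks: dict[str, int],
                           stress: dict[str, float],
                           threshold: float = 0.8) -> list[str]:
    """Suggest moving work away from members whose stress exceeds the threshold."""
    suggestions = []
    for member, level in stress.items():
        if level < threshold or open_tasks.get(member, 0) == 0:
            continue
        # Candidate recipients: other members who are below the threshold.
        others = [m for m in open_tasks
                  if m != member and stress.get(m, 0.0) < threshold]
        if not others:
            continue
        # Pick the candidate with the fewest open tasks.
        target = min(others, key=lambda m: open_tasks[m])
        suggestions.append(f"Consider moving a task from {member} to {target}")
    return suggestions
```

The suggestion is presented to the administrator rather than applied automatically, in line with the description that the system proposes measures and the user decides.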

[0738] The user receives information provided by the server through the terminal and takes appropriate action based on it. The terminal is equipped with a user interface for visualizing data analysis results and suggested improvement measures, allowing the user to quickly implement countermeasures.

[0739] For example, the effectiveness of the system can be further enhanced by utilizing a generative AI model with prompts such as, "How can I detect emotions and problems from workplace audio data?" In this way, the system aims to support improvements in the physical and psychological environment of the workplace, thereby improving employee health and corporate efficiency.

[0740] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0741] Step 1:

[0742] The server collects acoustic information in real time from acoustic sensors and microphones. The input is audio data from conference rooms or offices, and the output is stored in a database as acoustic information. In this process, noise cancellation technology is used to remove ambient noise and capture conversations more clearly.

[0743] Step 2:

[0744] The server converts the collected acoustic information into text data using speech recognition software. The input is acoustic information, and the output is text data. Specifically, when converting speech to text using the Google Speech-to-Text API, the server utilizes speaker identification to classify each utterance.
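
As an illustration of classifying utterances by speaker, the sketch below groups already-transcribed segments by a speaker label. It assumes each segment is a dict with `speaker` and `text` keys; this structure is hypothetical and is not the response format of the Google Speech-to-Text API or any other service.

```python
from collections import defaultdict


def group_by_speaker(segments: list[dict]) -> dict[str, list[str]]:
    """Collect utterance texts per speaker, preserving utterance order."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for segment in segments:
        grouped[segment["speaker"]].append(segment["text"])
    return dict(grouped)
```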

[0745] Step 3:

[0746] The server applies natural language processing to the converted text data to detect signs of regulatory violations or misconduct. The input is text data, and the output is the analysis result regarding the presence or absence of problems. It also utilizes an emotion engine to analyze the emotional state contained in each statement, identifying emotions such as joy, anger, and sadness. The data processing performed at this stage involves analyzing the text sentence by sentence and measuring the frequency of keywords and phrases.
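
The sentence-by-sentence analysis and phrase-frequency measurement described in Step 3 could look roughly like the following. This is a minimal sketch using only the Python standard library as a stand-in for an NLP toolkit; the watch phrases, the regular-expression sentence splitter, and the function names are assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical phrases to watch for (assumption for illustration only).
WATCH_PHRASES = ("unpaid overtime", "no break", "or you're fired")


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def analyze_sentences(text: str) -> tuple[list[str], Counter]:
    """Return the sentences containing a watch phrase and total phrase counts."""
    flagged: list[str] = []
    totals: Counter = Counter()
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        hits = Counter({p: lowered.count(p) for p in WATCH_PHRASES if p in lowered})
        if hits:
            flagged.append(sentence)
            totals.update(hits)
    return flagged, totals
```

Keeping the flagged sentences alongside the counts allows the later warning to cite the statement that triggered it.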

[0747] Step 4:

[0748] The server generates warning information based on the analysis results and notifies the terminals of relevant administrators and personnel. The input is the analysis results, and the output is the alert notification. Here, notifications are sent via email or a dedicated app using a notification system, and delivered to relevant parties in different ways depending on the urgency.
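
Delivery "in different ways depending on the urgency" can be illustrated with a small routing policy. The urgency levels, channel names, and the policy itself are hypothetical choices for this sketch; the disclosure only states that email or a dedicated app is used.

```python
from enum import IntEnum


class Urgency(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def choose_channels(urgency: Urgency) -> list[str]:
    """Map an urgency level to delivery channels (hypothetical policy)."""
    if urgency >= Urgency.HIGH:
        return ["app_push", "email"]  # reach the recipient by both routes
    if urgency == Urgency.MEDIUM:
        return ["app_push"]
    return ["email"]                  # low urgency: email only
```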

[0749] Step 5:

[0750] The server works in conjunction with the work management tool to continuously monitor the risks in the project's progress and propose improvement measures. The input is the latest progress information obtained from the project management tool, and the output is a risk assessment and improvement suggestions. Specifically, it periodically synchronizes the project status using APIs such as JIRA, visualizes the data, and provides it as information that users can easily understand and use.
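
As a sketch of the risk assessment in Step 5, the following function rates progress risk from task records that are assumed to have already been synchronized from the project management tool. The record fields (`status`, `due`), the overdue-ratio rule, and the thresholds are assumptions; no real JIRA API call or response format is shown.

```python
from datetime import date


def assess_progress_risk(tasks: list[dict], today: date) -> dict:
    """Rate risk by the share of open tasks that are past their due date."""
    open_tasks = [t for t in tasks if t["status"] != "done"]
    if not open_tasks:
        return {"open": 0, "overdue": 0, "risk": "low"}
    overdue = sum(1 for t in open_tasks if t["due"] < today)
    ratio = overdue / len(open_tasks)
    if ratio >= 0.5:
        risk = "high"
    elif ratio > 0:
        risk = "medium"
    else:
        risk = "low"
    return {"open": len(open_tasks), "overdue": overdue, "risk": risk}
```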

[0751] Step 6:

[0752] The user takes appropriate action based on the warning information and improvement suggestions displayed on the device. The input is notification information received from the server, and the output is the user's implementation of the corrective action. In this process, the user interface receives data feedback, presents recommended actions, and supports specific action procedures tailored to the situation.

[0753] (Application Example 2)

[0754] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0755] Traditional workplace management systems can detect labor law violations and signs of harassment, but they struggle to recognize employees' emotional states in real time and to use that information to provide comprehensive environmental improvement measures. Furthermore, they lack the ability to comprehensively assess and quickly respond to communication risks and emotional aspects related to project progress, resulting in insufficient improvement of the psychological workplace environment.

[0756] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0757] In this invention, the server includes means for collecting voice information in real time and converting it into text data, means for analyzing the text data and identifying signs of legal violations or harassment, and means for recognizing the user's emotional state using an emotion analysis engine. This makes it possible to comprehensively monitor psychological and legal risks in the workplace and provide appropriate countermeasures.

[0758] "Audio information" refers to information conveyed through sound, and is typically recorded as data that includes human speech.

[0759] "Text data" refers to digital data that represents audio information as text, conveying information in a format that is easy for people to understand.

[0760] "Analysis" is the process of examining data in detail, identifying features and patterns, and deriving specific results.

[0761] "Violation of laws and regulations" refers to actions that violate the provisions of laws or regulations, and such actions may result in legal sanctions.

[0762] "Signs of harassment" refer to early signs that bullying or harassment may be occurring, and situations that should be addressed based on these signs.

[0763] An "emotion analysis engine" is a software process that identifies a user's emotions from text or audio, and has the function of automatically classifying their emotional state.

[0764] "Users" refers to individuals or groups who use the system or service, and in this context specifically refers to employees of a workplace.

[0765] A "warning" is a notification issued to alert people to potential dangers or inappropriate behavior, and is intended to encourage a prompt response.

[0766] "Notification" refers to the act of sending messages or alerts used to inform users of specific information.

[0767] A "project management system" is a digital tool or platform for tracking and managing the planning, progress, and outcomes of a project.

[0768] "Communication risk" refers to a situation where misunderstandings or communication errors may occur during information exchange, potentially hindering the progress of the project.

[0769] "Risks in the project progress" refers to situations where the project may not proceed as planned, and the results may not be achieved as intended.

[0770] An "improvement proposal" is a suggestion that identifies a problem and then outlines specific methods or measures to solve it.

[0771] The system for implementing this invention collects voice information in the workplace in real time, recognizes the emotional state of employees using an emotion analysis engine, and analyzes signs of legal violations or harassment. Specifically, these functions are realized by a server-centered system.

[0772] The server collects audio information from within the workplace in real time via microphones and audio devices. This audio information is converted into text data using speech recognition technologies such as Google Cloud Speech-to-Text and IBM Watson Speech-to-Text.

[0773] Next, the server performs natural language processing on this text data. Using natural language processing tools such as AWS Comprehend and Google Cloud Natural Language, it identifies signs of legal violations and harassment within the text and analyzes the user's emotional state. This includes automatically categorizing emotions such as joy, anger, and sadness.

[0774] When notifying users of necessary information on their devices, the system also notifies administrators of specific problems or changes in emotional state on their devices. This allows administrators to monitor employees' psychological health and take appropriate action.

[0775] As a concrete example, consider a scenario where, during a weekly meeting, an emotional analysis of project team members reveals that a particular member is experiencing extremely high stress levels. The system then alerts the administrator, who can quickly address the issue by reviewing their workload.

[0776] An example of a prompt using a generative AI model is, "Analyze the emotional state of the participants during the meeting and report the stress index." Using this prompt helps to facilitate the smooth progress of the emotion recognition process.
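
For illustration, the example prompt above could be combined with a meeting transcript before being passed to a generative AI model. The sketch below only assembles the prompt text; the function name and the transcript layout are assumptions, and the call to the model itself is deliberately omitted.

```python
PROMPT_INSTRUCTION = (
    "Analyze the emotional state of the participants during the meeting "
    "and report the stress index."
)


def build_prompt(transcript_lines: list[tuple[str, str]]) -> str:
    """Join the instruction with (speaker, text) pairs, one utterance per line."""
    body = "\n".join(f"{speaker}: {text}" for speaker, text in transcript_lines)
    return f"{PROMPT_INSTRUCTION}\n\nTranscript:\n{body}"
```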

[0777] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0778] Step 1:

[0779] The server collects audio information in real time via audio devices within the workplace. The input for this step is the audio signal acquired from the microphone, and the output is digitized audio data. This data is stored on the server for further processing.

[0780] Step 2:

[0781] The server converts the collected audio data into text data using speech recognition technology such as Google Cloud Speech-to-Text. The input for this step is digitized audio data, and the output is text data. The speech recognition algorithm records the content of the audio as text information.

[0782] Step 3:

[0783] The server analyzes the text data using natural language processing tools such as AWS Comprehend. The input for this step is data expressed in text format, and the output is the analyzed information. Specifically, the analysis is performed to identify indicators of emotional state and signs of legal violations.

[0784] Step 4:

[0785] The server generates and sends an alert to the administrator's terminal based on the analyzed information. The input for this step is the information obtained as a result of the analysis, and the output is the alert message sent to the administrator's terminal. This allows the administrator to immediately grasp any abnormalities in the work environment.

[0786] Step 5:

[0787] The user reviews the alerts and analysis information received via the terminal and takes necessary actions. The input for this step is the notification information displayed on the terminal, and the output is the user's response and feedback. Based on the suggested improvements, the user reviews their work processes and adjusts their work environment.
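
The server-side part of the flow above (Steps 1 to 4) can be summarized as a pipeline in which each stage is a replaceable component. This is a sketch under the assumption that transcription, analysis, and notification are supplied as callables; the function name `run_pipeline` and its parameters are hypothetical, and the stand-in callables in the usage example do not call any real service.

```python
from typing import Callable, Optional


def run_pipeline(audio: bytes,
                 transcribe: Callable[[bytes], str],
                 analyze: Callable[[str], Optional[dict]],
                 notify: Callable[[dict], None]) -> Optional[dict]:
    """Run transcription, analysis and notification in order."""
    text = transcribe(audio)   # Steps 1-2: audio data to text data
    finding = analyze(text)    # Step 3: returns None when nothing is detected
    if finding is not None:
        notify(finding)        # Step 4: alert the administrator's terminal
    return finding
```

A usage example with stand-in components:

```python
sent = []
result = run_pipeline(
    b"raw-audio",
    lambda audio: "late again, useless",
    lambda text: {"issue": "harassment"} if "useless" in text else None,
    sent.append,
)
```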

[0788] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0789] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0790] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0791] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping. Specifically, the emotion identification model 59 may determine the user's emotion according to an emotion map (see Figure 9), which is one such mapping. Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0792] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
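
The geometry described for the emotion map 400 can be illustrated with polar coordinates, where the radius runs from primitive (center) to expressed (outer rings) and the angle is measured counterclockwise from the 3 o'clock position. The placements below are hypothetical and do not reproduce Figure 9; they are chosen only so that the left half corresponds to reaction-driven emotions, the right half to situation-driven emotions, the upper side to pleasure, and the lower side to displeasure.

```python
import math

# name -> (radius, angle in degrees); hypothetical placements, not Figure 9.
EMOTION_MAP = {
    "reassured": (2.0, 10.0),
    "calm": (2.0, 0.0),
    "desire": (2.0, 170.0),
    "reflection": (2.0, -30.0),
}


def to_xy(name: str) -> tuple[float, float]:
    """Convert an emotion's polar placement to Cartesian coordinates."""
    radius, degrees = EMOTION_MAP[name]
    radians = math.radians(degrees)
    return radius * math.cos(radians), radius * math.sin(radians)


def distance(a: str, b: str) -> float:
    """Euclidean distance on the map; small values mean likely co-occurrence."""
    return math.dist(to_xy(a), to_xy(b))


def side(name: str) -> str:
    """Left half is labeled "response", right half "situation"."""
    x, _ = to_xy(name)
    return "response" if x < 0 else "situation"
```

With these placements, "reassured" lies closer to "calm" than to "reflection", which mirrors the statement that emotions likely to occur simultaneously are mapped close together.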

[0793] These emotions are distributed at the 3 o'clock position on the emotion map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0794] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the emotion map 400 an emotion is located, the more visible (expressed in actions) it becomes.

[0795] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.

[0796] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0797] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
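
As an illustration of determining the user's emotion from the emotion values, the sketch below assumes that the neural network has already produced one raw score per emotion. Normalizing the scores with a softmax and selecting the largest value is an assumption made for this sketch, not a step stated in this disclosure; the function names are hypothetical and the network itself is not shown.

```python
import math


def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Normalize raw scores so that they are positive and sum to 1."""
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {name: math.exp(value - peak) for name, value in scores.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


def determine_emotion(raw_scores: dict[str, float]) -> tuple[str, dict[str, float]]:
    """Return the highest-valued emotion together with all normalized values."""
    values = softmax(raw_scores)
    best = max(values, key=values.get)
    return best, values
```

When neighboring emotions such as "reassured" and "calm" receive similar raw scores, their normalized values are also similar, which is consistent with the training property described above.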

[0798] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0799] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and the specific process may be distributed across multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0800] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0801] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0802] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0803] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, which is a general-purpose processor that functions as a hardware resource performing specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0804] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0805] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0806] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0807] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0808] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0809] The following is further disclosed regarding the embodiments described above.

[0810] (Claim 1)

[0811] A means for collecting audio data in real time and converting it to text,

[0812] A means for analyzing the aforementioned transcribed data to detect signs of labor law violations or harassment,

[0813] A means for generating and notifying alerts based on the detected information,

[0814] A means of monitoring communication and schedule risks in conjunction with project management tools,

[0815] A means of proposing improvement measures based on the aforementioned risks,

[0816] A system comprising the above means.

[0817] (Claim 2)

[0818] The system according to claim 1, wherein the means for converting the aforementioned audio data into text is speech recognition technology.

[0819] (Claim 3)

[0820] The system according to claim 1, wherein the analysis means analyzes text data using natural language processing.

[0821] "Example 1"

[0822] (Claim 1)

[0823] A means of acquiring audio information in real time and converting it into text,

[0824] A means for analyzing the stringified information and detecting signs of legal violations or disruptive behavior,

[0825] A means for creating and notifying a warning based on the detected content,

[0826] A means of monitoring communication and planning risks in conjunction with the business management system,

[0827] A means of presenting improvement proposals based on the aforementioned risks,

[0828] A system comprising the above means.

[0829] (Claim 2)

[0830] The system according to claim 1, wherein the means for converting the aforementioned audio information into a string is an acoustic conversion technique.

[0831] (Claim 3)

[0832] The system according to claim 1, wherein the analysis means analyzes string data using natural language processing.

[0833] "Application Example 1"

[0834] (Claim 1)

[0835] A means of acquiring audio information in real time and converting it into text information,

[0836] A means for analyzing the converted text information and detecting signs of human rights or ethical issues,

[0837] A means for generating and notifying a warning based on the detected information,

[0838] A means of monitoring information dissemination and plan risks in conjunction with progress management tools,

[0839] A means of proposing improvement measures based on the aforementioned risks,

[0840] Means for notifying the terminal of the aforementioned warnings and suggestions and saving them as history,

[0841] A system that includes these means.

[0842] (Claim 2)

[0843] The system according to claim 1, wherein the means for converting the aforementioned audio information into text information is a speech recognition method.

[0844] (Claim 3)

[0845] The system according to claim 1, wherein the analysis means analyzes the text information using natural language processing.
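
The means for notifying the terminal of warnings and suggestions and saving them as history can be sketched as follows. This is a minimal illustration, assuming that the terminal is reached through a caller-supplied callback and that the history is kept in memory and exported as JSON; the specification does not describe the storage format. `Notice`, `NoticeDispatcher`, and their methods are hypothetical names.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class Notice:
    kind: str     # "warning" or "suggestion"
    message: str
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class NoticeDispatcher:
    """Sends each notice to a terminal and keeps it as history."""

    def __init__(self, send: Callable[[Notice], None]):
        self._send = send                  # assumed delivery channel
        self._history: list[Notice] = []   # in-memory history (assumption)

    def dispatch(self, kind: str, message: str) -> Notice:
        notice = Notice(kind, message)
        self._send(notice)            # notify the terminal
        self._history.append(notice)  # save as history
        return notice

    def history_as_json(self) -> str:
        return json.dumps([asdict(n) for n in self._history],
                          ensure_ascii=False)
```

Passing the delivery function in as a parameter keeps the sketch independent of any particular terminal, such as a smart device or headset.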

[0846] "Example 2 of combining an emotion engine"

[0847] (Claim 1)

[0848] A means of collecting acoustic information in real time and converting it into text data,

[0849] A means for analyzing the aforementioned text data and detecting signs of regulatory violations or misconduct,

[0850] A means for generating and transmitting warning information based on the detected information,

[0851] A means of monitoring communication and progress risks in conjunction with work management tools,

[0852] A means of proposing improvement measures based on the aforementioned risks,

[0853] A means of analyzing an individual's emotional state and identifying psychological risks,

[0854] A system that includes these means.

[0855] (Claim 2)

[0856] The system according to claim 1, wherein the means for converting the aforementioned acoustic information into text data uses speech understanding technology.

[0857] (Claim 3)

[0858] The system according to claim 1, wherein the analysis means analyzes the text data using natural language processing.
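
The means of analyzing an individual's emotional state and identifying psychological risks can be sketched as follows. This is an illustration only: the emotion engine itself is not reproduced here, and the sketch assumes a small word list and a rolling average per speaker as a stand-in. `NEGATIVE_WORDS`, `negativity`, `PsychologicalRiskMonitor`, and the window and threshold values are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical lexicon (assumption); an emotion engine would replace this.
NEGATIVE_WORDS = {"exhausted", "anxious", "hopeless", "overwhelmed", "afraid"}


def negativity(utterance: str) -> float:
    """Fraction of words in the utterance that appear in the lexicon."""
    words = [w.strip(".,!?;:").lower() for w in utterance.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(w in NEGATIVE_WORDS for w in words) / len(words)


class PsychologicalRiskMonitor:
    """Flags a speaker when recent utterances are negative on average."""

    def __init__(self, window: int = 3, threshold: float = 0.2):
        # Keep only the most recent `window` scores for each speaker.
        self._scores = defaultdict(lambda: deque(maxlen=window))
        self._threshold = threshold

    def observe(self, speaker: str, utterance: str) -> bool:
        scores = self._scores[speaker]
        scores.append(negativity(utterance))
        return sum(scores) / len(scores) >= self._threshold
```

Averaging over a window rather than reacting to a single utterance is a design choice of this sketch, intended to reduce alerts caused by one-off remarks.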

[0859] "Application Example 2 of combining an emotion engine"

[0860] (Claim 1)

[0861] A means of collecting audio information in real time and converting it into text data,

[0862] A means of analyzing the aforementioned text data to identify signs of legal violations or harassment,

[0863] A means for generating and notifying an alarm based on the identified information,

[0864] A means of monitoring communication and progress planning risks in conjunction with a project management system,

[0865] Means for making improvement proposals based on the aforementioned risks,

[0866] A means of recognizing the user's emotional state using an emotion analysis engine,

[0867] A means of providing comprehensive environmental improvement measures based on the recognized emotional state and the identified risks,

[0868] A system that includes these means.

[0869] (Claim 2)

[0870] The system according to claim 1, wherein the means for converting the aforementioned audio information into text data utilizes speech recognition technology.

[0871] (Claim 3)

[0872] The system according to claim 1, wherein the analysis means analyzes text data using natural language processing and recognizes an emotional state.

[Explanation of Symbols]

[0873] 10, 210, 310, 410 Data Processing Systems
12 Data Processing Devices
14 Smart Devices
214 Smart Glasses
314 Headset-type Terminal
414 Robots

Claims

1. A system comprising: a means for collecting audio data in real time and converting it to text; a means for analyzing the transcribed data to detect signs of labor law violations or harassment; a means for generating and notifying alerts based on the detected information; a means for monitoring communication and schedule risks in conjunction with project management tools; and a means for proposing improvement measures based on the aforementioned risks.

2. The system according to claim 1, wherein the means for converting the aforementioned audio data into text is speech recognition technology.

3. The system according to claim 1, wherein the analysis means analyzes text data using natural language processing.