system

JP2026105510A · Pending · Publication Date: 2026-06-26 · SOFTBANK GROUP CORP


Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: SOFTBANK GROUP CORP
Filing Date: 2024-12-16
Publication Date: 2026-06-26

Application Information


IPC: G06F3/16; G10L15/22; G10L15/00; G10L15/10

AI Tagging

Technology Topics

External data; Data acquisition





Abstract

A system is provided. [Solution] The system includes: a data acquisition means for obtaining user language data; a communication means for transmitting the acquired language data to an external data processing device; a detection means that analyzes the received language data and detects negative content; and a notification means that provides the user with warnings, including suggestions for improvement, based on the detection results.


Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance as a response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] There is a need to reduce the impact of inappropriate behavior in society, especially inappropriate remarks and actions that are unconsciously made, on people's minds. Inappropriate behavior becomes a serious problem in the workplace, educational institutions, etc., causing not only psychological pain to victims but also adverse effects on productivity. With conventional methods, it is difficult to detect these behaviors in real time and prevent them beforehand. Therefore, there is a need for a system that automatically and immediately detects inappropriate behavior and supports problem-solving.

Means for Solving the Problems

[0005] This invention provides a system that monitors user speech and input data in real time and automatically detects inappropriate behavior. A data acquisition means collects user communication data and transmits it to a remote processing unit using a communication means. The processing unit identifies inappropriate behavior using a detection means that utilizes natural language processing technology. Based on the results, a warning is issued to the user via a notification means, prompting them to appropriately correct their behavior. In this way, unintentional inappropriate behavior is prevented, and healthy communication is supported.

[0006] "Data acquisition means" refers to a configuration that has the function of monitoring user speech or input and collecting information.

[0007] "Communication means" refers to a configuration that has the function of transmitting collected data to other devices or systems.

[0008] A "processing device" is a computer system that analyzes received data and processes it according to a specific purpose.

[0009] A "detection means" is a configuration that has the function of recognizing and detecting specific patterns or improper acts from collected data.

[0010] "Natural language processing technology" refers to the techniques and methods used by computers to understand and process human language.

[0011] A "notification means" is a configuration that has the function of providing information or warnings to the user based on the detection results.

[0012] A "user interface" is a configuration that includes screen displays and input / output means for a system and a user to exchange information.

Brief Description of the Drawings

[0013]

[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.

[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.

[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.

[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.

[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.

[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.

[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.

[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.

[Figure 9] An emotion map to which a plurality of emotions are mapped.

[Figure 10] An emotion map to which a plurality of emotions are mapped.

[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.

[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.

[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2 when an emotion engine is combined.

[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2 when an emotion engine is combined.

Best Mode for Carrying Out the Invention

[0014] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0015] First, the terms used in the following description will be explained.

[0016] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).

[0017] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which is used as a work memory by the processor.

[0018] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0019] In the following embodiments, a communication I / F (Interface) denoted by a reference numeral is an interface including a communication processor, an antenna, and the like. The communication I / F controls communication between multiple computers. Examples of communication standards applied to the communication I / F include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0020] In the following embodiments, "A and / or B" is synonymous with "at least one of A and B." That is, "A and / or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and / or B" applies when expressing three or more things linked by "and / or."

[0021] [First Embodiment]

[0022] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0023] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0024] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0025] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0026] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0027] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0028] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0029] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0030] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0031] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0032] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0033] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0034] This invention relates to a system that monitors user communication content in real time and automatically detects inappropriate remarks or actions. This system functions through collaboration between a terminal used by the user and a server that performs data analysis.

[0035] The terminal constantly monitors the user's speech and input text in the background and collects this information using the data acquisition means. The collected data is securely transmitted to the server using the communication means. In the case of voice input, the terminal uses speech recognition technology to convert the speech into text data in real time.

[0036] The server analyzes the received text data. Using natural language processing techniques, it tokenizes and parses the data to detect patterns of harassment and inappropriate remarks. To perform this analysis, the server is equipped with a pre-trained AI model that assesses the risk by comparing it to past cases.

[0037] If a risk is detected, the server immediately sends the result to the terminal via the notification means. The terminal receives this and issues a warning to the user through the user interface. The warning may include specific examples of inappropriate remarks and suggestions for improvement. Based on this information, the user can review their own words and actions and communicate appropriately.

[0038] A concrete example is an online meeting at work. If a participant says, "That idea is completely worthless," during the meeting, the terminal transcribes the speech in real time and sends the text to the server. The server detects this expression as inappropriate and sends a warning message back to the terminal. The user then receives a notification on the terminal suggesting an improvement, such as, "We'll take that opinion into consideration, but let's think about an alternative approach," allowing them to reconsider their statement.

[0039] This allows the system to support users in achieving socially desirable communication.

[0040] The following describes the processing flow.

[0041] Step 1:

[0042] The terminal constantly monitors the user's voice or text input. In the case of voice input, it uses speech recognition technology to convert the speech into text data in real time. It also preprocesses the text data to remove noise and unnecessary information.

[0043] Step 2:

[0044] The terminal prepares the acquired text data and sends it to the server via a communication protocol. Encryption technology is used during this process to secure the data, and the transfer is performed with low latency.

[0045] Step 3:

[0046] The server receives text data from the terminal and prepares it for analysis. The data is then passed to the natural language processing engine, where tokenization and syntactic analysis are performed.

[0047] Step 4:

[0048] The AI model on the server detects patterns related to moral harassment and inappropriate remarks based on the analysis results. The detection is performed by comparing the results with patterns learned in advance from past data.

[0049] Step 5:

[0050] If the server detects a potential instance of harassment, it generates a warning message. This message includes specific examples of inappropriate remarks and suggestions for improvement.

[0051] Step 6:

[0052] The server sends the generated warning message to the terminal. Immediacy is paramount, and communication is carried out as quickly as possible.

[0053] Step 7:

[0054] The terminal receives warning messages from the server and notifies the user through the user interface. The notification is accompanied by a pop-up or alert sound to make it easy for the user to notice.

[0055] Step 8:

[0056] The user checks the notification on the terminal and has the opportunity to review their statements and actions. If necessary, the user can modify their communication according to the improvement suggestions provided by the system.
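
The steps above can be sketched roughly as follows. This is a minimal illustration only, not the disclosed implementation: a small phrase table (the hypothetical `FLAGGED_PHRASES`) stands in for the pre-trained AI model, the function names are invented for the example, and speech recognition, encryption, and network transfer are omitted.

```python
import re

# Hypothetical stand-in for the pre-trained AI model: a phrase table that maps
# a flagged expression to an improvement suggestion.
FLAGGED_PHRASES = {
    "completely worthless": (
        "We'll take that opinion into consideration, "
        "but let's think about an alternative approach."
    ),
}


def preprocess(text: str) -> str:
    """Step 1: collapse whitespace and lowercase (a simple form of noise removal)."""
    return re.sub(r"\s+", " ", text).strip().lower()


def detect(text: str):
    """Steps 3-4: return the first flagged phrase found in the text, or None."""
    for phrase in FLAGGED_PHRASES:
        if phrase in text:
            return phrase
    return None


def build_warning(phrase: str) -> dict:
    """Step 5: a warning holding the detected phrase and a suggestion."""
    return {"detected": phrase, "suggestion": FLAGGED_PHRASES[phrase]}


def handle_utterance(raw: str):
    """Steps 1-6 in one call: returns a warning dict, or None if nothing is flagged."""
    phrase = detect(preprocess(raw))
    return build_warning(phrase) if phrase else None
```

In this sketch the terminal would display the returned dictionary as a pop-up (Step 7); a `None` result means no notification is shown.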

[0057] (Example 1)

[0058] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0059] Traditionally, there has been no effective system for detecting inappropriate content in user statements or entered text information in real time and immediately issuing warnings to users. Therefore, even when a user's communication is socially undesirable, improvement may be delayed, potentially leading to the problem escalating. This invention aims to solve such problems and support users in immediately striving for appropriate communication.

[0060] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0061] In this invention, the server includes data collection means, data transmission means, detection means, and information presentation means. This makes it possible to identify inappropriate behavior in real time from the user's voice or text information and immediately issue a warning.

[0062] "Data collection means" refers to devices or programs that have the function of acquiring user voice and text information and generating information for analysis within the system.

[0063] "Data transmission means" refers to communication functions and technologies for safely and efficiently transmitting collected information to remotely located information processing devices.

[0064] A "detection means" is a device or program that has the function of analyzing data using language analysis technology in order to identify inappropriate speech or misconduct from received information.

[0065] An "information processing device" is a general term for the hardware and software used to analyze information sent from the data collection means and to make determinations based on that analysis.

[0066] "Information presentation means" refers to devices or programs that provide users with analysis results or warning information visually or audibly through a user interface.

[0067] This invention is a system that monitors user communication in real time and automatically detects inappropriate remarks and actions. Its functionality is primarily achieved through collaboration between a terminal and a server.

[0068] The terminal constantly monitors the user's speech and text input. Specifically, it uses its built-in microphone and speech recognition software to convert the user's voice into text data in real time. A commercially available speech recognition API is used for smooth conversion. This text data is then sent to the server using the data transmission means. Security is ensured by using the HTTPS communication protocol, which encrypts the data during transmission.
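
As a rough illustration of the transmission described above, the following sketch packages transcribed text as JSON and prepares an HTTPS POST using Python's standard library. The endpoint URL, the field names `user_id` and `text`, and the function name `build_request` are assumptions made for the example; the source does not specify a URL or payload schema. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical endpoint; the source does not name a URL or a payload schema.
SERVER_URL = "https://example.com/api/utterances"


def build_request(user_id: str, text: str) -> urllib.request.Request:
    """Package transcribed text as JSON for an HTTPS POST (the request is not sent here)."""
    if not SERVER_URL.startswith("https://"):
        raise ValueError("text data must be sent over HTTPS")
    body = json.dumps({"user_id": user_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        SERVER_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("user-20", "That opinion is completely useless.")
# Passing req to urllib.request.urlopen would perform the transfer; with an
# https:// URL, TLS encrypts the data in transit.
```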

[0069] When the server receives text data sent from a terminal, it analyzes the content using natural language processing techniques. This analysis uses generative AI models built with frameworks such as TensorFlow® and PyTorch. These AI models evaluate the text in detail by performing tokenization and syntactic analysis, and detect risks by comparing the text with past inappropriate speech patterns.
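
The idea of tokenizing text and comparing it with past inappropriate speech patterns can be illustrated with a deliberately simple stand-in. The sketch below uses word-set (Jaccard) overlap instead of a generative AI model, and the stored patterns in `PAST_PATTERNS` are hypothetical examples, so it shows only the shape of the comparison, not the disclosed analysis.

```python
import re

# Hypothetical examples of past inappropriate remarks (not taken from a real dataset).
PAST_PATTERNS = [
    "that opinion is completely useless",
    "that idea is completely worthless",
]


def tokenize(text: str) -> set:
    """Split text into a set of lowercase word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def risk_score(text: str) -> float:
    """Highest Jaccard overlap between the text and any stored pattern (0.0 to 1.0)."""
    tokens = tokenize(text)
    best = 0.0
    for pattern in PAST_PATTERNS:
        pattern_tokens = tokenize(pattern)
        union = tokens | pattern_tokens
        if union:
            best = max(best, len(tokens & pattern_tokens) / len(union))
    return best
```

A score near 1.0 means the remark closely matches a stored pattern; a learned model would replace this overlap measure in practice.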

[0070] When the server detects a risk, it immediately sends a warning message to the terminal. This message is presented visually to the user through the terminal's user interface. The notification may include examples of detected inappropriate remarks and suggestions on how to improve them. This allows users to review their communication style and take socially appropriate actions.

[0071] As a concrete example, consider a scenario where a participant in an online meeting system says, "That opinion is completely useless." The terminal transcribes this statement in real time and immediately sends the text to the server. The server detects this expression as inappropriate and sends a warning message back to the terminal along with suggestions for improvement. The user receives a suggestion to "consider that opinion as well, and then look at it from a different perspective," allowing them to reconsider their statement.

[0072] An example of such a prompt message is, "Inappropriate remarks were detected during the online meeting. Please suggest ways to improve." This allows users to strive for better communication.

[0073] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0074] Step 1:

[0075] The terminal monitors the user's voice and text input. When the user begins speaking, the terminal collects voice data through its built-in microphone. This data is input into speech recognition software and converted into text data in real time. The output text data is then converted into a format that allows further analysis within the system.

[0076] Step 2:

[0077] The terminal sends the obtained text data to the server via a secure protocol such as HTTPS. The text data is encrypted and securely transmitted through relay processes before reaching the server.

[0078] Step 3:

[0079] The server analyzes the text data received from the terminal. Using the received text data as input, it performs tokenization and syntactic analysis using natural language processing techniques. This analysis process breaks down the transmitted text into individual words and determines their meaning. Based on this, the server utilizes a generative AI model to assess the risk of inappropriate remarks. The output is the analysis result.

[0080] Step 4:

[0081] The server generates a warning if inappropriate remarks are detected as a result of the risk assessment. Specifically, a generative AI model evaluates the risk in the remarks, and if the risk exceeds a threshold, the server creates a warning message that includes suggestions for improvement. This message is prepared in the form of a prompt and sent to the terminal.

[0082] Step 5:

[0083] The terminal receives warning messages from the server and notifies the user through the user interface. It decodes the received messages and presents them visually to the user. The terminal presents the user with specific examples of inappropriate remarks and suggestions for improvement, allowing the user to modify their behavior accordingly.
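
The threshold decision in Step 4 and the decoding in Step 5 might look like the following sketch. The threshold value, the JSON message layout, and the function names are assumptions for illustration; the source gives no numeric threshold or message format, and the risk value is supplied by the caller rather than computed by a generative AI model.

```python
import json
from typing import Optional

RISK_THRESHOLD = 0.7  # hypothetical value; the source gives no number


def make_warning(remark: str, risk: float, suggestion: str) -> Optional[str]:
    """Server side (Step 4): return a JSON warning only when risk exceeds the threshold."""
    if risk <= RISK_THRESHOLD:
        return None
    return json.dumps({"remark": remark, "risk": risk, "suggestion": suggestion})


def present(message: str) -> str:
    """Terminal side (Step 5): decode the received message into display text."""
    data = json.loads(message)
    return f'Flagged: "{data["remark"]}"\nSuggestion: {data["suggestion"]}'
```

Here "exceeds" is read as strictly greater than the threshold, so a remark scored exactly at the threshold produces no warning.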

[0084] (Application Example 1)

[0085] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0086] In today's communication environment, inappropriate remarks and negative expressions can negatively impact interpersonal relationships, which is a significant problem. This invention aims to provide a system that supports healthy communication by detecting such negative language expressions in real time and suggesting appropriate improvements.

[0087] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0088] In this invention, the server includes data acquisition means for acquiring user language data, communication means for transmitting the acquired language data to an external data processing device, detection means for analyzing the received language data and detecting negative content, and notification means for providing the user with warnings, including suggestions for improvement, based on the detected results. This makes it possible to monitor user statements and input content in real time and correct negative expressions immediately.

[0089] "User language data" refers to the utterances and input content obtained from the user as speech or text.

[0090] "Data acquisition means" refers to a function that collects user language data and prepares it for processing.

[0091] "Communication means" refers to the technology and methods used to transmit acquired data to a data processing device located at a remote location.

[0092] "Detection means" refers to a function that analyzes received data and identifies negative or inappropriate content from it.

[0093] "Notification means" refers to technologies and methods used to communicate warnings and suggestions for improvement to users based on detected results.

[0094] "External data processing device" refers to a computer system or server used to receive and analyze data.

[0095] "Improvement suggestions" refer to specific advice and guidance provided to users to promote more appropriate and constructive communication when negative or inappropriate language is detected.

[0096] To implement this invention, it is necessary to install an application on the user's terminal for speech recognition and natural language processing. The terminal constantly monitors the user's speech and text input in the background and acquires language data. This data is transmitted to an external data processing device (server) via communication means.

[0097] The server analyzes the received data using natural language processing techniques. Audio data is converted to text data using speech recognition software. Tokenization, syntactic analysis, and sentiment analysis are applied to the text data. Sentiment analysis uses a pre-trained generative AI model to detect negative or inappropriate content.

[0098] If an inappropriate statement is detected according to pre-set criteria, the server notifies the terminal of the result. The notification includes a warning about the statement and suggestions for improvement, which the terminal presents through the user interface. This allows the user to review the content of their communication in real time and correct it to be more appropriate.

[0099] For example, if the phrase "This idea is unusable" is detected during an online company meeting, the server sends a notification to the participant's terminal that includes a suggestion for improvement, such as "Let's think of a more flexible idea." As a result, meeting participants can reflect on their own statements and promote constructive discussion.

[0100] Examples of prompts to input into the generative AI model include, "Detect negative expressions in the text and provide suggestions for making them more positive," and "Analyze the audio data and generate warning messages about potentially harassing expressions."
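
One way such a prompt could be assembled before it is passed to a generative AI model is sketched below. The template reuses the first example prompt above; the "Text:" suffix and the function name `build_prompt` are assumptions, and the call to the model itself is left out because the source does not name a model API.

```python
# Instruction text taken from the example prompt above; the "Text:" suffix is an
# assumed convention for attaching the user's utterance.
PROMPT_TEMPLATE = (
    "Detect negative expressions in the text and provide suggestions "
    "for making them more positive.\n\nText: {text}"
)


def build_prompt(text: str) -> str:
    """Fill the instruction template with the user's transcribed text."""
    return PROMPT_TEMPLATE.format(text=text.strip())
```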

[0101] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0102] Step 1:

[0103] The terminal monitors the user's voice and text input in real time. In the case of voice, it acquires voice data with its microphone and converts the data into text using speech recognition technology. The input is the user's speech or text, and the output is language data in text format.

[0104] Step 2:

[0105] The terminal securely transmits the acquired text data to the server via the communication means. The input is text-based language data, and the output is the completion of data transmission to the server. Specifically, the data is transferred to the server via a network protocol.

[0106] Step 3:

[0107] The server analyzes the received text data. Using natural language processing techniques, it performs tokenization and syntactic analysis, and then uses a generative AI model to perform sentiment analysis. The input is text data, and the output is the analysis result, including the detected sentiment and any negative statements. Specific operations include referencing a database and comparing the text with past cases.

[0108] Step 4:

[0109] If the server determines that the analysis results are negative or inappropriate, it generates specific improvement suggestions. It inputs prompt statements into an AI model to create improvement suggestions. The input is the detection results for negative statements, and the output is a suggestion statement for the user. In essence, the AI is provided with prompts to generate improvement suggestions.

[0110] Step 5:

[0111] The server sends a warning message containing the generated improvement suggestions to the terminal via the communication means. The input is the suggestion text, and the output is the completion of notification to the terminal. The specific action here is the process of sending data over the network.

[0112] Step 6:

[0113] The terminal displays received suggestions to the user via a user interface. The input is notification data from the server, and the output is a visible warning message to the user. Specifically, the notification pops up on the screen, allowing the user to immediately check it.

[0114] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0115] This invention relates to a system that achieves more precise communication analysis by evaluating not only the user's utterances and input content, but also their emotional state. This system consists of data acquisition means, an emotion engine, communication means, detection means, and notification means, which work together to perform their functions.

[0116] The terminal constantly monitors the user's speech and text input, collecting this data using the data acquisition means. In the case of speech data, this process includes converting it into text data using speech recognition technology. The emotion engine determines the user's emotional state based on the collected data. It analyzes tone from speech, and context and style from text, to infer emotions such as anger or sadness.
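
A toy version of text-based emotion estimation is sketched below to show the kind of output (a label plus a numerical value) an emotion engine might produce. It is an assumption-laden stand-in: the cue-word table `EMOTION_CUES` is hypothetical, voice tone is not modeled at all, and the emotion identification model 59 is not represented.

```python
import re

# Hypothetical cue words per emotion; a real engine would also use voice tone.
EMOTION_CUES = {
    "anger": {"unrealistic", "useless", "worthless", "ridiculous"},
    "sadness": {"sorry", "disappointed", "hopeless", "tired"},
}


def estimate_emotion(text: str):
    """Return (label, score): the emotion with the most cue-word hits and the
    fraction of tokens that matched, or ("neutral", 0.0) when nothing matches."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return ("neutral", 0.0)
    label, hits = "neutral", 0
    for emotion, cues in EMOTION_CUES.items():
        count = sum(1 for token in tokens if token in cues)
        if count > hits:
            label, hits = emotion, count
    return (label, hits / len(tokens))
```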

[0117] The collected and analyzed data is transmitted to a server via communication means. After receiving the data, the server uses natural language processing technology to consider both the content of the utterance and the emotional state to detect potential misconduct. Through this dual analysis, the detection means can detect even subtle nuances of misconduct that could not be captured by conventional text analysis alone.

[0118] If the detection results indicate that a user's statements constitute harassment or other inappropriate behavior, the server will send a warning message to the device via a notification system. Based on the user's emotional state, as assessed by the emotion engine, the notification content will be customized to be easily received by the user. For example, if the user is angry, the notification will be expressed in a calmer, more composed tone.

[0119] As a concrete example, consider a scenario in an online team meeting where a member says, "Your plan is completely unrealistic." In this case, the device transcribes the statement into text, and if the emotion engine detects an angry tone, that information is sent to the server. The server detects the statement as inappropriate but also generates an improvement message that takes the speaker's emotions into consideration. The device then presents the user with an improvement suggestion in a calm tone, such as "Let's consider another proposal," to help prevent the atmosphere from worsening.
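
As a non-limiting illustration, the emotion-dependent tailoring of the notification described above might be sketched as follows. The class `EmotionEstimate`, the template table, and the `min_intensity` cut-off are hypothetical names and values introduced only for this sketch; in the described system the wording would presumably be produced by the generative AI model rather than by fixed templates.

```python
from dataclasses import dataclass


@dataclass
class EmotionEstimate:
    label: str        # e.g. "anger", "sadness", "neutral" (hypothetical label set)
    intensity: float  # 0.0 (none) to 1.0 (strong)


# Hypothetical tone templates keyed by emotion label (an assumption of this sketch).
TONE_TEMPLATES = {
    "anger": "It may help to pause for a moment. {suggestion}",
    "sadness": "That sounds like a difficult moment. {suggestion}",
}


def tailor_warning(suggestion: str, emotion: EmotionEstimate,
                   min_intensity: float = 0.5) -> str:
    """Wrap an improvement suggestion in a tone suited to the estimated emotion."""
    if emotion.intensity < min_intensity:
        # Weak emotion: present the suggestion without extra framing.
        return suggestion
    template = TONE_TEMPLATES.get(emotion.label, "{suggestion}")
    return template.format(suggestion=suggestion)


print(tailor_warning("Let's consider another proposal.",
                     EmotionEstimate("anger", 0.8)))
```

A lookup table keeps the sketch deterministic; it only shows where the emotion estimate enters the notification step.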

[0120] In this way, the present invention provides a system that comprehensively analyzes the user's language and emotional information and supports the promotion of effective communication.

[0121] The following describes the processing flow.

[0122] Step 1:

[0123] The terminal monitors the user's voice and text input in real time. This data is acquired through data acquisition means, and voice input is converted into text data by a speech recognition function.

[0124] Step 2:

[0125] The device sends the acquired data to an emotion engine, which analyzes the user's emotional state. This engine determines emotions based on factors such as voice tone and text context. This process then evaluates the user's emotional state using numerical values and labels.

[0126] Step 3:

[0127] The terminal packages the analyzed sentiment data and text data and sends it to the server using a secure communication method. This communication takes place in real time, and encryption technology is used to maintain the confidentiality of the transmitted data.

[0128] Step 4:

[0129] The server receives the data for analysis. Using natural language processing techniques, it extracts patterns of misconduct from the text data and simultaneously refers to sentiment data to determine the nuances of the speech.

[0130] Step 5:

[0131] If the server detects harassment or other inappropriate behavior, it generates a warning message to be delivered to the terminal via the notification means. This message is tailored to the user's emotional state.

[0132] Step 6:

[0133] The server sends the generated warning message to the terminal. This communication is also conducted with an emphasis on security and low latency.

[0134] Step 7:

[0135] The terminal receives warning messages from the server and notifies the user through the user interface. Notifications are delivered via pop-ups and alert sounds, and are presented to the user in an emotionally sensitive manner.

[0136] Step 8:

[0137] Users receive notifications on their devices, giving them an opportunity to review their own statements and actions. If necessary, they can follow the suggested improvements and modify their communication to make it better.
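
As a non-limiting illustration of the server-side determination in Steps 4 and 5, the following sketch combines a content score with the emotion data, so that a borderline statement is flagged only when it is accompanied by anger. The phrase weights, `ANGER_BOOST`, and `THRESHOLD` are hypothetical values chosen for this sketch; the described system would use natural language processing rather than a phrase table.

```python
from dataclasses import dataclass


@dataclass
class EmotionState:
    label: str    # emotion label produced by the emotion engine
    score: float  # numerical value, 0.0 to 1.0


# Hypothetical phrase weights standing in for the text analysis.
HARSH_PHRASES = {"completely unrealistic": 0.6, "worthless": 0.8, "useless": 0.7}
ANGER_BOOST = 0.3   # hypothetical weight given to an angry tone
THRESHOLD = 0.7     # hypothetical decision threshold


def content_risk(text: str) -> float:
    """Return the highest weight of any harsh phrase found in the text."""
    lowered = text.lower()
    return max((w for phrase, w in HARSH_PHRASES.items() if phrase in lowered),
               default=0.0)


def detect_misconduct(text: str, emotion: EmotionState) -> bool:
    """Combine the content risk with the emotional state (dual analysis)."""
    risk = content_risk(text)
    if emotion.label == "anger":
        risk += ANGER_BOOST * emotion.score
    return risk >= THRESHOLD


statement = "Your plan is completely unrealistic."
print(detect_misconduct(statement, EmotionState("neutral", 0.0)))
print(detect_misconduct(statement, EmotionState("anger", 0.9)))
```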

[0138] (Example 2)

[0139] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0140] Conventional communication analysis systems detected inappropriate behavior based solely on the content of user statements and failed to capture subtle inappropriate behavior conveyed through emotional nuance. Furthermore, their warnings lacked flexibility because they did not consider the user's emotional state.

[0141] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0142] In this invention, the server includes means for monitoring and acquiring information, means for converting the acquired information, and means for analyzing the converted information and determining its state. This makes it possible to comprehensively analyze the user's statements and emotional state, generate appropriate warnings, and send notifications.

[0143] "Means for monitoring and acquiring information" refers to means that continuously detects user speech and text input and collects that data.

[0144] "Means for converting acquired information" refers to means that appropriately converts audio data into text data or other formats.

[0145] "Means for analyzing converted information and determining its state" refers to means that analyzes the user's emotions and intentions based on the converted data and identifies the corresponding emotional state.

[0146] "Means for detecting behavior based on received data" refers to means that uses the analyzed data to identify inappropriate or potentially problematic behavior by users.

[0147] "Means for notifying users" refers to means that provides warnings and advice to users based on detected problems.

[0148] "Information display means" refers to means that presents information to a user visually or audibly.

[0149] This invention provides a system that enables sophisticated communication analysis by monitoring user statements and inputs and evaluating their content and emotional state. The following describes how this invention can be implemented.

[0150] The device constantly monitors the user's speech and text input. When the user inputs by voice, the device first acquires the voice data and converts it into text data using "speech recognition technology." This conversion uses commonly available speech recognition software.

[0151] All input data is then analyzed by an emotion engine. The emotion engine uses specific algorithms to determine the user's emotions from the input data. It analyzes tone from voice data and context and style from text data to infer emotions such as anger or sadness. Technologies used for analysis include "emotion analysis software."

[0152] The data analyzed by the device is sent to the server via communication means. The server receives this data and uses a "generative AI model" and other tools to comprehensively evaluate the user's statements and emotional state, and detects the possibility of inappropriate behavior. If inappropriate behavior is detected, an appropriate warning is sent to the device via a notification means.

[0153] As an example, consider an online meeting scenario. A member says, "Your plan isn't realistic." The terminal transcribes this statement, the emotion engine detects anger, and this information is sent to the server. If the server judges the statement to be inappropriate, it generates a message such as "We'll consider other suggestions" and delivers it to the user in a calm tone. In this way, the atmosphere of the meeting can be calmed.

[0154] As a prompt for the generative AI model, this system can use, for example, "When the user is expressing strong emotions during a conversation, please generate and provide appropriate feedback based on those emotions and context." This allows feedback that takes the user's emotional responses into account, supporting more appropriate communication.
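
As a non-limiting illustration, the prompt quoted above could be combined with the utterance and the emotion metadata before being passed to the generative AI model. The function name `build_prompt` and the layout of the prompt are assumptions of this sketch; the specification does not state how the prompt and the data are combined, and no model is called here.

```python
import json

# Instruction text quoted from the description above.
INSTRUCTION = (
    "When the user is expressing strong emotions during a conversation, "
    "please generate and provide appropriate feedback based on those "
    "emotions and context."
)


def build_prompt(utterance: str, emotion_label: str,
                 emotion_intensity: float) -> str:
    """Assemble instruction, utterance and emotion metadata into one prompt."""
    metadata = json.dumps({"emotion": emotion_label,
                           "intensity": emotion_intensity})
    return (f"{INSTRUCTION}\n\n"
            f"Utterance: {utterance}\n"
            f"Emotion metadata: {metadata}")


print(build_prompt("Your plan isn't realistic.", "anger", 0.8))
```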

[0155] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0156] Step 1:

[0157] The terminal acquires user speech and text input. If the user inputs by voice, this is collected as voice data. The input is voice waveform data, and the output is the same voice data, unchanged. The voice data is temporarily held in memory in preparation for subsequent processing.

[0158] Step 2:

[0159] The device converts acquired audio data into text data using speech recognition technology. The input is audio data, and the output is text data in string format. This conversion uses an algorithm that analyzes audio patterns and converts them into corresponding linguistic expressions.

[0160] Step 3:

[0161] The device sends the converted text data to the emotion engine to determine the user's emotional state. The input is text data, and the output is metadata indicating the emotional state. Here, the context and word choices of the text are analyzed, and the intensity and type of emotion are quantified using an emotion analysis algorithm.

[0162] Step 4:

[0163] The terminal transmits the sentiment analysis results and text data to the server using a communication method. The input is text data and its sentiment state metadata, and the output is a data package transferred to the server. The data is encrypted using a communication protocol and securely transmitted to the server.

[0164] Step 5:

[0165] The server uses a generative AI model with the received data to detect misconduct. The input consists of text data and emotional state metadata, and the output is the result of the misconduct determination. The generative AI model uses natural language processing to comprehensively evaluate the input content and emotions.

[0166] Step 6:

[0167] The server generates a notification for the user based on the detection results. The input is the result of the misconduct determination, and the output is the message presented to the user. A generative AI model generates the suggested content and constructs the message in an appropriate tone.

[0168] Step 7:

[0169] The terminal displays notifications received from the server to the user. The input is a message from the server, and the output is a notification presented to the user visually or audibly. Notifications are delivered using a display or speaker in a way that the user can intuitively understand.
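
As a non-limiting illustration of the data package in Step 4, the text data and the emotional state metadata might be serialized together as JSON on the terminal and restored on the server. The field names and the choice of JSON are assumptions of this sketch; encryption is assumed to be handled by the communication protocol and is not shown.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class UtterancePackage:
    text: str                 # text data from speech recognition or text input
    emotion_label: str        # type of emotion (hypothetical field name)
    emotion_intensity: float  # quantified intensity (hypothetical field name)


def encode_package(pkg: UtterancePackage) -> bytes:
    """Terminal side: serialize the package for transmission."""
    return json.dumps(asdict(pkg), ensure_ascii=False).encode("utf-8")


def decode_package(raw: bytes) -> UtterancePackage:
    """Server side: restore the package from the received bytes."""
    return UtterancePackage(**json.loads(raw.decode("utf-8")))


sent = UtterancePackage("Your plan isn't realistic.", "anger", 0.8)
received = decode_package(encode_package(sent))
print(received)
```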

[0170] (Application Example 2)

[0171] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0172] In modern households, there is a need to mitigate discord caused by emotional misunderstandings and inappropriate behavior in interpersonal communication, and to promote smooth dialogue. However, there is insufficient technology to sense subtle emotional changes in communication in a timely manner and provide appropriate support. This invention aims to provide a system that can prevent potential problems, particularly in communication within the family.

[0173] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0174] In this invention, the server includes information acquisition means for monitoring user speech or input, communication means for transmitting the acquired information to a remote processing device, and detection means for detecting inappropriate behavior from the received information in the processing device. This makes it possible for an automated device in the home to analyze the content of conversations and help improve emotional states.

[0175] "Information acquisition means" refers to a device or system that has the function of monitoring user speech and input and collecting that data.

[0176] "Communication means" refers to a device or system that has the function of transmitting collected information to a processing device located at a remote location.

[0177] A "processing device" is a device or system that has the function of performing analysis to detect inappropriate behavior based on received information.

[0178] A "detection means" is a system that uses natural language processing technology to analyze information and has the function of detecting emotional states and inappropriate behavior.

[0179] A "notification means" is a device or system that has the function of notifying the user of warnings or suggestions based on the detected results.

[0180] A "home-use automated device" is a device that analyzes everyday communication within the home and provides support to facilitate smooth dialogue.

[0181] The system designed to realize this application is an automated device for facilitating communication within the home. In this system, information acquisition means, communication means, processing means, detection means, and notification means work together. Specifically, it operates through the following process:

[0182] The terminal (home robot) constantly monitors the user's speech and text input through information acquisition methods. In the case of voice data, a microphone captures the voice, and a speech analysis API (for example, Google® Speech-to-Text) is used to convert the voice to text. The text data is then transmitted to a server via the internet using communication methods.

[0183] The server analyzes the received text data using a natural language processing library (e.g., NLTK) to assess the emotional state. This assessment includes analysis based on changes in voice tone and the context of the text. If inappropriate behavior is detected, the server notifies the terminal of this information.

[0184] Based on this information, the device sends appropriate warnings and suggestions to the user. For example, if the user shows signs of stress, the robot will offer a gentle suggestion such as, "Why don't you take a short break?" This helps to facilitate smoother communication within the household.

[0185] For example, if children start arguing during a game, the robot can instantly analyze the situation and suggest, "Why don't you take a break?" to alleviate the conflict. An example of a prompt for the generative AI model in such a scenario would be, "How can a household robot come up with a suggestion to calm the conversation during the game and communicate it?"

[0186] This system makes it possible to mitigate communication friction that often occurs within the family in real time, and to maintain good relationships.

[0187] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0188] Step 1:

[0189] The device captures the user's speech through the microphone. The input here is audio data. The device converts this audio data into text data using a speech analysis API. The converted text data is the output.

[0190] Step 2:

[0191] The terminal sends the converted text data to the server using a communication method. The output of this processing step is the text data sent to the server. The server takes the received text data as input for processing.

[0192] Step 3:

[0193] The server analyzes the received text data using a natural language processing library to evaluate the emotional state. For example, it identifies keywords and tones that indicate emotion within the text and determines the type of emotion. The input is the text data received by the server, and the output is the analyzed emotion data.

[0194] Step 4:

[0195] Based on the analysis results, the server generates an appropriate warning or suggestion if a problem is detected. The message encouraging improvement is generated by a generative AI model in response to a prompt. The input for this step is the emotion data, and the output is the generated suggestion message.

[0196] Step 5:

[0197] The server sends the generated suggestion message to the terminal. The input on the terminal side is this message, and the output is a notification to the user through the terminal's user interface, for example on a display or through a speaker.

[0198] This processing flow makes it possible to support communication within the family and help maintain good relationships.
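
As a non-limiting illustration of Steps 3 and 4, the keyword-based emotion assessment and the selection of a suggestion might be sketched as follows. The keyword sets are hypothetical, and a plain regular-expression tokenizer stands in for the natural language processing library; the two suggestion strings are taken from the examples above, where the described system would instead obtain the message from the generative AI model.

```python
import re
from typing import Optional

# Hypothetical keyword sets indicating each emotion.
EMOTION_KEYWORDS = {
    "anger": {"hate", "stupid", "unfair", "cheater"},
    "stress": {"tired", "exhausted", "overwhelmed"},
}

# Suggestions quoted from the examples in the description.
SUGGESTIONS = {
    "anger": "Why don't you take a break?",
    "stress": "Why don't you take a short break?",
}


def assess_emotion(text: str) -> str:
    """Count emotion keywords in the text and return the dominant label."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = {label: sum(1 for t in tokens if t in keywords)
            for label, keywords in EMOTION_KEYWORDS.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else "neutral"


def suggest(text: str) -> Optional[str]:
    """Return a suggestion message, or None when no problem is detected."""
    return SUGGESTIONS.get(assess_emotion(text))


print(suggest("That's unfair, you cheater!"))
```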

[0199] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0200] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. It performs inference on the input inference data according to the instructions in the prompt and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0201] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0202] [Second Embodiment]

[0203] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0204] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0205] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0206] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0207] The microphone 238 accepts instructions from the user 20 by capturing the user's voice. It converts the captured voice into audio data and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0208] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0209] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0210] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0211] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0212] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0213] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0214] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0215] This invention relates to a system that monitors user communication content in real time and automatically detects inappropriate remarks or actions. This system functions through collaboration between a terminal used by the user and a server that performs data analysis.

[0216] The device constantly monitors the user's speech and input text in the background and collects this information using data acquisition methods. The collected data is securely transmitted to a server using communication methods. In the case of voice input, the device utilizes speech recognition technology to convert it into text data in real time.

[0217] The server analyzes the received text data. Using natural language processing techniques, it tokenizes and parses the data to detect patterns of harassment and inappropriate remarks. To perform this analysis, the server is equipped with a pre-trained AI model that assesses the risk by comparing it to past cases.

[0218] If a risk is detected, the server immediately sends the result to the terminal via a notification system. The terminal receives this and issues a warning to the user through the user interface. The warning may include specific examples of inappropriate remarks and suggestions for improvement. Based on this information, the user can review their own words and actions and communicate appropriately.

[0219] A concrete example is an online meeting at work. If a participant says, "That idea is completely worthless," during the meeting, the device transcribes the speech in real time and sends the information to the server. The server detects this expression as inappropriate and sends a warning message back to the device. The user then receives a notification on their device suggesting an improvement, such as, "We'll take that opinion into consideration, but let's think about an alternative approach," allowing them to reconsider their statement.

[0220] This allows the system to support users in achieving socially desirable communication.

[0221] The following describes the processing flow.

[0222] Step 1:

[0223] The device constantly monitors the user's voice or text input. In the case of voice input, it uses speech recognition technology to convert it into text data in real time. It also preprocesses the text data to remove noise and unnecessary information.

[0224] Step 2:

[0225] The terminal prepares the acquired text data and sends it to the server via a communication protocol. Encryption is used to secure the data in transit, and the transfer is performed with low latency.

[0226] Step 3:

[0227] The server receives text data from the terminal and prepares it for analysis. The data is then passed to the natural language processing engine, where tokenization and syntactic analysis are performed.

[0228] Step 4:

[0229] The AI model on the server detects patterns related to moral harassment and inappropriate remarks based on the analysis results. Detection is performed by comparing the results with patterns the model learned in advance from a dataset of past cases.

[0230] Step 5:

[0231] If the server detects a potential instance of harassment, it generates a warning message. This message includes specific examples of inappropriate remarks and suggestions for improvement.

[0232] Step 6:

[0233] The server sends the generated warning message to the terminal. Immediacy is paramount, and communication is carried out as quickly as possible.

[0234] Step 7:

[0235] The terminal receives warning messages from the server and notifies the user through the user interface. The notification is accompanied by a pop-up or alert sound to make it easy for the user to notice.

[0236] Step 8:

[0237] Users can check notifications from their devices and have the opportunity to review their statements and actions. If necessary, they can modify their communication according to the improvement suggestions provided by the system.
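
As a non-limiting illustration of the preprocessing in Step 1, the tokenization in Step 3, and the pattern detection in Step 4, the following sketch normalizes the text, removes filler words, and looks for known token pairs. The filler list and the pattern list are hypothetical; syntactic analysis and the pre-trained AI model are not reproduced, and a fixed list of token pairs stands in for the learned patterns.

```python
import re
from typing import List, Tuple

FILLERS = {"um", "uh", "er"}  # hypothetical noise words removed in preprocessing

# Hypothetical token pairs standing in for learned inappropriate patterns.
INAPPROPRIATE_PATTERNS: List[Tuple[str, str]] = [
    ("completely", "worthless"),
    ("completely", "useless"),
]


def preprocess(text: str) -> str:
    """Collapse whitespace and lower-case the text."""
    return re.sub(r"\s+", " ", text).strip().lower()


def tokenize(text: str) -> List[str]:
    """Split the preprocessed text into word tokens, dropping fillers."""
    return [t for t in re.findall(r"[a-z']+", preprocess(text))
            if t not in FILLERS]


def find_patterns(tokens: List[str]) -> List[Tuple[str, str]]:
    """Return the known patterns that occur as adjacent token pairs."""
    bigrams = set(zip(tokens, tokens[1:]))
    return [p for p in INAPPROPRIATE_PATTERNS if p in bigrams]


tokens = tokenize("Um,  that idea is   completely worthless.")
print(tokens)
print(find_patterns(tokens))
```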

[0238] (Example 1)

[0239] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0240] Traditionally, there has been no effective system for detecting inappropriate content in user statements or entered text information in real time and immediately issuing warnings to users. Therefore, even when a user's communication is socially undesirable, improvement may be delayed, potentially leading to the problem escalating. This invention aims to solve such problems and support users in immediately striving for appropriate communication.

[0241] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0242] In this invention, the server includes data collection means, data transmission means, detection means, and information presentation means. This makes it possible to identify misconduct in real time from the user's voice or text information and immediately issue a warning.

[0243] "Data collection means" refers to devices or programs that have the function of acquiring user voice and text information and generating information for analysis within the system.

[0244] "Data transmission means" refers to communication functions and technologies for safely and efficiently transmitting collected information to remotely located information processing devices.

[0245] A "detection means" is a device or program that has the function of analyzing data using language analysis technology in order to identify inappropriate speech or misconduct from received information.

[0246] An "information processing device" is a general term for hardware and software used to analyze information sent from data collection devices and to actually make decisions and perform analyses.

[0247] "Information presentation means" refers to devices or programs that provide users with analysis results or warning information visually or audibly through a user interface.

[0248] This invention is a system that monitors user communication in real time and automatically detects inappropriate remarks and actions. Its functionality is primarily achieved through collaboration between a terminal and a server.

[0249] The terminal constantly monitors the user's speech and text input. Specifically, it uses its built-in microphone and speech recognition software to instantly convert the user's voice into text data. A commercially available speech recognition API is used for smooth conversion. This text data is then sent to the server through the data transmission means. Security is ensured by using a secure HTTP-based protocol such as HTTPS, so that the data is encrypted during transmission.
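
As a non-limiting illustration, the terminal-side transmission over HTTPS might be prepared with the Python standard library as follows. The endpoint URL, the JSON field names, and the function name `build_request` are hypothetical; the sketch only constructs the request and does not send it.

```python
import json
import urllib.request

# Hypothetical HTTPS endpoint of the server; not taken from the specification.
SERVER_URL = "https://example.com/api/utterances"


def build_request(text: str, user_id: str) -> urllib.request.Request:
    """Build a POST request carrying the transcribed text as JSON."""
    body = json.dumps({"user_id": user_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        SERVER_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("That opinion is completely useless.", "user-1")
print(req.get_method(), req.full_url)
```

Sending the request would be a call such as `urllib.request.urlopen(req, timeout=5)`; because the URL uses the `https` scheme, the transfer would be encrypted by TLS.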

[0250] When the server receives text data sent from a terminal, it analyzes its content using natural language processing techniques. This analysis utilizes generative AI models built with frameworks such as TensorFlow and PyTorch. These AI models evaluate the data in detail by performing tokenization and syntactic analysis, and detect risks by comparing it with past inappropriate speech patterns.

[0251] When the server detects a risk, it immediately sends a warning message to the device. This message is presented visually to the user through the device's user interface. The notification may include examples of detected inappropriate remarks and suggestions on how to improve them. This allows users to review their communication style and take socially appropriate actions.

[0252] As a concrete example, consider a scenario where a participant in an online meeting system says, "That opinion is completely useless." The device transcribes this statement in real time and immediately sends the information to the server. The server detects this expression as inappropriate and sends a warning message back to the device along with suggestions for improvement. The user receives a suggestion to "consider that opinion as well, and then look at it from a different perspective," allowing them to reconsider their statement.

[0253] An example of such a prompt message is, "Inappropriate remarks were detected during the online meeting. Please suggest ways to improve." This allows users to strive for better communication.

[0254] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0255] Step 1:

[0256] The device monitors the user's voice and text input. When the user begins speaking, the device collects voice data through its built-in microphone. This data is input into speech recognition software and converted into text data in real time. The output text data is then formatted into a format that allows for further analysis within the system.

[0257] Step 2:

[0258] The terminal transmits the obtained text data to the server via a secure protocol such as HTTPS. The text data is encrypted in transit and may pass through relay processing before reaching the server.

[0259] Step 3:

[0260] The server analyzes the text data received from the terminal. Using the received text data as input, it performs tokenization and syntactic analysis using natural language processing techniques. This analysis process breaks down the transmitted text into individual words and determines their meaning. Based on this, the server utilizes a generative AI model to assess the risk of inappropriate remarks. The output is the analysis result.

[0261] Step 4:

[0262] The server generates a warning if inappropriate remarks are detected as a result of the risk assessment. Specifically, a generative AI model evaluates the risk in the remarks, and if the risk exceeds a threshold, it creates a warning message that includes suggestions for improvement. This message is prepared in the form of a prompt and sent to the terminal.

[0263] Step 5:

[0264] The terminal receives warning messages from the server and notifies the user through the user interface. It decodes the received messages and presents them visually to the user. The terminal presents the user with specific examples of inappropriate remarks and suggestions for improvement, allowing the user to modify their behavior accordingly.
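
As a non-limiting illustration of Step 4, the threshold decision and the composition of the warning message might be sketched as follows. The class `WarningMessage`, the threshold value, and the assumption that the risk score and the suggestion are supplied by the generative AI model are all hypothetical; the specification does not give a numerical threshold.

```python
from dataclasses import dataclass
from typing import Optional

RISK_THRESHOLD = 0.7  # hypothetical value; not specified in the description


@dataclass
class WarningMessage:
    detected_remark: str  # the remark judged inappropriate
    suggestion: str       # suggestion for improvement
    risk: float           # risk score that triggered the warning


def make_warning(remark: str, risk: float, suggestion: str,
                 threshold: float = RISK_THRESHOLD) -> Optional[WarningMessage]:
    """Create a warning only when the risk exceeds the threshold."""
    if risk <= threshold:
        return None
    return WarningMessage(detected_remark=remark, suggestion=suggestion,
                          risk=risk)


warning = make_warning(
    "That opinion is completely useless.",
    0.9,
    "Consider that opinion as well, and then look at it from a different perspective.",
)
print(warning)
```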

[0265] (Application Example 1)

[0266] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0267] In today's communication environment, inappropriate remarks and negative expressions can negatively impact interpersonal relationships, which is a significant problem. This invention aims to provide a system that supports healthy communication by detecting such negative language expressions in real time and suggesting appropriate improvements.

[0268] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0269] In this invention, the server includes data acquisition means for acquiring user language data, communication means for transmitting the acquired language data to an external data processing device, detection means for analyzing the received language data and detecting negative content, and notification means for providing the user with warnings, including suggestions for improvement, based on the detected results. This makes it possible to monitor user statements and input content in real time and correct negative expressions immediately.

[0270] "User language data" refers to the utterances and input content obtained from the user as speech or text.

[0271] "Data acquisition means" refers to a function that collects user language data and prepares it for processing.

[0272] "Communication means" refers to the technology and methods used to transmit acquired data to a data processing device located at a remote location.

[0273] "Detection means" refers to a function that analyzes received data and identifies negative or inappropriate content from it.

[0274] "Notification means" refers to technologies and methods used to communicate warnings and suggestions for improvement to users based on detected results.

[0275] "External data processing device" refers to a computer system or server used to receive and analyze data.

[0276] "Improvement suggestions" refer to specific advice and guidance provided to users to promote more appropriate and constructive communication when negative or inappropriate language is detected.

[0277] To implement this invention, it is necessary to install an application on the user's terminal for speech recognition and natural language processing. The terminal constantly monitors the user's speech and text input in the background and acquires language data. This data is transmitted to an external data processing device (server) via communication means.

[0278] The server analyzes the received data using natural language processing techniques. Audio data is converted to text data using speech recognition software. Tokenization, syntactic analysis, and sentiment analysis are applied to the text data. Sentiment analysis uses a pre-trained generative AI model to detect negative or inappropriate content.

[0279] If an inappropriate statement is detected according to pre-set criteria, the server will notify the terminal of the result. This notification will include a warning about the statement and suggestions for improvement via the user interface. This allows the user to review the content of their communication in real time and correct it to be more appropriate.

[0280] For example, if the phrase "This idea is unusable" is detected during an online company meeting, the server sends a notification to the participant's device that includes a suggestion for improvement, such as "Let's think of a more flexible idea." As a result, meeting participants can reflect on their own statements and promote constructive discussion.

[0281] Examples of prompt sentences to be input into the generative AI model include "Please detect negative expressions in the text and provide suggestions to make it more positive" and "Please analyze the voice data and generate a warning message for expressions that may be harassing."

[0282] The flow of the specific processing in Application Example 1 will be described using Figure 12.

[0283] Step 1:

[0284] The terminal monitors the user's voice and text input in real time. In the case of voice, it uses a microphone to acquire voice data and converts it into text data using voice recognition technology. The input is the user's speech or text, and the output is language data in text format. The specific operation here is to capture the voice using the terminal's microphone and convert it into text.

[0285] Step 2:

[0286] The terminal securely transmits the acquired text data to the server via communication means. The input is language data in text format, and the output is the completion of data transmission to the server. As a specific operation, the data is transferred to the server via a network protocol.

[0287] Step 3:

[0288] The server analyzes the received text data. Using natural language processing technology, it performs tokenization and syntax analysis and conducts sentiment analysis with the generative AI model. The input is text data, and the output is the sentiment as an analysis result and the detection result of negative utterances. As a specific operation, it includes processing such as referring to a database and comparing with past cases.

[0289] Step 4:

[0290] If the server determines that the analysis results are negative or inappropriate, it generates specific improvement suggestions. It inputs prompt statements into an AI model to create improvement suggestions. The input is the detection results for negative statements, and the output is a suggestion statement for the user. In essence, the AI is provided with prompts to generate improvement suggestions.

[0291] Step 5:

[0292] The server sends a warning message containing the generated improvement suggestions to the terminal via a communication method. The input is the suggestion text, and the output is the completion of notification to the terminal. The specific action here is the process of sending data over the network.

[0293] Step 6:

[0294] The terminal displays received suggestions to the user via a user interface. The input is notification data from the server, and the output is a visible warning message to the user. Specifically, the notification pops up on the screen, allowing the user to immediately check it.
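Steps 3 to 5 above can be sketched on the server side as follows. This is a rough illustration under stated assumptions: the word list is a hypothetical stand-in for sentiment analysis, `fake_model` is a placeholder for the generative AI model, and all function names are invented for the example. The instruction text is one of the example prompt sentences given in the description.

```python
from typing import Callable, Optional

# Instruction quoted from the example prompt sentences in the description.
INSTRUCTION = ("Please detect negative expressions in the text and "
               "provide suggestions to make it more positive")

# Hypothetical word list standing in for sentiment analysis.
NEGATIVE_WORDS = {"unusable", "worthless", "useless", "terrible"}


def tokenize(text: str) -> list:
    """Split text into lowercase word tokens, dropping punctuation."""
    tokens = (t.strip(".,!?\"'") for t in text.lower().split())
    return [t for t in tokens if t]


def detect_negative(text: str) -> list:
    """Return the tokens that match the negative word list."""
    return [t for t in tokenize(text) if t in NEGATIVE_WORDS]


def build_prompt(text: str) -> str:
    """Combine the instruction with the user's text."""
    return f"{INSTRUCTION}\n\nText: {text}"


def process_utterance(text: str,
                      model: Callable[[str], str]) -> Optional[dict]:
    """Return a warning payload for the terminal, or None if nothing is detected."""
    hits = detect_negative(text)
    if not hits:
        return None
    suggestion = model(build_prompt(text))
    return {"type": "warning", "detected": hits, "suggestion": suggestion}


def fake_model(prompt: str) -> str:
    """Placeholder for the generative AI model (assumption)."""
    return "Let's think of a more flexible idea."
```

Passing the model in as a callable keeps the detection logic testable without a network call; any client that maps a prompt string to a response string could be substituted for `fake_model`.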

[0295] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0296] This invention relates to a system that achieves more precise communication analysis by evaluating not only the user's utterances and input content, but also their emotional state. This system consists of data acquisition means, an emotion engine, communication means, detection means, and notification means, which work together to perform their functions.

[0297] The terminal constantly monitors the user's speech and text input, collecting this data using the data acquisition means. In the case of speech data, this process includes converting it into text data using speech recognition technology. The emotion engine determines the user's emotional state based on the collected data. It analyzes tone from speech, and context and style from text, to infer emotions such as anger or sadness.

[0298] The collected and analyzed data is transmitted to a server via communication means. After receiving the data, the server uses natural language processing technology to consider both the content of the utterance and the emotional state to detect potential misconduct. Through this dual analysis, the detection means can detect even subtle nuances of misconduct that could not be captured by conventional text analysis alone.

[0299] If the detection results indicate that a user's statements constitute harassment or other inappropriate behavior, the server sends a warning message to the terminal via the notification means. Based on the user's emotional state, as assessed by the emotion engine, the notification content is customized so that the user can accept it easily. For example, if the user is angry, the notification is expressed in a calmer, more composed tone.

[0300] As a concrete example, consider a scenario in an online team meeting where a member says, "Your plan is completely unrealistic." In this case, the device transcribes the statement into text, and if the emotion engine detects an angry tone, that information is sent to the server. The server detects the statement as inappropriate but also generates an improvement message that takes the speaker's emotions into consideration. The device then presents the user with an improvement suggestion in a calm tone, such as "Let's consider another proposal," to help prevent the atmosphere from worsening.

[0301] In this way, the present invention provides a system that comprehensively analyzes the user's language and emotional information and supports the promotion of effective communication.

[0302] The following describes the processing flow.

[0303] Step 1:

[0304] The terminal monitors the user's voice and text input in real time. This data is obtained through data acquisition means, and the voice input is converted into text data by the voice recognition function.

[0305] Step 2:

[0306] The terminal sends the acquired data to the emotion engine to analyze the user's emotional state. Here, the emotion is determined from the tone of the voice and the context of the text. Through this process, the user's emotional state is evaluated with numerical values or labels.

[0307] Step 3:

[0308] The terminal packages the analyzed emotion data and text data and sends this to the server using secure communication means. This communication is carried out in real time, and encryption technology is used to maintain the confidentiality of the transmitted data.

[0309] Step 4:

[0310] The server receives the data and analyzes it. Utilizing natural language processing technology, it extracts patterns of inappropriate behavior from the text data and simultaneously refers to the emotion data to judge the nuance of the utterance.

[0311] Step 5:

[0312] If the server detects harassment or other inappropriate behavior, it generates a warning message for the terminal through the notification means. This message is adjusted based on the user's emotional state, and its content is customized accordingly.

[0313] Step 6:

[0314] The server sends the generated warning message to the terminal. The communication here is also carried out with emphasis on security and low latency.

[0315] Step 7:

[0316] The terminal receives warning messages from the server and notifies the user through the user interface. Notifications are delivered via pop-ups and alert sounds, and are presented to the user in an emotionally sensitive manner.

[0317] Step 8:

[0318] Users receive notifications on their devices, giving them an opportunity to review their own statements and actions. If necessary, they can follow the suggested improvements and modify their communication to make it better.
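The emotion-dependent customization of the warning in Step 5 can be sketched as follows. The emotion labels, the intensity scale, the tone templates, and the `min_intensity` cutoff are all illustrative assumptions; the description states only that the emotional state is expressed with numerical values or labels and that the message is adjusted accordingly.

```python
from dataclasses import dataclass


@dataclass
class EmotionState:
    label: str        # e.g. "anger", "sadness", "neutral" (assumed labels)
    intensity: float  # 0.0 (weak) to 1.0 (strong), an assumed scale


# Hypothetical tone templates keyed by emotion label.
TONE_TEMPLATES = {
    "anger": "Let's take a moment. {suggestion}",
    "sadness": "It's okay. {suggestion}",
}
DEFAULT_TEMPLATE = "{suggestion}"


def customize_warning(suggestion: str,
                      emotion: EmotionState,
                      min_intensity: float = 0.5) -> str:
    """Wrap the suggestion in a tone that suits the user's emotional state."""
    # Weak emotions do not change the wording.
    if emotion.intensity < min_intensity:
        return DEFAULT_TEMPLATE.format(suggestion=suggestion)
    template = TONE_TEMPLATES.get(emotion.label, DEFAULT_TEMPLATE)
    return template.format(suggestion=suggestion)
```

For the meeting example in the description, `customize_warning("Let's consider another proposal.", EmotionState("anger", 0.9))` would prepend the calming phrase, while a low-intensity or unlisted emotion leaves the suggestion unchanged.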

[0319] (Example 2)

[0320] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0321] Traditional communication analysis systems detect inappropriate behavior based solely on the content of user statements and therefore fail to capture subtle inappropriate actions that involve emotional nuances. Furthermore, their warnings lack flexibility because they do not consider the user's emotional state.

[0322] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0323] In this invention, the server includes means for monitoring and acquiring information, means for converting the acquired information, and means for analyzing the converted information and determining its state. This makes it possible to comprehensively analyze the user's statements and emotional state, generate appropriate warnings, and send notifications.

[0324] "Means for monitoring and acquiring information" refers to a means that continuously detects user speech and text input and collects that data.

[0325] "Means for converting the acquired information" refers to a means that converts audio data into text data or other formats as appropriate.

[0326] "Means for analyzing the converted information and determining its state" refers to a means that analyzes the user's emotions and intentions based on the converted data and identifies the corresponding emotional state.

[0327] A "device that detects behavior based on received data" is a device that uses analyzed data to identify inappropriate behavior or potential problematic behaviors of users.

[0328] A "device that notifies users" is a device that provides warnings and advice to users based on detected problems.

[0329] An "information display device" is a device that presents information to a user visually or audibly.

[0330] This invention provides a system that enables sophisticated communication analysis by monitoring user statements and inputs and evaluating their content and emotional state. The following describes how this invention can be implemented.

[0331] The device constantly monitors the user's speech and text input. When the user inputs by voice, the device first acquires the voice data and converts it into text data using "speech recognition technology." This conversion uses commonly available speech recognition software.

[0332] All input data is then analyzed by an emotion engine. The emotion engine uses specific algorithms to determine the user's emotions from the input data. It analyzes tone from voice data and context and style from text data to infer emotions such as anger or sadness. Technologies used for analysis include "emotion analysis software."

[0333] The data analyzed by the device is sent to the server via communication means. The server receives this data and uses a "generative AI model" and other tools to comprehensively evaluate the user's statements and emotional state, and detects the possibility of inappropriate behavior. If inappropriate behavior is detected, an appropriate warning is sent to the device via a notification means.

[0334] As an example, consider an online meeting scenario. Suppose a member says, "Your plan isn't realistic." The terminal transcribes this statement, and if the emotion engine detects anger, this information is sent to the server. If the server detects the statement as inappropriate, it generates a message such as "We'll consider other suggestions" and delivers it to the user in a calm tone. In this way, the atmosphere of the meeting can be calmed.

[0335] As a prompt for the generative AI model, this system can use, for example, "When the user is expressing strong emotions during a conversation, please generate and provide appropriate feedback based on those emotions and context." This allows for more appropriate communication while taking into account the user's emotional responses.

[0336] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0337] Step 1:

[0338] The terminal acquires user speech and text input. If the user inputs by voice, this is collected as voice data. The input is voice waveform data, and the output is the unmodified voice data. The voice data is temporarily held in memory in preparation for subsequent processing.

[0339] Step 2:

[0340] The device converts acquired audio data into text data using speech recognition technology. The input is audio data, and the output is text data in string format. This conversion uses an algorithm that analyzes audio patterns and converts them into corresponding linguistic expressions.

[0341] Step 3:

[0342] The device sends the converted text data to the emotion engine to determine the user's emotional state. The input is text data, and the output is metadata indicating the emotional state. Here, the context and word choices of the text are analyzed, and the intensity and type of emotion are quantified using an emotion analysis algorithm.

[0343] Step 4:

[0344] The terminal transmits the emotion analysis results and text data to the server using the communication means. The input is text data and its emotional state metadata, and the output is a data package transferred to the server. The data is encrypted using a communication protocol and securely transmitted to the server.

[0345] Step 5:

[0346] The server uses a generative AI model with the received data to detect misconduct. The input consists of text data and emotional state metadata, and the output is the result of the misconduct determination. The generative AI model uses natural language processing to comprehensively evaluate the input content and emotions.

[0347] Step 6:

[0348] The server generates a notification for the user based on the detection results. The input is the result of the misconduct determination, and the output is the message presented to the user. A generative AI model generates the suggested content and constructs the message in an appropriate tone.

[0349] Step 7:

[0350] The terminal displays notifications received from the server to the user. The input is a message from the server, and the output is a notification presented to the user visually or audibly. Notifications are delivered using a display or speaker in a way that the user can intuitively understand.
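The data package of Step 4, which bundles the text data with its emotional state metadata, can be sketched as follows. The field names and the choice of JSON are assumptions made for illustration; the description does not specify a wire format. Encryption is left to the transport layer (for example HTTPS) and is not shown here.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class EmotionMetadata:
    kind: str         # type of emotion, e.g. "anger" (assumed label)
    intensity: float  # quantified intensity (assumed 0.0 to 1.0 scale)


@dataclass
class DataPackage:
    text: str                 # text data produced by speech recognition
    emotion: EmotionMetadata  # metadata produced by the emotion engine


def pack(package: DataPackage) -> bytes:
    """Serialize the package to UTF-8 JSON bytes for transmission."""
    return json.dumps(asdict(package), ensure_ascii=False).encode("utf-8")


def unpack(payload: bytes) -> DataPackage:
    """Rebuild the package on the server side."""
    raw = json.loads(payload.decode("utf-8"))
    return DataPackage(text=raw["text"],
                       emotion=EmotionMetadata(**raw["emotion"]))
```

Keeping the text and the emotion metadata in one package lets the server evaluate both together in Step 5 without having to correlate two separate messages.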

[0351] (Application Example 2)

[0352] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0353] In modern households, there is a need to mitigate discord caused by emotional misunderstandings and inappropriate behavior in interpersonal communication, and to promote smooth dialogue. However, there is insufficient technology to sense subtle emotional changes in communication in a timely manner and provide appropriate support. This invention aims to provide a system that can prevent potential problems, particularly in communication within the family.

[0354] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0355] In this invention, the server includes information acquisition means for monitoring user speech or input, communication means for transmitting the acquired information to a remote processing device, and detection means for detecting inappropriate behavior from the received information in the processing device. This makes it possible, as an automated device in the home, to analyze the content of conversations and help improve emotional states.

[0356] "Information acquisition means" refers to a device or system that has the function of monitoring user speech and input and collecting that data.

[0357] "Communication means" refers to a device or system that has the function of transmitting collected information to a processing device located at a remote location.

[0358] A "processing device" is a device or system that has the function of performing analysis to detect inappropriate behavior based on received information.

[0359] A "detection means" is a system that uses natural language processing technology to analyze information and has the function of detecting emotional states and inappropriate behavior.

[0360] A "notification means" is a device or system that has the function of notifying the user of warnings or suggestions based on the detected results.

[0361] A "home-use automated device" is a device that analyzes everyday communication within the home and provides support to facilitate smooth dialogue.

[0362] The system designed to realize this application is an automated device for facilitating communication within the home. In this system, information acquisition means, communication means, processing means, detection means, and notification means work together. Specifically, it operates through the following process:

[0363] The terminal (a home robot) constantly monitors the user's speech and text input through the information acquisition means. In the case of voice data, a microphone captures the voice, and a speech analysis API (such as Google Speech-to-Text) is used to convert the voice to text. The text data is then sent to the server via the internet using the communication means.

[0364] The server analyzes the received text data using a natural language processing library (e.g., NLTK) to assess the emotional state. This assessment includes analysis based on changes in voice tone and the context of the text. If inappropriate behavior is detected, the server notifies the terminal of this information.

[0365] Based on this information, the device sends appropriate warnings and suggestions to the user. For example, if the user shows signs of stress, the robot will offer a gentle suggestion such as, "Why don't you take a short break?" This helps to facilitate smoother communication within the household.

[0366] For example, if children start arguing during a game, the robot can instantly analyze the situation and suggest, "Why don't you take a break?" to alleviate the conflict. An example of a prompt for the generative AI model in such a scenario would be, "How can a household robot come up with a suggestion to calm the conversation during the game and communicate it?"

[0367] This system makes it possible to mitigate communication friction that often occurs within the family in real time, and to maintain good relationships.

[0368] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0369] Step 1:

[0370] The device captures the user's speech through the microphone. The input here is audio data. The device converts this audio data into text data using a speech analysis API. The converted text data is the output.

[0371] Step 2:

[0372] The terminal sends the converted text data to the server using a communication method. The output of this processing step is the text data sent to the server. The server takes the received text data as input for processing.

[0373] Step 3:

[0374] The server analyzes the received text data using a natural language processing library to evaluate the emotional state. For example, it identifies keywords and tones that indicate emotion within the text and determines the type of emotion. The input is the text data received by the server, and the output is the analyzed emotion data.

[0375] Step 4:

[0376] Based on the analysis results, the server generates appropriate warnings or suggestions if problems are detected. A generative AI model is applied, and its output is a message encouraging improvement. This is generated using prompts to the generative AI model. The input for this step is sentiment data, and the output is the generated suggestion message.

[0377] Step 5:

[0378] The server sends the generated suggestion message to the terminal. The input is the message sent by the server, and the output is a notification to the user via display or audio. The terminal provides the suggestion through its user interface, for example, using a display or speaker.

[0379] This processing flow makes it possible to support communication within the family and help maintain good relationships.
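Steps 3 and 4 above, in which the server identifies emotion-bearing keywords and produces a suggestion, can be sketched as follows. The keyword sets and the mapping from emotion to suggestion are hypothetical; the suggestion texts are taken from the examples in the description, and a simple lookup stands in for the generative AI model.

```python
from collections import Counter
from typing import Optional

# Hypothetical keyword sets indicating each type of emotion.
EMOTION_KEYWORDS = {
    "anger": {"hate", "stupid", "unfair", "cheater"},
    "stress": {"tired", "exhausted", "overwhelmed"},
}

# Suggestions quoted from the description; the mapping itself is an assumption.
SUGGESTIONS = {
    "anger": "Why don't you take a break?",
    "stress": "Why don't you take a short break?",
}


def classify_emotion(text: str) -> Optional[str]:
    """Return the emotion with the most keyword hits, or None if there are none."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    counts = Counter()
    for emotion, keywords in EMOTION_KEYWORDS.items():
        counts[emotion] = sum(1 for w in words if w in keywords)
    emotion, hits = counts.most_common(1)[0]
    return emotion if hits > 0 else None


def suggest(text: str) -> Optional[str]:
    """Return a suggestion message for the terminal, or None if no problem is found."""
    emotion = classify_emotion(text)
    return SUGGESTIONS.get(emotion) if emotion else None
```

A keyword count is far cruder than the analysis the description envisions, but it shows the input and output of each step: text in, emotion data out, then emotion data in, suggestion message out.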

[0380] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0381] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0382] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0383] [Third Embodiment]

[0384] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0385] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0386] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0387] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0388] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0389] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0390] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0391] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0392] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0393] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0394] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0395] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0396] This invention relates to a system that monitors user communication content in real time and automatically detects inappropriate remarks or actions. This system functions through collaboration between a terminal used by the user and a server that performs data analysis.

[0397] The terminal constantly monitors the user's speech and input text in the background and collects this information using the data acquisition means. The collected data is securely transmitted to the server using the communication means. In the case of voice input, the terminal uses speech recognition technology to convert it into text data in real time.

[0398] The server analyzes the received text data. Using natural language processing techniques, it tokenizes and parses the data to detect patterns of harassment and inappropriate remarks. To perform this analysis, the server is equipped with a pre-trained AI model that assesses the risk by comparing it to past cases.

[0399] If a risk is detected, the server immediately sends the result to the terminal via the notification means. The terminal receives this and issues a warning to the user through the user interface. The warning may include specific examples of inappropriate remarks and suggestions for improvement. Based on this information, the user can review their own words and actions and communicate appropriately.

[0400] A concrete example is an online meeting at work. If a participant says, "That idea is completely worthless," during the meeting, the device transcribes the speech in real time and sends the information to the server. The server detects this expression as inappropriate and sends a warning message back to the device. The user then receives a notification on their device suggesting an improvement, such as, "We'll take that opinion into consideration, but let's think about an alternative approach," allowing them to reconsider their statement.

[0401] This allows the system to support users in achieving socially desirable communication.

[0402] The following describes the processing flow.

[0403] Step 1:

[0404] The device constantly monitors the user's voice or text input. In the case of voice input, it uses speech recognition technology to convert it into text data in real time. It also preprocesses the text data to remove noise and unnecessary information.

[0405] Step 2:

[0406] The terminal prepares the acquired text data and sends it to the server via a communication protocol. Encryption technology is used during this process to secure the data transfer, and the transfer is carried out with low latency.

[0407] Step 3:

[0408] The server receives text data from the terminal and prepares it for analysis. The data is then passed to the natural language processing engine, where tokenization and syntactic analysis are performed.

[0409] Step 4:

[0410] The AI model on the server detects patterns related to moral harassment and inappropriate remarks based on the analysis results. The detection is performed by comparing the results with patterns that the model learned in advance from past data.

[0411] Step 5:

[0412] If the server detects a potential instance of harassment, it generates a warning message. This message includes specific examples of inappropriate remarks and suggestions for improvement.

[0413] Step 6:

[0414] The server sends the generated warning message to the terminal. Immediacy is paramount, and communication is carried out as quickly as possible.

[0415] Step 7:

[0416] The terminal receives warning messages from the server and notifies the user through the user interface. The notification is accompanied by a pop-up or alert sound to make it easy for the user to notice.

[0417] Step 8:

[0418] Users can check notifications from their devices and have the opportunity to review their statements and actions. If necessary, they can modify their communication according to the improvement suggestions provided by the system.
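The processing flow above can be sketched as follows. This is a minimal illustration in Python, not the implementation of this disclosure: the phrase table `FLAGGED_PHRASES` is a hypothetical stand-in for the pre-trained AI model, all function names are assumptions, and the transport between the terminal and the server (Steps 2 and 6) is omitted.

```python
import re
from typing import Optional

# Hypothetical phrase table standing in for the pre-trained AI model.
FLAGGED_PHRASES = {
    "completely worthless": "We'll take that opinion into consideration, "
                            "but let's think about an alternative approach.",
}

def preprocess(raw_text: str) -> str:
    """Step 1: collapse repeated whitespace and trim the transcribed text."""
    return re.sub(r"\s+", " ", raw_text).strip()

def detect(text: str) -> Optional[str]:
    """Steps 3-4: return the first flagged phrase found, or None."""
    lowered = text.lower()
    for phrase in FLAGGED_PHRASES:
        if phrase in lowered:
            return phrase
    return None

def build_warning(phrase: str) -> dict:
    """Step 5: warning containing the detected remark and a suggestion."""
    return {"detected": phrase, "suggestion": FLAGGED_PHRASES[phrase]}

def handle_utterance(raw_text: str) -> Optional[dict]:
    """Chain Steps 1 and 3 to 5 for a single utterance."""
    phrase = detect(preprocess(raw_text))
    return build_warning(phrase) if phrase else None
```

In this sketch, an utterance with no flagged phrase yields `None`, so the terminal would present nothing to the user.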

[0419] (Example 1)

[0420] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0421] Traditionally, there has been no effective system for detecting inappropriate content in user statements or entered text information in real time and immediately issuing warnings to users. Therefore, even when a user's communication is socially undesirable, improvement may be delayed, potentially leading to the problem escalating. This invention aims to solve such problems and support users in immediately striving for appropriate communication.

[0422] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0423] In this invention, the server includes data collection means, data transmission means, detection means, and information presentation means. This makes it possible to identify inappropriate speech or misconduct in real time from the user's voice or text information and immediately issue a warning.

[0424] "Data collection means" refers to devices or programs that have the function of acquiring user voice and text information and generating information for analysis within the system.

[0425] "Data transmission means" refers to communication functions and technologies for safely and efficiently transmitting collected information to remotely located information processing devices.

[0426] A "detection means" is a device or program that has the function of analyzing data using language analysis technology in order to identify inappropriate speech or misconduct from received information.

[0427] An "information processing device" is a general term for hardware and software used to analyze information sent from data collection devices and to actually make decisions and perform analyses.

[0428] "Information presentation means" refers to devices or programs that provide users with analysis results or warning information visually or audibly through a user interface.

[0429] This invention is a system that monitors user communication in real time and automatically detects inappropriate remarks and actions. Its functionality is primarily achieved through collaboration between a terminal and a server.

[0430] The device constantly monitors the user's speech and text input. Specifically, it uses the microphone built into the device and speech recognition software to convert the user's voice into text data in real time. A commercially available speech recognition API is used for this conversion. The text data is then sent to the server by the data transmission means. Security is ensured by using the HTTPS communication protocol, which encrypts the data during transmission.
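The transmission from the terminal to the server can be sketched as follows. This is a minimal illustration using only the Python standard library; the endpoint URL, the JSON field names, and the `build_request` helper are assumptions made for illustration and do not appear in this disclosure. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Hypothetical endpoint; this disclosure does not name a server URL.
SERVER_URL = "https://server.example/analyze"

def build_request(text: str, user_id: str) -> urllib.request.Request:
    """Build an HTTPS POST request carrying the transcribed text as JSON."""
    body = json.dumps({"user_id": user_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        SERVER_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_request("That opinion is completely useless.", "user-20")
# Passing the request to urllib.request.urlopen(request, timeout=5) would
# send it; TLS encryption is applied because the URL scheme is https.
```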

[0431] When the server receives text data sent from a terminal, it analyzes its content using natural language processing techniques. This analysis utilizes generative AI models built with frameworks such as TensorFlow and PyTorch. These AI models evaluate the data in detail by performing tokenization and syntactic analysis, and detect risks by comparing it with past inappropriate speech patterns.
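The tokenization and comparison with past inappropriate speech patterns can be illustrated with a simplified sketch. It assumes a regular-expression tokenizer and a hypothetical list of word pairs (`PAST_PATTERNS`) in place of the generative AI model and the syntactic analysis described above, so it shows only the general idea of the comparison.

```python
import re

# Hypothetical examples of past inappropriate speech patterns (word pairs).
PAST_PATTERNS = [("completely", "useless"), ("completely", "worthless")]

def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def matches_past_pattern(tokens: list[str]) -> bool:
    """Return True if any adjacent token pair matches a past pattern."""
    bigrams = set(zip(tokens, tokens[1:]))
    return any(pattern in bigrams for pattern in PAST_PATTERNS)

tokens = tokenize("That opinion is completely useless.")
```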

[0432] When the server detects a risk, it immediately sends a warning message to the device. This message is presented visually to the user through the device's user interface. The notification may include examples of detected inappropriate remarks and suggestions on how to improve them. This allows users to review their communication style and take socially appropriate actions.

[0433] As a concrete example, consider a scenario where a participant in an online meeting system says, "That opinion is completely useless." The device transcribes this statement in real time and immediately sends the information to the server. The server detects this expression as inappropriate and sends a warning message back to the device along with suggestions for improvement. The user receives a suggestion to "consider that opinion as well, and then look at it from a different perspective," allowing them to reconsider their statement.

[0434] An example of a prompt input to the generative AI model is, "Inappropriate remarks were detected during the online meeting. Please suggest ways to improve." This allows users to strive for better communication.

[0435] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0436] Step 1:

[0437] The device monitors the user's voice and text input. When the user begins speaking, the device collects voice data through its built-in microphone. This data is input into speech recognition software and converted into text data in real time. The output text data is then formatted into a format that allows for further analysis within the system.

[0438] Step 2:

[0439] The terminal transmits the obtained text data to the server via a secure protocol such as HTTPS. The text data is encrypted and securely transmitted through relay processes before reaching the server.

[0440] Step 3:

[0441] The server analyzes the text data received from the terminal. Using the received text data as input, it performs tokenization and syntactic analysis using natural language processing techniques. This analysis process breaks down the transmitted text into individual words and determines their meaning. Based on this, the server utilizes a generative AI model to assess the risk of inappropriate remarks. The output is the analysis result.

[0442] Step 4:

[0443] The server generates a warning if inappropriate remarks are detected as a result of the risk assessment. Specifically, a generative AI model evaluates the risks in the remarks, and if they exceed a threshold, it creates a warning message that includes suggestions for improvement. This message is prepared in the form of a prompt and sent to the terminal.

[0444] Step 5:

[0445] The terminal receives warning messages from the server and notifies the user through the user interface. It decodes the received messages and presents them visually to the user. The terminal presents the user with specific examples of inappropriate remarks and suggestions for improvement, allowing the user to modify their behavior accordingly.
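The threshold decision in Step 4 can be sketched as follows. The threshold value, the per-term weights, and the function names are assumptions for illustration; in this disclosure the risk is evaluated by a generative AI model, for which the weighted word count below is only a stand-in.

```python
from typing import Optional

RISK_THRESHOLD = 0.5  # Assumed value; this disclosure does not specify one.

# Hypothetical per-term weights standing in for the model's risk evaluation.
TERM_WEIGHTS = {"useless": 0.6, "worthless": 0.6, "completely": 0.2}

def risk_score(text: str) -> float:
    """Sum the weights of known terms, capped at 1.0."""
    words = text.lower().replace(".", " ").replace(",", " ").split()
    return min(1.0, sum(TERM_WEIGHTS.get(word, 0.0) for word in words))

def maybe_warn(text: str) -> Optional[str]:
    """Return a warning message only when the risk exceeds the threshold."""
    score = risk_score(text)
    if score > RISK_THRESHOLD:
        return f'Inappropriate remark detected (risk {score:.2f}): "{text}"'
    return None
```

A remark whose score does not exceed the threshold produces no message, so nothing is sent to the terminal in that case.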

[0446] (Application Example 1)

[0447] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0448] In today's communication environment, inappropriate remarks and negative expressions can negatively impact interpersonal relationships, which is a significant problem. This invention aims to provide a system that supports healthy communication by detecting such negative language expressions in real time and suggesting appropriate improvements.

[0449] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0450] In this invention, the server includes data acquisition means for acquiring user language data, communication means for transmitting the acquired language data to an external data processing device, detection means for analyzing the received language data and detecting negative content, and notification means for providing the user with warnings, including suggestions for improvement, based on the detected results. This makes it possible to monitor user statements and input content in real time and correct negative expressions immediately.

[0451] "User language data" refers to the utterances and input content obtained from the user as speech or text.

[0452] "Data acquisition means" refers to a function that collects user language data and prepares it for processing.

[0453] "Communication means" refers to the technology and methods used to transmit acquired data to a data processing device located at a remote location.

[0454] "Detection means" refers to a function that analyzes received data and identifies negative or inappropriate content from it.

[0455] "Notification means" refers to technologies and methods used to communicate warnings and suggestions for improvement to users based on detected results.

[0456] "External data processing device" refers to a computer system or server used to receive and analyze data.

[0457] "Improvement suggestions" refer to specific advice and guidance provided to users to promote more appropriate and constructive communication when negative or inappropriate language is detected.

[0458] To implement this invention, it is necessary to install an application on the user's terminal for speech recognition and natural language processing. The terminal constantly monitors the user's speech and text input in the background and acquires language data. This data is transmitted to an external data processing device (server) via communication means.

[0459] The server analyzes the received data using natural language processing techniques. Audio data is converted to text data using speech recognition software. Tokenization, syntactic analysis, and sentiment analysis are applied to the text data. Sentiment analysis uses a pre-trained generative AI model to detect negative or inappropriate content.

[0460] If an inappropriate statement is detected according to pre-set criteria, the server notifies the terminal of the result. The notification includes a warning about the statement and suggestions for improvement, which are presented to the user through the user interface. This allows the user to review the content of their communication in real time and correct it to be more appropriate.

[0461] For example, if the phrase "This idea is unusable" is detected during an online company meeting, the server sends a notification to the participant's device that includes a suggestion for improvement, such as "Let's think of a more flexible idea." As a result, meeting participants can reflect on their own statements and promote constructive discussion.

[0462] Examples of prompts to input into the generative AI model include, "Detect negative expressions in the text and provide suggestions for making them more positive," and "Analyze the audio data and generate warning messages about potentially harassing expressions."
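One way to combine such a prompt with the user's language data is sketched below. The `build_prompt` helper and the `---` separator are assumptions for illustration; this disclosure does not specify how the instruction and the language data are joined.

```python
def build_prompt(instruction: str, language_data: str) -> str:
    """Join an instruction and the user's language data into one prompt."""
    return f"{instruction}\n---\n{language_data}"

INSTRUCTION = ("Detect negative expressions in the text and provide "
               "suggestions for making them more positive.")

prompt = build_prompt(INSTRUCTION, "This idea is unusable.")
```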

[0463] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0464] Step 1:

[0465] The device monitors the user's voice and text input in real time. In the case of voice, it acquires voice data with its microphone and converts it into text data using speech recognition technology. The input is the user's speech or text, and the output is language data in text format.

[0466] Step 2:

[0467] The terminal securely transmits the acquired text data to the server via a communication method. The input is text-based language data, and the output is the completion of data transmission to the server. Specifically, the data is transferred to the server via a network protocol.

[0468] Step 3:

[0469] The server analyzes the received text data. Using natural language processing techniques, it performs tokenization and syntactic analysis, and then uses a generative AI model to perform sentiment analysis. The input is text data, and the output is the analysis result, including the detection of sentiment and negative statements. Specific operations include referencing a database and comparing it with past cases.

[0470] Step 4:

[0471] If the server determines that the analysis results are negative or inappropriate, it generates specific improvement suggestions. It inputs prompt statements into an AI model to create improvement suggestions. The input is the detection results for negative statements, and the output is a suggestion statement for the user. In essence, the AI is provided with prompts to generate improvement suggestions.

[0472] Step 5:

[0473] The server sends a warning message containing the generated improvement suggestions to the terminal via a communication method. The input is the suggestion text, and the output is the completion of notification to the terminal. The specific action here is the process of sending data over the network.

[0474] Step 6:

[0475] The terminal displays received suggestions to the user via a user interface. The input is notification data from the server, and the output is a visible warning message to the user. Specifically, the notification pops up on the screen, allowing the user to immediately check it.
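The exchange of the warning message in Steps 5 and 6 can be illustrated with the following sketch. The `WarningMessage` fields, the JSON encoding, and the pop-up text layout are assumptions for illustration; this disclosure does not define a message format.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class WarningMessage:
    detected_text: str  # the statement judged negative or inappropriate
    suggestion: str     # the improvement suggestion generated by the server

def encode(message: WarningMessage) -> bytes:
    """Server side (Step 5): serialize the warning for transmission."""
    return json.dumps(asdict(message)).encode("utf-8")

def decode(payload: bytes) -> WarningMessage:
    """Terminal side (Step 6): restore the warning from received bytes."""
    return WarningMessage(**json.loads(payload.decode("utf-8")))

def render_popup(message: WarningMessage) -> str:
    """Terminal side (Step 6): text shown in the pop-up notification."""
    return f'Warning: "{message.detected_text}"\nSuggestion: {message.suggestion}'

original = WarningMessage("This idea is unusable",
                          "Let's think of a more flexible idea.")
received = decode(encode(original))
```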

[0476] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0477] This invention relates to a system that achieves more precise communication analysis by evaluating not only the user's utterances and input content, but also their emotional state. This system consists of data acquisition means, an emotion engine, communication means, detection means, and notification means, which work together to perform their functions.

[0478] The device constantly monitors the user's speech and text input, collecting this data using data acquisition methods. In the case of speech data, this process includes converting it into text data using speech recognition technology. The emotion engine determines the user's emotional state based on the collected data. It analyzes tone from speech and context and style from text to infer emotions such as anger or sadness.

[0479] The collected and analyzed data is transmitted to a server via communication means. After receiving the data, the server uses natural language processing technology to consider both the content of the utterance and the emotional state to detect potential misconduct. Through this dual analysis, the detection means can detect even subtle nuances of misconduct that could not be captured by conventional text analysis alone.

[0480] If the detection results indicate that a user's statements constitute harassment or other inappropriate behavior, the server will send a warning message to the device via a notification system. Based on the user's emotional state, as assessed by the emotion engine, the notification content will be customized to be easily received by the user. For example, if the user is angry, the notification will be expressed in a calmer, more composed tone.
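The customization of the notification according to the emotional state can be sketched as follows. The emotion labels and the wording in `TONE_PREFIX` are hypothetical; this disclosure states only that the content is adjusted, for example to a calmer tone when the user is angry.

```python
# Hypothetical wording prepended according to the estimated emotion.
TONE_PREFIX = {
    "anger": "Let's take a moment. ",
    "sadness": "It sounds like this is difficult. ",
}

def customize(suggestion: str, emotion: str) -> str:
    """Prepend tone-setting wording for known emotions; otherwise unchanged."""
    return TONE_PREFIX.get(emotion, "") + suggestion
```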

[0481] As a concrete example, consider a scenario in an online team meeting where a member says, "Your plan is completely unrealistic." In this case, the device transcribes the statement into text, and if the emotion engine detects an angry tone, that information is sent to the server. The server detects the statement as inappropriate but also generates an improvement message that takes the speaker's emotions into consideration. The device then presents the user with an improvement suggestion in a calm tone, such as "Let's consider another proposal," to help prevent the atmosphere from worsening.

[0482] In this way, the present invention provides a system that comprehensively analyzes the user's language and emotional information and supports the promotion of effective communication.

[0483] The following describes the processing flow.

[0484] Step 1:

[0485] The terminal monitors the user's voice and text input in real time. This data is acquired through data acquisition means, and voice input is converted into text data by a speech recognition function.

[0486] Step 2:

[0487] The device sends the acquired data to an emotion engine, which analyzes the user's emotional state. This engine determines emotions based on factors such as voice tone and text context. This process then evaluates the user's emotional state using numerical values and labels.

[0488] Step 3:

[0489] The terminal packages the analyzed sentiment data and text data and sends it to the server using a secure communication method. This communication takes place in real time, and encryption technology is used to maintain the confidentiality of the transmitted data.

[0490] Step 4:

[0491] The server receives the data for analysis. Using natural language processing techniques, it extracts patterns of misconduct from the text data and simultaneously refers to sentiment data to determine the nuances of the speech.

[0492] Step 5:

[0493] If the server detects harassment or other inappropriate behavior, it generates a warning message to be delivered to the device via the notification means. This message is customized based on the user's emotional state.

[0494] Step 6:

[0495] The server sends the generated warning message to the terminal. This communication is also conducted with an emphasis on security and low latency.

[0496] Step 7:

[0497] The terminal receives warning messages from the server and notifies the user through the user interface. Notifications are delivered via pop-ups and alert sounds, and are presented to the user in an emotionally sensitive manner.

[0498] Step 8:

[0499] Users receive notifications on their devices, giving them an opportunity to review their own statements and actions. If necessary, they can follow the suggested improvements and modify their communication to make it better.

[0500] (Example 2)

[0501] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0502] Conventional communication analysis systems detect inappropriate behavior based solely on the content of user statements and fail to capture subtle inappropriate actions that include emotional nuances. Furthermore, their warnings lack flexibility because they do not consider the user's emotional state.

[0503] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0504] In this invention, the server includes means for monitoring and acquiring information, means for transforming the acquired information, and means for analyzing the transformed information and determining its state. This makes it possible to comprehensively analyze the user's statements and emotional state, generate appropriate warnings, and send notifications.

[0505] "Means for monitoring and acquiring information" refers to a device that continuously detects user speech and text input and collects that data.

[0506] "Means for transforming the acquired information" refers to a device that appropriately converts audio data into text data or other formats.

[0507] "Means for analyzing the transformed information and determining its state" refers to a device that analyzes the user's emotions and intentions based on the converted data and identifies the corresponding emotional state.

[0508] A "device that detects behavior based on received data" is a device that uses analyzed data to identify inappropriate behavior or potential problematic behaviors of users.

[0509] A "device that notifies users" is a device that provides warnings and advice to users based on detected problems.

[0510] An "information display device" is a device that presents information to a user visually or audibly.

[0511] This invention provides a system that enables sophisticated communication analysis by monitoring user statements and inputs and evaluating their content and emotional state. The following describes how this invention can be implemented.

[0512] The device constantly monitors the user's speech and text input. When the user inputs by voice, the device first acquires the voice data and converts it into text data using "speech recognition technology." This conversion uses commonly available speech recognition software.

[0513] All input data is then analyzed by an emotion engine. The emotion engine uses specific algorithms to determine the user's emotions from the input data. It analyzes tone from voice data and context and style from text data to infer emotions such as anger or sadness. Technologies used for analysis include "emotion analysis software."
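A text-only emotion estimate can be sketched as follows. This is a simplified illustration that assumes a hypothetical word-to-emotion lexicon (`EMOTION_LEXICON`); it does not analyze voice tone, and it is not the emotion engine or emotion analysis software of this disclosure.

```python
from collections import Counter

# Hypothetical lexicon mapping words to emotion labels.
EMOTION_LEXICON = {
    "unrealistic": "anger",
    "ridiculous": "anger",
    "sorry": "sadness",
    "unfortunately": "sadness",
}

def estimate_emotion(text: str) -> tuple[str, float]:
    """Return (label, intensity), where intensity is hits / word count."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    counts = Counter(EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON)
    if not counts:
        return ("neutral", 0.0)
    label, hits = counts.most_common(1)[0]
    return (label, hits / len(words))
```

The returned pair corresponds to the type and intensity of the emotion; text with no lexicon word is labeled neutral with intensity 0.0.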

[0514] The data analyzed by the device is sent to the server via communication means. The server receives this data and uses a "generative AI model" and other tools to comprehensively evaluate the user's statements and emotional state, and detects the possibility of inappropriate behavior. If inappropriate behavior is detected, an appropriate warning is sent to the device via a notification means.

[0515] As an example, consider an online meeting scenario. Suppose a member says, "Your plan isn't realistic." The terminal transcribes this statement, the emotion engine detects anger, and this information is sent to the server. If the server detects the statement as inappropriate, it generates a message such as "We'll consider other suggestions," and delivers it to the user in a calm tone. In this way, the atmosphere of the meeting can be calmed.

[0516] As a prompt for the generative AI model, this system can use, for example, "When the user is expressing strong emotions during a conversation, please generate and provide appropriate feedback based on those emotions and context." This allows for more appropriate communication while taking into account the user's emotional responses.

[0517] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0518] Step 1:

[0519] The terminal acquires user speech and text input. If the user inputs by voice, this is collected as voice data. The input is voice waveform data, and the output is voice data as is. The voice data is temporarily held in memory in preparation for subsequent processing.

[0520] Step 2:

[0521] The device converts acquired audio data into text data using speech recognition technology. The input is audio data, and the output is text data in string format. This conversion uses an algorithm that analyzes audio patterns and converts them into corresponding linguistic expressions.

[0522] Step 3:

[0523] The device sends the converted text data to the emotion engine to determine the user's emotional state. The input is text data, and the output is metadata indicating the emotional state. Here, the context and word choices of the text are analyzed, and the intensity and type of emotion are quantified using an emotion analysis algorithm.

[0524] Step 4:

[0525] The terminal transmits the sentiment analysis results and text data to the server using a communication method. The input is text data and its sentiment state metadata, and the output is a data package transferred to the server. The data is encrypted using a communication protocol and securely transmitted to the server.

[0526] Step 5:

[0527] The server uses a generative AI model with the received data to detect misconduct. The input consists of text data and emotional state metadata, and the output is the result of the misconduct determination. The generative AI model uses natural language processing to comprehensively evaluate the input content and emotions.

[0528] Step 6:

[0529] The server generates a notification for the user based on the detection results. The input is the result of the misconduct determination, and the output is the message presented to the user. A generative AI model generates the suggested content and constructs the message in an appropriate tone.

[0530] Step 7:

[0531] The terminal displays notifications received from the server to the user. The input is a message from the server, and the output is a notification presented to the user visually or audibly. Notifications are delivered using a display or speaker in a way that the user can intuitively understand.
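The combined evaluation of text data and emotional state metadata in Step 5 can be sketched as follows. The `Package` fields, the term list, and all numeric weights are assumptions for illustration; in this disclosure the determination is made by a generative AI model.

```python
from dataclasses import dataclass

@dataclass
class Package:
    text: str                 # text data from Step 2
    emotion_label: str        # emotional state metadata from Step 3
    emotion_intensity: float  # assumed range: 0.0 to 1.0

# Hypothetical terms treated as negative in this sketch.
NEGATIVE_TERMS = {"unrealistic", "useless", "worthless"}

def is_inappropriate(package: Package) -> bool:
    """Combine a text score and an emotion bonus against an assumed cutoff."""
    words = {w.strip(".,!?").lower() for w in package.text.split()}
    text_score = 0.5 if words & NEGATIVE_TERMS else 0.0
    if package.emotion_label == "anger":
        emotion_bonus = 0.3 * package.emotion_intensity
    else:
        emotion_bonus = 0.0
    return text_score + emotion_bonus >= 0.6
```

With these assumed weights, the same sentence is flagged when it is spoken in anger and not flagged when it is spoken neutrally, which illustrates why the emotional state metadata is sent together with the text.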

[0532] (Application Example 2)

[0533] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0534] In modern households, there is a need to mitigate discord caused by emotional misunderstandings and inappropriate behavior in interpersonal communication, and to promote smooth dialogue. However, there is insufficient technology to sense subtle emotional changes in communication in a timely manner and provide appropriate support. This invention aims to provide a system that can prevent potential problems, particularly in communication within the family.

[0535] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0536] In this invention, the server includes information acquisition means for monitoring user speech or input, communication means for transmitting the acquired information to a remote processing unit, and detection means for detecting inappropriate behavior from the received information in the processing unit. This makes it possible for an automated device in the home to analyze the content of conversations and help improve emotional states.

[0537] "Information acquisition means" refers to a device or system that has the function of monitoring user speech and input and collecting that data.

[0538] "Communication means" refers to a device or system that has the function of transmitting collected information to a processing device located at a remote location.

[0539] A "processing device" is a device or system that has the function of performing analysis to detect inappropriate behavior based on received information.

[0540] A "detection means" is a system that uses natural language processing technology to analyze information and has the function of detecting emotional states and inappropriate behavior.

[0541] A "notification means" is a device or system that has the function of notifying the user of warnings or suggestions based on the detected results.

[0542] A "home-use automated device" is a device that analyzes everyday communication within the home and provides support to facilitate smooth dialogue.

[0543] The system designed to realize this application is an automated device for facilitating communication within the home. In this system, information acquisition means, communication means, processing means, detection means, and notification means work together. Specifically, it operates through the following process:

[0544] The device (a home robot) constantly monitors the user's speech and text input through information acquisition methods. In the case of voice data, a microphone captures the voice, and a speech analysis API (such as Google Speech-to-Text) is used to convert the voice to text. The text data is then sent to a server via the internet using communication methods.

[0545] The server analyzes the received text data using a natural language processing library (e.g., NLTK) to assess the emotional state. This assessment includes analysis based on changes in voice tone and the context of the text. If inappropriate behavior is detected, the server notifies the terminal of this information.

[0546] Based on this information, the device sends appropriate warnings and suggestions to the user. For example, if the user shows signs of stress, the robot will offer a gentle suggestion such as, "Why don't you take a short break?" This helps to facilitate smoother communication within the household.

[0547] For example, if children start arguing during a game, the robot can instantly analyze the situation and suggest, "Why don't you take a break?" to alleviate the conflict. An example of a prompt for the generative AI model in such a scenario would be, "How can a household robot come up with a suggestion to calm the conversation during the game and communicate it?"

[0548] This system makes it possible to mitigate communication friction that often occurs within the family in real time, and to maintain good relationships.

[0549] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0550] Step 1:

[0551] The device captures the user's speech through the microphone. The input here is audio data. The device converts this audio data into text data using a speech analysis API. The converted text data is the output.

[0552] Step 2:

[0553] The terminal sends the converted text data to the server using a communication method. The output of this processing step is the text data sent to the server. The server takes the received text data as input for processing.

[0554] Step 3:

[0555] The server analyzes the received text data using a natural language processing library to evaluate the emotional state. For example, it identifies keywords and tones that indicate emotion within the text and determines the type of emotion. The input is the text data received by the server, and the output is the analyzed emotion data.

[0556] Step 4:

[0557] Based on the analysis results, the server generates appropriate warnings or suggestions if problems are detected. A generative AI model is applied, and its output is a message encouraging improvement. This is generated using prompts to the generative AI model. The input for this step is sentiment data, and the output is the generated suggestion message.

[0558] Step 5:

[0559] The server sends the generated suggestion message to the terminal. This sent message becomes input, and the terminal performs an action to notify the user via display or audio as output. The terminal provides the suggestion through its user interface, for example, using a display or speaker.

[0560] This processing flow makes it possible to support communication within the family and help maintain good relationships.
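The flow of Application Example 2 can be sketched as follows. The keyword set and the `stub_model` function are hypothetical stand-ins for the natural language processing library and the generative AI model, and the prompt text is taken from the example given above; speech capture and network transfer are omitted.

```python
from typing import Callable, Optional

PROMPT = ("How can a household robot come up with a suggestion to calm "
          "the conversation during the game and communicate it?")

# Hypothetical keywords treated as signs of conflict in this sketch.
CONFLICT_KEYWORDS = {"unfair", "cheater", "hate"}

def detect_conflict(text: str) -> bool:
    """Step 3 stand-in: flag text that contains a conflict keyword."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return bool(words & CONFLICT_KEYWORDS)

def suggest(text: str, model: Callable[[str], str]) -> Optional[str]:
    """Step 4: ask the model for a suggestion only when conflict is found."""
    if not detect_conflict(text):
        return None
    return model(f"{PROMPT}\nConversation: {text}")

def stub_model(prompt: str) -> str:
    """Placeholder for a generative AI model; returns a fixed suggestion."""
    return "Why don't you take a break?"
```

Passing the model as a callable keeps the detection step independent of any particular generative AI service.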

[0561] The specific processing unit 290 transmits the result of the specific processing to the headset-type terminal 314. In the headset-type terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0562] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives, as input, prompts containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0563] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0564] [Fourth Embodiment]

[0565] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0566] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0567] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0568] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0569] The microphone 238 receives instructions and the like from the user 20 by picking up the voice of the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0570] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0571] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0572] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
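One possible way to drive the controlled object 443 from an estimated emotion is a lookup table from emotion labels to LED and motor settings. The sketch below is an assumption for illustration: the `Expression` type, the emotion labels, and every color and angle value are hypothetical and are not specified in this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expression:
    """Hypothetical command set for the controlled object 443."""
    eye_led_rgb: tuple[int, int, int]  # illumination state of the eye LEDs
    arm_angle_deg: int                 # target angle for the arm motors


# Illustrative mapping only; the disclosure does not specify colors or angles.
EXPRESSIONS = {
    "joy": Expression(eye_led_rgb=(255, 200, 0), arm_angle_deg=60),
    "sadness": Expression(eye_led_rgb=(0, 0, 255), arm_angle_deg=-20),
    "neutral": Expression(eye_led_rgb=(255, 255, 255), arm_angle_deg=0),
}


def expression_for(emotion: str) -> Expression:
    # Fall back to a neutral expression for emotions without an entry.
    return EXPRESSIONS.get(emotion, EXPRESSIONS["neutral"])
```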

[0573] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0574] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0575] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0576] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0577] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0578] This invention relates to a system that monitors user communication content in real time and automatically detects inappropriate remarks or actions. This system functions through collaboration between a terminal used by the user and a server that performs data analysis.

[0579] The terminal constantly monitors the user's speech and input text in the background and collects this information through the data acquisition means. The collected data is securely transmitted to the server through the communication means. In the case of voice input, the terminal uses speech recognition technology to convert the speech into text data in real time.

[0580] The server analyzes the received text data. Using natural language processing techniques, it tokenizes and parses the data to detect patterns of harassment and inappropriate remarks. For this analysis, the server is equipped with a pre-trained AI model that assesses risk by comparing the text with past cases.

[0581] If a risk is detected, the server immediately sends the result to the terminal via a notification system. The terminal receives this and issues a warning to the user through the user interface. The warning may include specific examples of inappropriate remarks and suggestions for improvement. Based on this information, the user can review their own words and actions and communicate appropriately.

[0582] A concrete example is an online meeting at work. If a participant says, "That idea is completely worthless," during the meeting, the device transcribes the speech in real time and sends the information to the server. The server detects this expression as inappropriate and sends a warning message back to the device. The user then receives a notification on their device suggesting an improvement, such as, "We'll take that opinion into consideration, but let's think about an alternative approach," allowing them to reconsider their statement.

[0583] This allows the system to support users in achieving socially desirable communication.

[0584] The following describes the processing flow.

[0585] Step 1:

[0586] The device constantly monitors the user's voice or text input. In the case of voice input, it uses speech recognition technology to convert it into text data in real time. It also preprocesses the text data to remove noise and unnecessary information.

[0587] Step 2:

[0588] The terminal prepares the acquired text data and sends it to the server via a communication protocol. Encryption technology is used during this process to secure the data transfer, and the transfer is performed with low latency.

[0589] Step 3:

[0590] The server receives text data from the terminal and prepares it for analysis. The data is then passed to the natural language processing engine, where tokenization and syntactic analysis are performed.

[0591] Step 4:

[0592] The AI model on the server detects patterns related to moral harassment and inappropriate remarks based on the analysis results. The detection is performed by a model pre-trained on a dataset of past cases, which compares the analysis results with those cases.

[0593] Step 5:

[0594] If the server detects a potential instance of harassment, it generates a warning message. This message includes specific examples of inappropriate remarks and suggestions for improvement.

[0595] Step 6:

[0596] The server sends the generated warning message to the terminal. Immediacy is paramount, and communication is carried out as quickly as possible.

[0597] Step 7:

[0598] The terminal receives warning messages from the server and notifies the user through the user interface. The notification is accompanied by a pop-up or alert sound to make it easy for the user to notice.

[0599] Step 8:

[0600] Users can check notifications from their devices and have the opportunity to review their statements and actions. If necessary, they can modify their communication according to the improvement suggestions provided by the system.
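The eight steps above can be sketched end to end as follows. This is a simplified illustration under stated assumptions: the phrase list `FLAGGED_PHRASES` is a hypothetical stand-in for the pre-trained AI model, the network transfers of Steps 2 and 6 are replaced by direct function calls, and all function and class names are chosen for this example only.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical phrase list standing in for the pre-trained AI model.
FLAGGED_PHRASES = {
    "completely worthless": "We'll take that opinion into consideration, "
                            "but let's think about an alternative approach.",
}


@dataclass
class WarningMessage:
    detected: str    # the inappropriate remark that was found
    suggestion: str  # the improvement suggestion shown to the user


def preprocess(text: str) -> str:
    """Step 1: normalize case and whitespace (a simple form of noise removal)."""
    return re.sub(r"\s+", " ", text).strip().lower()


def analyze(text: str) -> Optional[WarningMessage]:
    """Steps 3 to 5: detect a flagged phrase and build a warning with a suggestion."""
    for phrase, suggestion in FLAGGED_PHRASES.items():
        if phrase in text:
            return WarningMessage(detected=phrase, suggestion=suggestion)
    return None


def notify(warning: WarningMessage) -> str:
    """Step 7: text the terminal would show in a pop-up."""
    return f'Detected: "{warning.detected}". Suggestion: {warning.suggestion}'


def handle_utterance(raw: str) -> Optional[str]:
    # Steps 2 and 6 (network transfer) are replaced here by direct function calls.
    warning = analyze(preprocess(raw))
    return notify(warning) if warning else None
```

Calling `handle_utterance("That idea is completely worthless.")` returns a pop-up text containing the suggestion, while an utterance without a flagged phrase returns `None` and no notification is issued.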

[0601] (Example 1)

[0602] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0603] Traditionally, there has been no effective system for detecting inappropriate content in user statements or entered text information in real time and immediately issuing warnings to users. Therefore, even when a user's communication is socially undesirable, improvement may be delayed, potentially leading to the problem escalating. This invention aims to solve such problems and support users in immediately striving for appropriate communication.

[0604] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0605] In this invention, the server includes data collection means, data transmission means, detection means, and information presentation means. This makes it possible to identify inappropriate remarks or misconduct in real time from the user's voice or text information and immediately issue a warning.

[0606] "Data collection means" refers to devices or programs that have the function of acquiring user voice and text information and generating information for analysis within the system.

[0607] "Data transmission means" refers to communication functions and technologies for safely and efficiently transmitting collected information to remotely located information processing devices.

[0608] A "detection means" is a device or program that has the function of analyzing data using language analysis technology in order to identify inappropriate speech or misconduct from received information.

[0609] An "information processing device" is a general term for hardware and software used to analyze information sent from data collection devices and to actually make decisions and perform analyses.

[0610] "Information presentation means" refers to devices or programs that provide users with analysis results or warning information visually or audibly through a user interface.

[0611] This invention is a system that monitors user communication in real time and automatically detects inappropriate remarks and actions. Its functionality is primarily achieved through collaboration between a terminal and a server.

[0612] The device constantly monitors the user's speech and text input. Specifically, it uses the microphone built into the device and speech recognition software to instantly convert the user's voice into text data. A commercially available speech recognition API is used for smooth conversion. This text data is then sent to the server using a data transmission method. Security is ensured by using the secure HTTP communication protocol and appropriately encrypting the data during transmission.
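The secure transmission described above could, for example, be prepared as an HTTPS POST request. The sketch below uses only the Python standard library and builds the request without sending it. The endpoint URL, the JSON field names `user_id` and `text`, and the function name are hypothetical; the disclosure states only that a secure HTTP protocol is used.

```python
import json
import urllib.request
from urllib.parse import urlparse


def build_transcript_request(url: str, user_id: str, text: str) -> urllib.request.Request:
    """Build (but do not send) an HTTPS POST carrying transcribed text."""
    if urlparse(url).scheme != "https":
        # Refuse plain HTTP so the payload is only ever sent over TLS.
        raise ValueError("server URL must use https")
    body = json.dumps({"user_id": user_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint; sending would be done with urllib.request.urlopen(request).
request = build_transcript_request(
    "https://server.example/analyze", "user-20", "That opinion is completely useless."
)
```

With HTTPS, encryption in transit is provided by TLS, so the terminal does not need to encrypt the body separately unless end-to-end protection beyond the transport layer is required.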

[0613] When the server receives text data sent from a terminal, it analyzes its content using natural language processing techniques. This analysis utilizes generative AI models built with frameworks such as TensorFlow and PyTorch. These AI models evaluate the data in detail by performing tokenization and syntactic analysis, and detect risks by comparing it with past inappropriate speech patterns.

[0614] When the server detects a risk, it immediately sends a warning message to the device. This message is presented visually to the user through the device's user interface. The notification may include examples of detected inappropriate remarks and suggestions on how to improve them. This allows users to review their communication style and take socially appropriate actions.

[0615] As a concrete example, consider a scenario where a participant in an online meeting system says, "That opinion is completely useless." The device transcribes this statement in real time and immediately sends the information to the server. The server detects this expression as inappropriate and sends a warning message back to the device along with suggestions for improvement. The user receives a suggestion to "consider that opinion as well, and then look at it from a different perspective," allowing them to reconsider their statement.

[0616] An example of such a prompt message is, "Inappropriate remarks were detected during the online meeting. Please suggest ways to improve." This allows users to strive for better communication.

[0617] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0618] Step 1:

[0619] The device monitors the user's voice and text input. When the user begins speaking, the device collects voice data through its built-in microphone. This data is input into speech recognition software and converted into text data in real time. The output text data is then put into a format that allows for further analysis within the system.

[0620] Step 2:

[0621] The terminal sends the obtained text data to the server via a secure protocol such as HTTPS. The text data is encrypted and securely transmitted through relay processes before reaching the server.

[0622] Step 3:

[0623] The server analyzes the text data received from the terminal. Using the received text data as input, it performs tokenization and syntactic analysis using natural language processing techniques. This analysis process breaks down the transmitted text into individual words and determines their meaning. Based on this, the server utilizes a generative AI model to assess the risk of inappropriate remarks. The output is the analysis result.

[0624] Step 4:

[0625] The server generates a warning if inappropriate remarks are detected as a result of the risk assessment. Specifically, a generative AI model evaluates the risks in the remarks, and if they exceed a threshold, it creates a warning message that includes suggestions for improvement. This message is prepared in the form of a prompt and sent to the terminal.

[0626] Step 5:

[0627] The terminal receives warning messages from the server and notifies the user through the user interface. It decodes the received messages and presents them visually to the user. The terminal presents the user with specific examples of inappropriate remarks and suggestions for improvement, allowing the user to modify their behavior accordingly.
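The tokenization, risk assessment, and threshold check of Steps 3 and 4 can be sketched as follows. The word weights and the threshold value are assumptions made for this example: the disclosure states only that a generative AI model evaluates the risk and that a warning is created when a threshold is exceeded, so the weighted word list here is a simple stand-in for that model.

```python
import re
from typing import Optional

# Hypothetical weights standing in for the generative AI model's risk assessment.
RISK_WEIGHTS = {"useless": 0.6, "worthless": 0.6, "completely": 0.3, "stupid": 0.8}
RISK_THRESHOLD = 0.7  # assumed value; the disclosure only says "a threshold"


def tokenize(text: str) -> list[str]:
    """Step 3: split the text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def risk_score(tokens: list[str]) -> float:
    """Sum the weights of risky tokens, capped at 1.0."""
    return min(1.0, sum(RISK_WEIGHTS.get(token, 0.0) for token in tokens))


def warning_for(text: str) -> Optional[str]:
    """Step 4: return a warning only when the score exceeds the threshold."""
    score = risk_score(tokenize(text))
    if score <= RISK_THRESHOLD:
        return None
    return (
        f'The remark "{text}" may be inappropriate (risk {score:.1f}). '
        "Consider that opinion as well, and then look at it from a different perspective."
    )
```

A single mildly negative word stays below the assumed threshold, whereas the combination in "That opinion is completely useless." exceeds it and produces a warning.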

[0628] (Application Example 1)

[0629] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0630] In today's communication environment, inappropriate remarks and negative expressions can negatively impact interpersonal relationships, which is a significant problem. This invention aims to provide a system that supports healthy communication by detecting such negative language expressions in real time and suggesting appropriate improvements.

[0631] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0632] In this invention, the server includes data acquisition means for acquiring user language data, communication means for transmitting the acquired language data to an external data processing device, detection means for analyzing the received language data and detecting negative content, and notification means for providing the user with warnings, including suggestions for improvement, based on the detected results. This makes it possible to monitor user statements and input content in real time and correct negative expressions immediately.

[0633] "User language data" refers to the utterances and input content obtained from the user as speech or text.

[0634] "Data acquisition means" refers to a function that collects user language data and prepares it for processing.

[0635] "Communication means" refers to the technology and methods used to transmit acquired data to a data processing device located at a remote location.

[0636] "Detection means" refers to a function that analyzes received data and identifies negative or inappropriate content from it.

[0637] "Notification means" refers to technologies and methods used to communicate warnings and suggestions for improvement to users based on detected results.

[0638] "External data processing device" refers to a computer system or server used to receive and analyze data.

[0639] "Improvement suggestions" refer to specific advice and guidance provided to users to promote more appropriate and constructive communication when negative or inappropriate language is detected.

[0640] To implement this invention, it is necessary to install an application on the user's terminal for speech recognition and natural language processing. The terminal constantly monitors the user's speech and text input in the background and acquires language data. This data is transmitted to an external data processing device (server) via communication means.

[0641] The server analyzes the received data using natural language processing techniques. Audio data is converted to text data using speech recognition software. Tokenization, syntactic analysis, and sentiment analysis are applied to the text data. Sentiment analysis uses a pre-trained generative AI model to detect negative or inappropriate content.

[0642] If an inappropriate statement is detected according to pre-set criteria, the server will notify the terminal of the result. This notification will include a warning about the statement and suggestions for improvement via the user interface. This allows the user to review the content of their communication in real time and correct it to be more appropriate.

[0643] For example, if the phrase "This idea is unusable" is detected during an online company meeting, the server sends a notification to the participant's device that includes a suggestion for improvement, such as "Let's think of a more flexible idea." As a result, meeting participants can reflect on their own statements and promote constructive discussion.

[0644] Examples of prompts to input into the generative AI model include, "Detect negative expressions in the text and provide suggestions for making them more positive," and "Analyze the audio data and generate warning messages about potentially harassing expressions."
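A prompt of this kind could be assembled by combining one of the instructions quoted above with the user's language data. In the sketch below, the instruction texts are taken from the preceding paragraph, while the dictionary keys, the "Input:" layout, and the function name are hypothetical choices for illustration.

```python
# Prompt examples quoted from the description; the dictionary keys are hypothetical.
PROMPT_TEMPLATES = {
    "text": ("Detect negative expressions in the text and provide suggestions "
             "for making them more positive."),
    "audio": ("Analyze the audio data and generate warning messages about "
              "potentially harassing expressions."),
}


def build_prompt(kind: str, utterance: str) -> str:
    """Combine an instruction with the user's language data as inference data."""
    if kind not in PROMPT_TEMPLATES:
        raise KeyError(f"unknown prompt kind: {kind}")
    # Keep the instruction and the utterance on separate lines so that the
    # utterance is clearly marked as data rather than as part of the instruction.
    return "\n".join([PROMPT_TEMPLATES[kind], "", "Input:", utterance])
```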

[0645] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0646] Step 1:

[0647] The device monitors the user's voice and text input in real time. In the case of voice, it uses a microphone to acquire voice data and converts it into text data using speech recognition technology. The input is the user's speech or text, and the output is language data in text format. Specifically, the device captures voice using its microphone and converts it into text.

[0648] Step 2:

[0649] The terminal securely transmits the acquired text data to the server via a communication method. The input is text-based language data, and the output is the completion of data transmission to the server. Specifically, the data is transferred to the server via a network protocol.

[0650] Step 3:

[0651] The server analyzes the received text data. Using natural language processing techniques, it performs tokenization and syntactic analysis, and then uses a generative AI model to perform sentiment analysis. The input is text data, and the output is the analysis result, including the detection of sentiment and negative statements. Specific operations include referencing a database and comparing it with past cases.

[0652] Step 4:

[0653] If the server determines that the analysis results are negative or inappropriate, it generates specific improvement suggestions. It inputs prompt statements into an AI model to create improvement suggestions. The input is the detection results for negative statements, and the output is a suggestion statement for the user. In essence, the AI is provided with prompts to generate improvement suggestions.

[0654] Step 5:

[0655] The server sends a warning message containing the generated improvement suggestions to the terminal via a communication method. The input is the suggestion text, and the output is the completion of notification to the terminal. The specific action here is the process of sending data over the network.

[0656] Step 6:

[0657] The terminal displays received suggestions to the user via a user interface. The input is notification data from the server, and the output is a visible warning message to the user. Specifically, the notification pops up on the screen, allowing the user to immediately check it.
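The notification exchanged in Steps 5 and 6 can be represented as a small serializable payload. The sketch below is an assumption for illustration: the `WarningNotification` type, its field names, and the pop-up text layout are hypothetical, and the actual network transfer between server and terminal is omitted.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class WarningNotification:
    """Hypothetical payload for Steps 5 and 6; field names are assumptions."""
    detected_text: str
    suggestion: str


def encode_notification(notification: WarningNotification) -> bytes:
    """Server side (Step 5): serialize the warning for transfer over the network."""
    return json.dumps(asdict(notification), ensure_ascii=False).encode("utf-8")


def render_popup(payload: bytes) -> str:
    """Terminal side (Step 6): decode the payload and format the pop-up text."""
    notification = WarningNotification(**json.loads(payload.decode("utf-8")))
    return f"Warning: {notification.detected_text}\nSuggestion: {notification.suggestion}"
```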

[0658] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0659] This invention relates to a system that achieves more precise communication analysis by evaluating not only the user's utterances and input content, but also their emotional state. This system consists of data acquisition means, an emotion engine, communication means, detection means, and notification means, which work together to perform their functions.

[0660] The device constantly monitors the user's speech and text input, collecting this data using data acquisition methods. In the case of speech data, this process includes converting it into text data using speech recognition technology. The emotion engine determines the user's emotional state based on the collected data. It analyzes tone from speech and context and style from text to infer emotions such as anger or sadness.

[0661] The collected and analyzed data is transmitted to a server via communication means. After receiving the data, the server uses natural language processing technology to consider both the content of the utterance and the emotional state to detect potential misconduct. Through this dual analysis, the detection means can detect even subtle nuances of misconduct that could not be captured by conventional text analysis alone.

[0662] If the detection results indicate that a user's statements constitute harassment or other inappropriate behavior, the server will send a warning message to the device via a notification system. Based on the user's emotional state, as assessed by the emotion engine, the notification content will be customized to be easily received by the user. For example, if the user is angry, the notification will be expressed in a calmer, more composed tone.
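The tone customization described above can be sketched as a lookup from the estimated emotion to a tone instruction that is appended to the prompt for the generative AI model. The tone texts other than the calm tone for an angry user, as well as all names in the code, are hypothetical.

```python
# Hypothetical tone instructions; the disclosure only gives the example of
# using a calmer tone when the user is angry.
TONE_BY_EMOTION = {
    "anger": "Use a calm, composed tone and avoid blaming language.",
    "sadness": "Use a gentle, supportive tone.",
}
DEFAULT_TONE = "Use a neutral, polite tone."


def tone_instruction(emotion: str) -> str:
    """Pick a tone instruction for the estimated emotion label."""
    return TONE_BY_EMOTION.get(emotion, DEFAULT_TONE)


def build_warning_prompt(utterance: str, emotion: str) -> str:
    """Compose a prompt asking for an improvement suggestion in a suitable tone."""
    return (
        f'The user said: "{utterance}". '
        "Suggest a more constructive way to express this. "
        f"{tone_instruction(emotion)}"
    )
```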

[0663] As a concrete example, consider a scenario in an online team meeting where a member says, "Your plan is completely unrealistic." In this case, the device transcribes the statement into text, and if the emotion engine detects an angry tone, that information is sent to the server. The server detects the statement as inappropriate but also generates an improvement message that takes the speaker's emotions into consideration. The device then presents the user with an improvement suggestion in a calm tone, such as "Let's consider another proposal," to help prevent the atmosphere from worsening.

[0664] In this way, the present invention provides a system that comprehensively analyzes the user's language and emotional information and supports the promotion of effective communication.

[0665] The following describes the processing flow.

[0666] Step 1:

[0667] The terminal monitors the user's voice and text input in real time. This data is acquired through data acquisition means, and voice input is converted into text data by a speech recognition function.

[0668] Step 2:

[0669] The device sends the acquired data to an emotion engine, which analyzes the user's emotional state. This engine determines emotions based on factors such as voice tone and text context. This process then evaluates the user's emotional state using numerical values and labels.

[0670] Step 3:

[0671] The terminal packages the analyzed sentiment data and text data and sends it to the server using a secure communication method. This communication takes place in real time, and encryption technology is used to maintain the confidentiality of the transmitted data.

[0672] Step 4:

[0673] The server receives the data for analysis. Using natural language processing techniques, it extracts patterns of misconduct from the text data and simultaneously refers to sentiment data to determine the nuances of the speech.

[0674] Step 5:

[0675] If the server detects harassment or other inappropriate behavior, it generates a warning message to be delivered to the device via the notification means. This message is customized based on the user's emotional state.

[0676] Step 6:

[0677] The server sends the generated warning message to the terminal. This communication is also conducted with an emphasis on security and low latency.

[0678] Step 7:

[0679] The terminal receives warning messages from the server and notifies the user through the user interface. Notifications are delivered via pop-ups and alert sounds, and are presented to the user in an emotionally sensitive manner.

[0680] Step 8:

[0681] Users receive notifications on their devices, giving them an opportunity to review their own statements and actions. If necessary, they can follow the suggested improvements and modify their communication to make it better.

[0682] (Example 2)

[0683] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0684] Traditional communication analysis systems detect inappropriate behavior based solely on the content of user statements, failing to capture subtle inappropriate actions that include emotional nuances. Furthermore, warnings lacked flexibility because they did not consider the user's emotional state.

[0685] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0686] In this invention, the server includes means for monitoring and acquiring information, means for transforming the acquired information, and means for analyzing the transformed information and determining its state. This makes it possible to comprehensively analyze the user's statements and emotional state, generate appropriate warnings, and send notifications.

[0687] A "device for monitoring and acquiring information" is a device that continuously detects user speech and text input and collects that data.

[0688] A "device for converting acquired information" is a device for appropriately converting audio data into text data or other formats.

[0689] A "device that analyzes converted information and determines the state" is a device that analyzes the user's emotions and intentions based on the converted data and identifies the corresponding emotional state.

[0690] A "device that detects behavior based on received data" is a device that uses analyzed data to identify inappropriate behavior or potential problematic behaviors of users.

[0691] A "device that notifies users" is a device that provides warnings and advice to users based on detected problems.

[0692] An "information display device" is a device that presents information to a user visually or audibly.

[0693] This invention provides a system that enables sophisticated communication analysis by monitoring user statements and inputs and evaluating their content and emotional state. The following describes how this invention can be implemented.

[0694] The device constantly monitors the user's speech and text input. When the user inputs by voice, the device first acquires the voice data and converts it into text data using "speech recognition technology." This conversion uses commonly available speech recognition software.

[0695] All input data is then analyzed by an emotion engine. The emotion engine uses specific algorithms to determine the user's emotions from the input data. It analyzes tone from voice data and context and style from text data to infer emotions such as anger or sadness. Technologies used for analysis include "emotion analysis software."
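As a rough illustration of how an emotion engine might combine word choice in text with a tone feature from voice data, consider the sketch below. It is a toy example under stated assumptions: the word lists, the normalized `loudness` feature, the threshold, and the scoring rule are all hypothetical, and an actual emotion engine would more likely rely on a trained model such as the emotion identification model 59.

```python
from dataclasses import dataclass

# Hypothetical lexicon and threshold; a real engine would use a trained model.
ANGER_WORDS = {"unrealistic", "useless", "worthless", "ridiculous"}
SADNESS_WORDS = {"sorry", "disappointed", "hopeless"}
LOUDNESS_THRESHOLD = 0.7  # assumed normalized loudness in [0, 1]


@dataclass
class EmotionState:
    label: str        # e.g. "anger", "sadness", "neutral"
    intensity: float  # 0.0 to 1.0


def estimate_emotion(text: str, loudness: float = 0.0) -> EmotionState:
    """Combine word choice from text with a tone feature from voice data."""
    words = {word.strip(".,!?\"'").lower() for word in text.split()}
    anger = 0.5 * len(words & ANGER_WORDS)
    if loudness > LOUDNESS_THRESHOLD:
        anger += 0.5  # a raised voice is treated as a sign of anger
    sadness = 0.5 * len(words & SADNESS_WORDS)
    if anger == 0 and sadness == 0:
        return EmotionState("neutral", 0.0)
    if anger >= sadness:
        return EmotionState("anger", min(1.0, anger))
    return EmotionState("sadness", min(1.0, sadness))
```

The result carries both a label and a numerical intensity, which matches the idea of expressing the emotional state with labels and numerical values.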

[0696] The data analyzed by the device is sent to the server via communication means. The server receives this data and uses a "generative AI model" and other tools to comprehensively evaluate the user's statements and emotional state, and detects the possibility of inappropriate behavior. If inappropriate behavior is detected, an appropriate warning is sent to the device via a notification means.

[0697] As an example, consider an online meeting scenario. Suppose a member says, "Your plan isn't realistic." The terminal transcribes this statement, and if the emotion engine detects anger, this information is sent to the server. If the server detects it as an inappropriate statement, it generates a message saying, "We'll consider other suggestions," and delivers it to the user in a calm tone. In this way, the atmosphere of the meeting can be calmed.

[0698] As a prompt for the generative AI model, this system can use, for example, "When the user is expressing strong emotions during a conversation, please generate and provide appropriate feedback based on those emotions and context." This allows for more appropriate communication while taking into account the user's emotional responses.

[0699] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0700] Step 1:

[0701] The terminal acquires user speech and text input. If the user inputs by voice, this is collected as voice data. The input is voice waveform data, and the output is voice data as is. The voice data is temporarily held in memory in preparation for subsequent processing.

[0702] Step 2:

[0703] The device converts acquired audio data into text data using speech recognition technology. The input is audio data, and the output is text data in string format. This conversion uses an algorithm that analyzes audio patterns and converts them into corresponding linguistic expressions.

[0704] Step 3:

[0705] The device sends the converted text data to the emotion engine to determine the user's emotional state. The input is text data, and the output is metadata indicating the emotional state. Here, the context and word choices of the text are analyzed, and the intensity and type of emotion are quantified using an emotion analysis algorithm.

[0706] Step 4:

[0707] The terminal transmits the sentiment analysis results and text data to the server using a communication method. The input is text data and its sentiment state metadata, and the output is a data package transferred to the server. The data is encrypted using a communication protocol and securely transmitted to the server.

[0708] Step 5:

[0709] The server uses a generative AI model with the received data to detect misconduct. The input consists of text data and emotional state metadata, and the output is the result of the misconduct determination. The generative AI model uses natural language processing to comprehensively evaluate the input content and emotions.

[0710] Step 6:

[0711] The server generates a notification for the user based on the detection results. The input is the result of the misconduct determination, and the output is the message presented to the user. A generative AI model generates the suggested content and constructs the message in an appropriate tone.

[0712] Step 7:

[0713] The terminal displays notifications received from the server to the user. The input is a message from the server, and the output is a notification presented to the user visually or audibly. Notifications are delivered using a display or speaker in a way that the user can intuitively understand.
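
The seven steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this disclosure: every function name, the keyword lists, and the JSON package layout are hypothetical, speech recognition and the generative AI model are replaced by stubs, and encryption is assumed to be handled by the transport layer and is not shown.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical keyword lists standing in for the emotion engine and the
# generative AI model; a real system would call trained models instead.
ANGER_WORDS = {"hate", "stupid", "shut"}
MISCONDUCT_WORDS = {"stupid", "idiot"}


@dataclass
class EmotionMeta:
    kind: str         # type of emotion, e.g. "anger" or "neutral"
    intensity: float  # 0.0 (none) to 1.0 (strong)


def recognize_speech(voice_data: bytes) -> str:
    """Step 2 stub: pretend the audio bytes are already UTF-8 text."""
    return voice_data.decode("utf-8")


def analyze_emotion(text: str) -> EmotionMeta:
    """Step 3: quantify emotion type and intensity from word choice."""
    words = text.lower().split()
    hits = sum(1 for w in words if w.strip(".,!?") in ANGER_WORDS)
    if hits == 0:
        return EmotionMeta("neutral", 0.0)
    return EmotionMeta("anger", min(1.0, hits / len(words) * 2))


def build_package(text: str, meta: EmotionMeta) -> bytes:
    """Step 4: serialize text plus metadata into one package for the server."""
    return json.dumps({"text": text, "emotion": asdict(meta)}).encode("utf-8")


def detect_misconduct(package: bytes) -> bool:
    """Step 5 stub: the server evaluates content and emotion together."""
    data = json.loads(package)
    words = {w.strip(".,!?") for w in data["text"].lower().split()}
    return bool(words & MISCONDUCT_WORDS) and data["emotion"]["kind"] == "anger"


def generate_notification(misconduct: bool) -> str:
    """Step 6 stub: a generative AI model would phrase this message."""
    if misconduct:
        return "That may have sounded harsh. How about rephrasing it more gently?"
    return ""


def run_pipeline(voice_data: bytes) -> str:
    text = recognize_speech(voice_data)    # Steps 1-2
    meta = analyze_emotion(text)           # Step 3
    package = build_package(text, meta)    # Step 4
    verdict = detect_misconduct(package)   # Step 5
    return generate_notification(verdict)  # Steps 6-7 (message to present)
```

An empty return value means that no notification is presented; the terminal would display or speak any non-empty message.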

[0714] (Application Example 2)

[0715] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0716] In modern households, there is a need to mitigate discord caused by emotional misunderstandings and inappropriate behavior in interpersonal communication, and to promote smooth dialogue. However, there is insufficient technology to sense subtle emotional changes in communication in a timely manner and provide appropriate support. This invention aims to provide a system that can prevent potential problems, particularly in communication within the family.

[0717] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0718] In this invention, the server includes information acquisition means for monitoring user speech or input, communication means for transmitting the acquired information to a remote processing unit, and detection means for detecting inappropriate behavior from the received information in the processing unit. This makes it possible to analyze the content of conversations and improve emotional states as an automated device in the home.

[0719] "Information acquisition means" refers to a device or system that has the function of monitoring user speech and input and collecting that data.

[0720] "Communication means" refers to a device or system that has the function of transmitting collected information to a processing device located at a remote location.

[0721] A "processing device" is a device or system that has the function of performing analysis to detect inappropriate behavior based on received information.

[0722] A "detection means" is a system that uses natural language processing technology to analyze information and has the function of detecting emotional states and inappropriate behavior.

[0723] A "notification means" is a device or system that has the function of notifying the user of warnings or suggestions based on the detected results.

[0724] A "home-use automated device" is a device that analyzes everyday communication within the home and provides support to facilitate smooth dialogue.

[0725] The system designed to realize this application is an automated device for facilitating communication within the home. In this system, information acquisition means, communication means, processing means, detection means, and notification means work together. Specifically, it operates through the following process:

[0726] The terminal (a home robot) constantly monitors the user's speech and text input through the information acquisition means. In the case of voice data, a microphone captures the voice, and a speech analysis API (such as Google Speech-to-Text) is used to convert the voice to text. The text data is then sent to the server via the internet using the communication means.

[0727] The server analyzes the received text data using a natural language processing library (e.g., NLTK) to assess the emotional state. This assessment includes analysis based on changes in voice tone and the context of the text. If inappropriate behavior is detected, the server notifies the terminal of this information.

[0728] Based on this information, the device sends appropriate warnings and suggestions to the user. For example, if the user shows signs of stress, the robot will offer a gentle suggestion such as, "Why don't you take a short break?" This helps to facilitate smoother communication within the household.

[0729] For example, if children start arguing during a game, the robot can instantly analyze the situation and suggest, "Why don't you take a break?" to alleviate the conflict. An example of a prompt for the generative AI model in such a scenario would be, "How can a household robot come up with a suggestion to calm the conversation during the game and communicate it?"
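
A prompt of this kind could be assembled from the analysis result before it is passed to the generative AI model. The sketch below is illustrative only: `build_prompt` and its parameters are hypothetical names, and the template wording around the quoted question is an assumption rather than part of this disclosure.

```python
def build_prompt(scene: str, emotion: str, transcript: list[str]) -> str:
    """Compose a text prompt for a generative AI model (hypothetical helper)."""
    # Include only the last three utterances to keep the prompt short.
    recent = "\n".join(f"- {line}" for line in transcript[-3:])
    return (
        f"Scene: {scene}\n"
        f"Detected emotion: {emotion}\n"
        f"Recent utterances:\n{recent}\n"
        "How can a household robot come up with a suggestion to calm the "
        "conversation and communicate it? Answer in one gentle sentence."
    )


prompt = build_prompt(
    scene="children playing a game",
    emotion="anger",
    transcript=["That's not fair!", "You cheated!", "Did not!"],
)
```

Keeping the scene, the detected emotion, and the recent utterances in separate labeled lines makes it easy to log what the model was asked when a suggestion turns out to be unsuitable.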

[0730] This system makes it possible to mitigate communication friction that often occurs within the family in real time, and to maintain good relationships.

[0731] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0732] Step 1:

[0733] The terminal captures the user's speech through the microphone. The input here is audio data. The terminal converts this audio data into text data using a speech analysis API. The converted text data is the output.

[0734] Step 2:

[0735] The terminal sends the converted text data to the server via the communication means. The output of this processing step is the text data sent to the server. The server takes the received text data as input for processing.

[0736] Step 3:

[0737] The server analyzes the received text data using a natural language processing library to evaluate the emotional state. For example, it identifies keywords and tones that indicate emotion within the text and determines the type of emotion. The input is the text data received by the server, and the output is the analyzed emotion data.

[0738] Step 4:

[0739] Based on the analysis results, the server generates appropriate warnings or suggestions if problems are detected. A generative AI model is applied, and its output is a message encouraging improvement. This is generated using prompts to the generative AI model. The input for this step is sentiment data, and the output is the generated suggestion message.

[0740] Step 5:

[0741] The server sends the generated suggestion message to the terminal. The input is this message, and the output is the terminal's notification to the user by display or audio. The terminal presents the suggestion through its user interface, for example a display or speaker.

[0742] This processing flow makes it possible to support communication within the family and help maintain good relationships.
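
Steps 3 to 5 of this flow, seen from the server and terminal sides, might look like the following sketch. All names, the keyword table, and the fixed suggestion strings are hypothetical; in the described system the analysis would use a natural language processing library and the message would be written by a generative AI model.

```python
from typing import Optional

# Hypothetical keyword-to-emotion table standing in for the NLP analysis.
EMOTION_KEYWORDS = {
    "anger": {"unfair", "cheated", "hate"},
    "stress": {"tired", "exhausted", "overwhelmed"},
}

# Fixed messages standing in for generative AI output.
SUGGESTIONS = {
    "anger": "Why don't you take a break?",
    "stress": "Why don't you take a short break?",
}


def server_analyze(text: str) -> Optional[str]:
    """Step 3: return the first emotion whose keywords appear, else None."""
    words = {w.strip(".,!?'\"") for w in text.lower().split()}
    for emotion, keywords in EMOTION_KEYWORDS.items():
        if words & keywords:
            return emotion
    return None


def server_suggest(emotion: Optional[str]) -> Optional[str]:
    """Step 4 stub: produce a suggestion only when a problem was detected."""
    return SUGGESTIONS.get(emotion) if emotion else None


def terminal_notify(message: Optional[str], has_display: bool) -> Optional[dict]:
    """Step 5: choose an output channel; nothing is emitted without a message."""
    if message is None:
        return None
    return {"channel": "display" if has_display else "speaker", "message": message}


def handle_utterance(text: str, has_display: bool = False) -> Optional[dict]:
    # Steps 1-2 (speech capture, conversion, transmission) are assumed done.
    return terminal_notify(server_suggest(server_analyze(text)), has_display)
```

For example, `handle_utterance("You cheated!")` selects the speaker channel with a break suggestion, while an utterance with no matching keyword yields `None` and the terminal stays silent.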

[0743] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing unit 12. In the data processing unit 12, the specific processing unit 290 acquires the audio data.

[0744] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions are input to the data generation model 58, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
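
The pairing of a prompt with inference data described above can be expressed as a small request type. This is a sketch under assumptions: the type names, the `Model` callable signature, and the `echo_model` stub are hypothetical and do not correspond to the API of any particular generative AI service.

```python
from dataclasses import dataclass
from typing import Callable, Literal

Modality = Literal["audio", "text", "image"]


@dataclass(frozen=True)
class InferenceData:
    modality: Modality
    payload: bytes  # raw audio samples, UTF-8 text, or an encoded image


@dataclass(frozen=True)
class InferenceRequest:
    prompt: str          # the instruction, e.g. "Summarize the utterance."
    data: InferenceData  # the data the instruction applies to


# Any callable with this shape can stand in for data generation model 58.
Model = Callable[[InferenceRequest], str]


def echo_model(request: InferenceRequest) -> str:
    """Stub model: accepts text only and echoes it back with the prompt."""
    if request.data.modality != "text":
        raise ValueError("this stub accepts text input only")
    return f"[{request.prompt}] {request.data.payload.decode('utf-8')}"


result = echo_model(
    InferenceRequest("Classify the tone.", InferenceData("text", b"I'm fine."))
)
```

Treating the model as a plain callable keeps the rest of the processing independent of which model is used, which fits the later remark that the model may be provided in an external device.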

[0745] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0746] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0747] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0748] These emotions are distributed around the 3 o'clock position on the emotion map 400 and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0749] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the further outward an emotion lies on the emotion map 400, the more visible it becomes (that is, the more it is expressed in actions).
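
The layout described above lends itself to polar coordinates: the angle encodes the side of the map and the radius encodes how far an emotion is from the center. The following sketch is an assumption made for illustration; the coordinate convention, the 0.5 boundary between inner and outer layers, and the names `MapPoint` and `describe` are hypothetical and are not taken from Figure 9.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MapPoint:
    radius: float     # 0.0 = center (primitive), 1.0 = outer edge (actions)
    angle_deg: float  # 0 = right (3 o'clock), 90 = top, 180 = left, 270 = bottom

    @property
    def x(self) -> float:
        return self.radius * math.cos(math.radians(self.angle_deg))

    @property
    def y(self) -> float:
        return self.radius * math.sin(math.radians(self.angle_deg))


def describe(p: MapPoint, eps: float = 1e-9) -> dict:
    """Read off the regions named in the text from a position on the map."""
    if p.x > eps:
        origin = "situation"  # right side: induced by situational judgment
    elif p.x < -eps:
        origin = "reaction"   # left side: reactions occurring in the brain
    else:
        origin = "both"       # directly above or below the center
    if p.y > eps:
        valence = "pleasure"
    elif p.y < -eps:
        valence = "displeasure"
    else:
        valence = "neutral"
    layer = "inner" if p.radius < 0.5 else "outer"  # assumed boundary
    return {"origin": origin, "valence": valence, "layer": layer}


# A point at the 3 o'clock position, close to the center.
three_oclock = describe(MapPoint(radius=0.3, angle_deg=0.0))
```

The small tolerance `eps` absorbs floating-point error so that points placed exactly on an axis are classified as lying on it.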

[0750] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
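
The idea that deviation from an ideal balance produces discomfort and closeness to it produces pleasure can be illustrated with a simple score. The formula, the function name `balance_valence`, and the example quantities and tolerances below are assumptions chosen for illustration; this disclosure does not specify how the balances are combined.

```python
def balance_valence(
    current: dict[str, float],
    ideal: dict[str, float],
    tolerance: dict[str, float],
) -> float:
    """Return a score in [-1, 1]: positive near the ideal, negative far from it."""
    scores = []
    for name, target in ideal.items():
        # Deviation measured in units of the tolerance for this quantity.
        deviation = abs(current[name] - target) / tolerance[name]
        # 1.0 at the ideal, 0.0 at one tolerance, -1.0 at two or more.
        scores.append(1.0 - min(deviation, 2.0))
    return sum(scores) / len(scores)


# Hypothetical robot balances: battery level (0 to 1) and body tilt in degrees.
IDEAL = {"battery": 1.0, "tilt_deg": 0.0}
TOLERANCE = {"battery": 0.5, "tilt_deg": 10.0}

comfortable = balance_valence({"battery": 1.0, "tilt_deg": 0.0}, IDEAL, TOLERANCE)
uncomfortable = balance_valence({"battery": 0.25, "tilt_deg": 20.0}, IDEAL, TOLERANCE)
```

With these values the first state scores at the pleasure end of the scale and the second falls well into discomfort, since both the battery level and the tilt are far from their ideals.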

[0751] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0752] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
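
One way to obtain training targets in which neighboring emotions have similar values is to weight each emotion by its distance on the map from the labeled emotion. The sketch below is an assumption, not the training procedure of this disclosure: the Gaussian weighting, the `sigma` parameter, and the map positions are hypothetical, and "angry" is added only as a distant contrast to the three emotions named for Figure 10.

```python
import math

# Hypothetical 2-D positions on the emotion map.
POSITIONS = {
    "reassured": (0.30, 0.05),
    "calm": (0.32, 0.00),
    "confident": (0.35, 0.08),
    "angry": (-0.60, -0.50),
}


def soft_targets(label: str, sigma: float = 0.1) -> dict[str, float]:
    """Gaussian-weighted target values centered on the labeled emotion."""
    cx, cy = POSITIONS[label]
    return {
        name: math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
        for name, (x, y) in POSITIONS.items()
    }


targets = soft_targets("calm")
```

A network trained against such targets is pushed to output high values for "reassured" and "confident" whenever "calm" is the label, and a value near zero for the distant emotion. A smaller `sigma` makes the targets sharper, and a larger one spreads them over more of the map.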

[0753] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0754] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0755] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0756] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0757] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0758] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0759] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0760] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0761] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0762] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0763] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0764] The following is further disclosed regarding the embodiments described above.

[0765] (Claim 1)

[0766] A data acquisition means for monitoring user speech or input,

[0767] A communication means for transmitting acquired data to a remote processing unit,

[0768] The aforementioned processing device includes a detection means for detecting an improper act from the received data,

[0769] A notification means that alerts the user based on the detection results,

[0770] A system that includes this.

[0771] (Claim 2)

[0772] The system according to claim 1, characterized in that the detection means analyzes the data using natural language processing technology.

[0773] (Claim 3)

[0774] The system according to claim 1, characterized in that the notification means displays a warning through a user interface.

[0775] "Example 1"

[0776] (Claim 1)

[0777] A data collection means for monitoring user voice or text information,

[0778] A data transmission means for transmitting collected information to a remote information processing device,

[0779] The aforementioned information processing device includes a detection means for analyzing received information and identifying fraudulent activity,

[0780] An information presentation means that sends a warning to the user based on the detection results,

[0781] A system that includes this.

[0782] (Claim 2)

[0783] The system according to claim 1, characterized in that the detection means processes information using language analysis technology.

[0784] (Claim 3)

[0785] The system according to claim 1, characterized in that the information presentation means visually provides a warning via a user operating surface.

[0786] "Application Example 1"

[0787] (Claim 1)

[0788] A data acquisition means for obtaining user language data,

[0789] A communication means for transmitting acquired language data to an external data processing device,

[0790] A detection means that analyzes received language data and detects negative content,

[0791] A notification means that provides users with warnings, including suggestions for improvement, based on the detected results,

[0792] A system that includes this.

[0793] (Claim 2)

[0794] The system according to claim 1, characterized in that the detection means processes language data using language analysis technology.

[0795] (Claim 3)

[0796] The system according to claim 1, characterized in that the notification means displays a warning indicating a suggestion for improvement via a user interface.

[0797] "Example 2 of combining an emotion engine"

[0798] (Claim 1)

[0799] A device for monitoring and acquiring information,

[0800] A device for converting acquired information,

[0801] A device that analyzes the converted information and determines the state,

[0802] A device that transmits the analyzed data to a remote information processing device,

[0803] A device that detects actions based on received data,

[0804] A device that notifies the user based on the detection results,

[0805] A system that includes this.

[0806] (Claim 2)

[0807] The system according to claim 1, characterized in that the device for detecting the aforementioned action analyzes the information using natural language processing technology.

[0808] (Claim 3)

[0809] The system according to claim 1, characterized in that the device that provides the notification displays a warning via an information display device.

[0810] "Application example 2 when combining with an emotional engine"

[0811] (Claim 1)

[0812] Information acquisition means for monitoring user utterances or inputs,

[0813] A communication means for transmitting acquired information to a remote processing unit,

[0814] The processing apparatus includes a detection means for detecting inappropriate behavior from received information,

[0815] A notification means that alerts the user based on the detection results,

[0816] A means of providing a method for analyzing the content of conversations and improving emotional states as an automated device for home use,

[0817] A system that includes this.

[0818] (Claim 2)

[0819] The system according to claim 1, characterized in that the detection means analyzes information using natural language processing technology and evaluates the tone of voice and the context of text.

[0820] (Claim 3)

[0821] The system according to claim 1, characterized in that the notification means resolves the discord through an interface that proposes a workaround by showing the user a suggestion via sight or hearing.

[Explanation of Symbols]

[0822]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. A system comprising: a data acquisition means for obtaining user language data; a communication means for transmitting the acquired language data to an external data processing device; a detection means that analyzes the received language data and detects negative content; and a notification means that provides the user with warnings, including suggestions for improvement, based on the detection results.

2. The system according to claim 1, characterized in that the detection means processes language data using language analysis technology.

3. The system according to claim 1, characterized in that the notification means displays a warning indicating a suggestion for improvement via a user interface.