system

The system addresses unintentional sexual harassment in the workplace by using voice and text analysis to detect problematic phrases and provide context-aware warnings, enhancing workplace communication and preventing harassment through continuous learning.

JP2026105508A · Pending · Publication Date: 2026-06-26 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-12-16
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Sexual harassment in the workplace often occurs unintentionally, leading to a damaged work environment and psychological burden on victims, with existing methods failing to provide effective real-time prevention and improvement.

Method used

A system that processes voice or text data to detect problematic phrases, generates warnings, and learns from user feedback to improve accuracy, utilizing speech recognition, natural language processing, and emotional analysis to provide context-aware alerts.

Benefits of technology

Enables real-time detection and prevention of sexual harassment by providing immediate warnings, improving workplace communication, and enhancing the working environment through continuous learning and emotional state consideration.

✦ Generated by Eureka AI based on patent content.

Abstract

[Problem] To provide a system. [Solution] The system includes: data collection means; analysis means for analyzing and evaluating collected audio signals or text information; comparison means for comparing the analyzed information against case information related to sexual harassment; alert generation means for identifying potentially problematic phrases and generating alerts; and transmission means for transmitting the generated alerts to a user device. The system uses a portable information device capable of real-time processing.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, and includes steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a character of the chatbot, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] Sexual harassment is a serious problem in the workplace, and many people suffer from it, yet the person responsible often commits it unintentionally. As a result, the atmosphere of the workplace is damaged, and the victim bears a psychological burden. The purpose of the present invention is to provide an effective means for preventing sexual harassment and improving the working environment of enterprises.

Means for Solving the Problems

[0005] This invention provides a system that processes voice or text data collected by a data input means, compares it with a database of sexual harassment-related cases to detect problematic phrases, and generates warnings. This system notifies the user terminal of the warning, allowing the user to review their own behavior, and also includes a learning process to improve the system's accuracy through feedback.

[0006] "Data input means" refers to a device or function for capturing voice or text data from a user and providing it to an internal system for processing.

[0007] "Processing means" refers to an algorithm and its execution environment for analyzing input audio or text data and recognizing specific patterns or phrases.

[0008] "Matching means" refers to a system that compares processed data with a database of sexual harassment-related cases to identify similar expressions.

[0009] A "warning generation mechanism" is a function that creates warning information for the user when a problematic phrase is detected, and is a process for displaying that warning to the user.

[0010] "Notification means" refers to communication means and display functions that send generated warnings to the user's terminal so that the user can confirm them.

[0011] "Speech recognition means" refers to technology and software for converting input speech data into text data.

[0012] "Learning tools" refer to the function of analyzing user feedback and adjusting algorithms to improve the system's detection accuracy.

Brief Description of the Drawings

[0013]

[Figure 1] This is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.

[Figure 2] This is a conceptual diagram showing an example of the essential functions of a data processing device and a smart device according to the first embodiment.

[Figure 3] This is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.

[Figure 4] This is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.

[Figure 5] This is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.

[Figure 6] This is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.

[Figure 7] This is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.

[Figure 8] This is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.

[Figure 9] This shows an emotion map where multiple emotions are mapped.

[Figure 10] This shows an emotion map where multiple emotions are mapped.

[Figure 11] This is a sequence diagram showing the processing flow of the data processing system in Example 1.

[Figure 12] This is a sequence diagram showing the processing flow of the data processing system in Application Example 1.

[Figure 13] This is a sequence diagram showing the processing flow of the data processing system in Example 2, which incorporates an emotion engine.

[Figure 14] This is a sequence diagram showing the processing flow of the data processing system in Application Example 2, which combines an emotion engine.

Modes for Carrying Out the Invention

[0014] An example of an embodiment of the system according to the technology of the present disclosure will be described below with reference to the accompanying drawings.

[0015] First, the terms used in the following description will be explained.

[0016] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.

[0017] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which is used as a work memory by the processor.

[0018] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0019] In the following embodiments, a communication interface (I/F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), or Bluetooth (registered trademark).

[0020] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when three or more items are linked by "and/or."

[0021] [First Embodiment]

[0022] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0023] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0024] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0025] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0026] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0027] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0028] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0029] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0030] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0031] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0032] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0033] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0034] This invention is a system aimed at preventing sexual harassment, and it primarily functions via user terminals, servers, and the communication network connecting them. This system is realized by combining various technologies such as speech recognition, natural language processing, database matching, warning generation, and user notifications.

[0035] Terminal processing

[0036] The device has the ability to capture the user's voice or text input in real time. On the device, speech recognition technology is used to convert the voice data into text, preparing it for processing as text data. This text data is sent to the server via a secure protocol.
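
The terminal-side hand-off described above can be sketched as follows. This is a minimal illustration, assuming speech recognition has already produced the text; the endpoint URL, the JSON field names, and the function name are hypothetical, since the patent names neither a payload format nor a specific secure protocol. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Hypothetical endpoint: the patent does not specify a URL or payload format.
SERVER_URL = "https://example.invalid/api/v1/utterances"


def build_utterance_request(user_id: str, text: str) -> urllib.request.Request:
    """Wrap recognized text in a JSON body for an HTTPS POST (built, not sent)."""
    if not SERVER_URL.startswith("https://"):
        # Refuse to prepare a request for a channel that is not TLS-protected.
        raise ValueError("utterance text must be sent over HTTPS")
    body = json.dumps({"user_id": user_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        SERVER_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_utterance_request("user-001", "You look cute today.")
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would perform the actual transmission; that call is left out so the sketch runs without a network.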

[0037] Server Processing

[0038] The server analyzes the received text data using a natural language processing engine. The analyzed data is then compared against a database containing cases related to sexual harassment, and problematic phrases are identified. Based on the comparison results, the server generates a warning as needed. This warning includes the identified problematic expression and an explanation based on its context.
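
One way to picture the database comparison is a string-similarity check against stored case phrases. The sketch below uses Python's `difflib.SequenceMatcher` as a stand-in for the unspecified matching method; the case records, the 0.8 threshold, and all names are assumptions made for illustration and are not taken from the patent.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical case records: the patent does not disclose the database contents.
CASE_DATABASE = [
    {"phrase": "you look cute today",
     "explanation": "Comments on a colleague's appearance can be unwelcome."},
    {"phrase": "are you seeing anyone",
     "explanation": "Questions about private relationships can be intrusive."},
]

SIMILARITY_THRESHOLD = 0.8  # assumed cut-off, not specified in the patent


@dataclass
class CaseWarning:
    expression: str    # the user's original wording
    explanation: str   # context-based explanation taken from the matched case
    similarity: float  # similarity score in the range 0.0 to 1.0


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def check_text(text: str) -> Optional[CaseWarning]:
    """Return a warning for the most similar stored case, or None if none is close."""
    candidate = normalize(text)
    best = None
    for case in CASE_DATABASE:
        score = SequenceMatcher(None, candidate, case["phrase"]).ratio()
        if score >= SIMILARITY_THRESHOLD and (best is None or score > best.similarity):
            best = CaseWarning(text, case["explanation"], score)
    return best


print(check_text("You look cute today."))
print(check_text("The quarterly report is due on Friday."))
```

Character-level similarity is a deliberately simple choice; a production system would more plausibly compare meaning rather than spelling, which the patent attributes to its natural language processing engine.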

[0039] User notifications

[0040] The generated alerts are sent to the device, which notifies the user. The device presents the alert to the user in the form of a pop-up message or other notification, giving the user an opportunity to reflect on their statements and messages. The user can then review the alert and correct their actions as needed.

[0041] Learning and Improvement

[0042] User feedback is sent to the server, and the system's algorithms are updated based on this feedback. The learning process on the server will improve the accuracy of future sexual harassment detection and enable more effective warnings.
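
The patent does not say how feedback changes the algorithms. As one hedged possibility, each stored case could carry a weight that moves toward 1.0 when users confirm a warning and toward 0.0 when they report a false alarm. The update rule, the learning rate, and the class name below are all assumptions.

```python
from collections import defaultdict

LEARNING_RATE = 0.1  # assumed step size, not taken from the patent


class FeedbackLearner:
    """Keeps one weight per case; feedback nudges it toward 1.0 or 0.0."""

    def __init__(self, initial_weight: float = 0.5) -> None:
        self.weights = defaultdict(lambda: initial_weight)

    def record(self, case_id: str, warning_was_appropriate: bool) -> float:
        """Apply one piece of user feedback and return the updated weight."""
        target = 1.0 if warning_was_appropriate else 0.0
        current = self.weights[case_id]
        self.weights[case_id] = current + LEARNING_RATE * (target - current)
        return self.weights[case_id]


learner = FeedbackLearner()
for verdict in (False, False, True):
    print(round(learner.record("case-17", verdict), 4))
```

A server could then skip or soften warnings for cases whose weight falls below some floor, which is one reading of "improving accuracy" through feedback.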

[0043] Specific example

[0044] Suppose a user is having a video conference at work and makes a comment like, "You look cute today." The device captures this audio and sends it as text data to the server. The server compares this phrase to a database and determines that it may be similar to past cases of sexual harassment. The server then generates a warning and notifies the device. The user will see a warning on the screen that says, "This expression may be considered sexual harassment. Please be careful." In this way, the system identifies potential problems in real time and contributes to improving the workplace environment.

[0045] The following describes the processing flow.

[0046] Step 1:

[0047] The device captures audio data through the user's microphone input and also has the capability to collect text data from keyboard input. This prepares the device for real-time monitoring of user communication.

[0048] Step 2:

[0049] The device converts the captured audio data into text data using speech recognition technology. This converted text data is temporarily stored in a buffer for subsequent processing.

[0050] Step 3:

[0051] The terminal sends the converted text data to the server via a secure communication protocol. This data is treated as information necessary for detecting sexual harassment.

[0052] Step 4:

[0053] The server analyzes the received text data using a natural language processing engine. This analysis includes grammatical and semantic analysis, extracting important phrases and words.

[0054] Step 5:

[0055] The server compares the analyzed data with a database of sexual harassment-related cases. Here, it calculates the similarity to known cases in the database to determine if there is a potential problem.

[0056] Step 6:

[0057] The server generates an alert if it determines there is a problem. The alert includes the detected phrase, its context, and relevant advice.

[0058] Step 7:

[0059] The server sends the generated warning to the terminal. The terminal receives this warning and displays it in a user-friendly format.

[0060] Step 8:

[0061] Users view warnings through the screen on their device. Based on the warnings, they have the opportunity to review their communication and change their approach if necessary.

[0062] Step 9:

[0063] Users can send feedback on warnings to the server via their device as needed. This feedback is used as part of the system's continuous improvement and learning process.

[0064] Step 10:

[0065] The server analyzes the feedback received and updates the system's algorithms. This improves the accuracy of future sexual harassment detection.
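
The ten steps above exchange three kinds of messages between terminal and server: text (Step 3), alerts (Steps 6 and 7), and feedback (Step 9). A minimal sketch of such a message schema follows, assuming a JSON envelope; the class names, field names, and envelope layout are hypothetical, and only the alert's three parts (phrase, context, advice) come from Step 6.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TextMessage:  # Step 3: terminal -> server
    user_id: str
    text: str


@dataclass
class Alert:  # Steps 6-7: server -> terminal
    phrase: str
    context: str
    advice: str


@dataclass
class Feedback:  # Step 9: terminal -> server
    user_id: str
    phrase: str
    helpful: bool


MESSAGE_TYPES = {cls.__name__: cls for cls in (TextMessage, Alert, Feedback)}


def encode(message) -> str:
    """Serialize a message into a JSON envelope tagged with its type name."""
    return json.dumps({"type": type(message).__name__, "body": asdict(message)})


def decode(payload: str):
    """Rebuild the message object from a JSON envelope."""
    envelope = json.loads(payload)
    return MESSAGE_TYPES[envelope["type"]](**envelope["body"])


alert = Alert(
    phrase="You look cute today.",
    context="video conference",
    advice="This expression may be considered sexual harassment. Please be careful.",
)
print(decode(encode(alert)) == alert)
```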

[0066] (Example 1)

[0067] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0068] In today's workplace, remarks and communications that could lead to sexual harassment are sometimes overlooked, and there is a need for effective means to prevent this. In particular, it is important to improve the workplace environment and raise employer awareness by pointing out problematic expressions in real time and issuing immediate warnings to users.

[0069] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0070] In this invention, the server includes means for inputting voice or text information, means for processing and analyzing the input data, means for comparing it with a record storage device related to sexual harassment, means for detecting identified problematic expressions and generating notifications, and means for receiving user feedback and improving the system. This makes it possible to detect potentially problematic remarks in real time and immediately notify users, thereby deterring inappropriate communication in the workplace.

[0071] "Means for inputting voice or text information" refers to a device that has the function of acquiring voice or text from a user and supplying it to the system for processing.

[0072] "Means for processing and analyzing input data" refers to technologies that analyze acquired audio or text data and perform processing to understand its content.

[0073] A "record-keeping device related to sexual harassment" refers to data storage that accumulates past cases and standard data for use in comparison.

[0074] "Means for detecting identified problematic expressions and generating notifications" refers to a device or program that has the function of identifying inappropriate expressions in analyzed data and creating and sending warnings to the user.

[0075] "Means of receiving feedback from users to improve the system" refers to techniques that collect user feedback and incorporate it into learning algorithms to improve the system's recognition accuracy and functionality.

[0076] This system provides advanced monitoring and warning functions aimed at preventing sexual harassment. It primarily consists of user terminals, communication networks, and servers.

[0077] Terminal

[0078] The device captures the user's speech or typed text in real time. For example, it collects audio data using a microphone built into a laptop or smartphone. This audio data is then converted into text using software such as a "speech recognition API" installed on the device.

[0079] Server

[0080] The server receives text data sent from the terminal and analyzes it using natural language processing techniques. This process uses a "natural language processing library," and the analyzed text is compared with existing examples of sexual harassment in the database. Based on the comparison results, the server generates a warning. For example, it analyzes various problematic expressions and creates an appropriate warning message.

[0081] User

[0082] The user receives the warning sent from the server on their device. The device typically displays the warning as a pop-up message on the screen, but it may be presented in other ways depending on the situation. The user reviews the warning and has an opportunity to reconsider their expression.

[0083] As a concrete example, if a user says "Your outfit looks great today" during a video conference, the audio is converted to text and sent to the server. The server analyzes this statement and, if it determines there is a risk of sexual harassment, generates a warning and notifies the user on their device that "This expression may be perceived as inappropriate by the recipient." This system allows users to immediately review their own statements and contribute to improving the workplace communication environment.

[0084] By utilizing generative AI models, warnings can be made more precise and refined, and notifications can be delivered in a way that is relevant to the user's actual statements and context. An example of a prompt might be, "Please tell me about expressions that may be considered sexual harassment in workplace conversations."

[0085] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0086] Step 1:

[0087] The device acquires either user voice or text input. In the case of voice input, it captures the audio in real time using the built-in microphone. This voice data becomes the input. The voice data is then converted to text using a "speech recognition API" or similar. This converted text data becomes the output.

[0088] Step 2:

[0089] The terminal sends the text data output in Step 1 to the server via a secure protocol. This transmitted text data becomes the new input. Using HTTPS or similar protocols can prevent data tampering. The server then prepares this text data for the next step.

[0090] Step 3:

[0091] The server receives text data and performs analysis using natural language processing (NLP) techniques. This process takes text data as input, understands the meaning within the text, and performs data processing to break down the elements. A "natural language processing library" is used. The analyzed results are output as structured data.

[0092] Step 4:

[0093] The server compares the data output in Step 3 with a database containing sexual harassment cases. It uses the analyzed data as input to perform data calculations that compare it with past cases. As a result of the comparison, a flag indicating whether similar expressions were found is output.

[0094] Step 5:

[0095] The server generates a warning if there is a problem based on the matching results. Using the matching flag as input, it utilizes a "generative AI model" to create an appropriate warning message. The generated warning message becomes the output.

[0096] Step 6:

[0097] The server sends the generated warning message to the terminal. The warning message becomes input and is delivered to the terminal via network communication.

[0098] Step 7:

[0099] The device receives warning messages sent from the server and notifies the user. The warning message becomes input and is displayed as a pop-up message or push notification so that the user can easily check it. This gives the user an opportunity to reflect on and correct their statements.
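
Steps 3 to 5 above form a chain in which each output is the next input: text becomes structured data, structured data becomes a flag, and the flag becomes a warning message. The sketch below illustrates that chaining under stated assumptions: regular-expression tokenization with word n-grams stands in for the "natural language processing library", a fixed template stands in for the "generative AI model", and the stored expressions are invented for the example.

```python
import re
from typing import Optional

# Hypothetical stored expressions, kept as tuples of lowercase words.
KNOWN_EXPRESSIONS = {("look", "cute"), ("outfit", "looks", "great")}


def analyze(text: str) -> dict:
    """Step 3: turn raw text into structured data (tokens plus 2- and 3-grams)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    ngrams = {
        tuple(tokens[i:i + n])
        for n in (2, 3)
        for i in range(len(tokens) - n + 1)
    }
    return {"text": text, "tokens": tokens, "ngrams": ngrams}


def match(structured: dict) -> bool:
    """Step 4: output a flag that is True when any stored expression appears."""
    return not KNOWN_EXPRESSIONS.isdisjoint(structured["ngrams"])


def generate_warning(structured: dict, flag: bool) -> Optional[str]:
    """Step 5: create a warning message only when the flag is set."""
    if not flag:
        return None
    return (f'The expression "{structured["text"]}" may be perceived as '
            "inappropriate by the recipient.")


for utterance in ("Your outfit looks great today", "Let's review the agenda"):
    data = analyze(utterance)
    print(generate_warning(data, match(data)))
```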

[0100] (Application Example 1)

[0101] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0102] Preventing sexual harassment is a critical issue in the workplace and public spaces. Traditional methods often rely on post-incident responses, making real-time prevention difficult. Therefore, there is a need for a method that effectively suppresses potential sexual harassment by analyzing audio signals in real time and generating immediate warnings.

[0103] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0104] In this invention, the server includes data collection means, analysis means for analyzing and evaluating collected audio signals or text information, and transmission means for transmitting generated alerts to the user device. This allows for real-time analysis of words and expressions spoken by the user, and enables immediate warnings if there are potential problems.

[0105] A "data collection means" is a device that has the function of acquiring audio signals and text information from the user's voice or text input.

[0106] "Analysis means" refers to a device that has the function of analyzing collected audio signals or text information and evaluating its content.

[0107] A "comparison tool" is a tool that allows for the determination of whether or not a problem exists by comparing the analyzed information with a collection of cases related to sexual harassment.

[0108] An "alert generation mechanism" is a device that generates a warning when it detects a phrase that may contain a potential problem and immediately communicates it to the user.

[0109] A "communication means" is a device that has the function of transmitting generated warnings to the user's device and notifying the user in real time.

[0110] A "portable information device" refers to a compact electronic device that users carry with them on a daily basis and that has the function of collecting and processing voice and text data.

[0111] "Speech recognition technology" is a technology that converts speech signals into text information, enabling subsequent processing.

[0112] "Learning technology" refers to technology that improves and optimizes the system based on user responses, enabling the generation of more accurate warnings.

[0113] The system realizing this invention aims to prevent sexual harassment by utilizing portable information devices and servers to acquire and analyze voice signals and text information in real time. Its main components include data collection means, analysis means, comparison means, alert generation means, communication means, and learning technology.

[0114] The server receives audio signals or text information from the user's portable information device and first analyzes the collected data using speech recognition technology. The analyzed information is then compared with a collection of sexual harassment-related case studies. Using comparison tools, problematic phrases are identified, and an alert generation tool immediately generates a warning. The generated warning is sent to the user's device via a communication tool, notifying the user in real time. This gives the user an immediate opportunity to address potential problems with their statements.

[0115] As a concrete example, consider a scenario where a user is in an office meeting and unconsciously says, "You look great today." This system captures the statement, analyzes it immediately, and evaluates its potential for problematic behavior by comparing it to past examples. If a problem is identified, a warning message appears to the user stating, "This expression may be misinterpreted." In this way, the system identifies potential problems in real time and supports appropriate social behavior.

[0116] An example of a prompt for a generative AI model would be: "Use natural language processing to analyze the following statement and determine if it may constitute sexual harassment: 'You look lovely today.'"
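
A prompt of this shape can be assembled from a template with a slot for the user's statement. The sketch below is illustrative only: the function name, the length limit, and the use of JSON quoting to keep the statement inside its slot are assumptions, and no model is actually called.

```python
import json

PROMPT_TEMPLATE = (
    "Use natural language processing to analyze the following statement and "
    "determine if it may constitute sexual harassment: {statement}"
)

MAX_STATEMENT_CHARS = 500  # assumed limit, not taken from the patent


def build_prompt(statement: str) -> str:
    """Insert a cleaned, quoted copy of the statement into the template."""
    cleaned = " ".join(statement.split())[:MAX_STATEMENT_CHARS]
    if not cleaned:
        raise ValueError("statement must not be empty")
    # json.dumps adds double quotes and escapes embedded quotes and backslashes.
    return PROMPT_TEMPLATE.format(statement=json.dumps(cleaned))


print(build_prompt("You look lovely today."))
```

Quoting the statement is a small safeguard so that text spoken by the user is treated as data to be analyzed rather than as part of the instruction.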

[0117] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0118] Step 1:

[0119] The device uses its built-in microphone to capture the user's voice signal in real time. It acquires the voice signal as input and temporarily stores it as preparation for processing.

[0120] Step 2:

[0121] The device uses speech recognition technology to convert the captured audio signal into text. In this step, the audio signal is used as input and text information is generated as output. A speech recognition system (e.g., Google's API) is used to analyze the audio data and convert spoken language into text.

[0122] Step 3:

[0123] The text data is sent to the server using a secure protocol. This is the process of preparing the server for analysis by passing the text data as input.

[0124] Step 4:

[0125] The server processes the received text data through a natural language processing engine to perform language analysis. It processes the text data as input and outputs the analysis results. The analysis includes examining grammar, context, and word meanings.

[0126] Step 5:

[0127] The server uses the analysis results to compare text data with a collection of sexual harassment case studies. The analysis results are fed into a matching mechanism as input, and a list of problematic phrases is obtained as output. This process uses database queries to identify expressions that match past cases.

[0128] Step 6:

[0129] Based on the matching results, the server immediately generates an alert if the problematic phrase is found. The generated alert uses the matching results as input and produces an alert message as output. The content of the alert is determined using an alert generation algorithm.

[0130] Step 7:

[0131] The server sends the generated warning to the terminal. This process uses the warning message as input to notify the user in real time. As a result, the warning is displayed on the terminal, prompting the user to pay attention.

[0132] Step 8:

[0133] Users review the warnings they receive and modify their statements and actions as needed. This allows users to consciously take socially appropriate actions based on the warnings.
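
Step 5 above mentions database queries that identify expressions matching past cases. A minimal sketch using an in-memory SQLite database follows; the table layout, the sample rows, and the substring-based matching rule are assumptions, since the patent does not describe the schema or the query.

```python
import sqlite3
from typing import List, Tuple


def open_case_db() -> sqlite3.Connection:
    """Create an in-memory case collection with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cases ("
        "id INTEGER PRIMARY KEY, phrase TEXT NOT NULL, note TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO cases (phrase, note) VALUES (?, ?)",
        [
            ("you look great", "Remark about appearance."),
            ("you look lovely", "Remark about appearance."),
        ],
    )
    return conn


def find_problem_phrases(conn: sqlite3.Connection, text: str) -> List[Tuple[str, str]]:
    """Return (phrase, note) rows whose stored phrase occurs in the given text."""
    return conn.execute(
        "SELECT phrase, note FROM cases "
        "WHERE instr(lower(?), phrase) > 0 ORDER BY id",
        (text,),
    ).fetchall()


conn = open_case_db()
print(find_problem_phrases(conn, "You look great today."))
print(find_problem_phrases(conn, "The meeting starts at ten."))
```

The text is passed as a bound parameter rather than concatenated into the SQL string, so user speech cannot alter the query itself.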

[0134] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0135] This invention incorporates an emotion engine that analyzes the user's emotions into a system that monitors a user's voice or text data in real time, detects phrases that may constitute sexual harassment, and issues warnings. The system can analyze the user's emotional state from their communication in real time and adjust the warning content according to that emotion.

[0136] Terminal processing

[0137] The terminal captures the user's voice and text data through data input means. The voice data is converted to text using speech recognition means, and the series of text data is analyzed by an emotion engine before being sent to the server for processing.

[0138] Analysis of the Emotion Engine

[0139] The emotion engine analyzes the user's voice tone and linguistic features in their text to estimate their current emotional state. For example, it categorizes the user into an emotion category such as tension, anger, or joy, and this information is sent to the server.
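
As a toy illustration of this categorization, the sketch below maps a few voice features and word cues to the categories named above. It is rule-based purely for readability; the patent attributes this role to an emotion identification model, and every feature name, threshold, and word list here is an assumption.

```python
import re
from dataclasses import dataclass

# Hypothetical word cues for the "joy" category.
POSITIVE_WORDS = {"great", "glad", "thanks", "wonderful"}


@dataclass
class VoiceFeatures:
    mean_pitch_hz: float      # average pitch of the utterance
    words_per_second: float   # speaking rate
    loudness_db: float        # average loudness


def estimate_emotion(features: VoiceFeatures, text: str) -> str:
    """Pick one category from voice tone and linguistic cues (assumed thresholds)."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    fast = features.words_per_second > 3.5
    if fast and features.loudness_db > 75:
        return "anger"
    if fast and features.mean_pitch_hz > 220:
        return "tension"
    if words & POSITIVE_WORDS:
        return "joy"
    return "neutral"


print(estimate_emotion(VoiceFeatures(180.0, 4.0, 80.0), "Why is this late again"))
print(estimate_emotion(VoiceFeatures(210.0, 2.5, 60.0), "Thanks, that is great"))
```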

[0140] Server Processing

[0141] The server analyzes the received text data using a natural language processing engine and compares phrases related to sexual harassment with a case database using a matching mechanism. Taking into account emotional information obtained from an emotion engine, the warning generation mechanism generates the most appropriate warning based on the results.

[0142] Warning adjustment

[0143] The generated warnings are sent to the user's terminal via a notification system. The warnings are adjusted according to the user's emotional state; for example, if anger is detected, the warning will be in a calmer tone, and if tension is recognized, it will be in an encouraging tone.
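
The tone adjustment described above can be sketched as a simple lookup. This is a minimal illustration only: the names `TONE_BY_EMOTION`, `TEMPLATES`, and `adjust_warning`, the "neutral" fallback, and the message wording are assumptions and do not appear in the specification.

```python
# Hypothetical mapping from a detected emotion category to a warning tone.
# Only the anger -> calm and tension -> encouraging pairs come from the text.
TONE_BY_EMOTION = {
    "anger": "calm",
    "tension": "encouraging",
}

# Illustrative wording per tone; the specification does not fix the message text.
TEMPLATES = {
    "calm": "Such comments can be misinterpreted. Let's take a moment and rephrase.",
    "encouraging": "Such comments can be misinterpreted. A small rewording will help.",
    "neutral": "Such comments can be misinterpreted. Please consider rephrasing.",
}


def adjust_warning(emotion: str) -> dict:
    """Pick a tone for the detected emotion and return the warning to send."""
    # Emotions without a dedicated tone fall back to an assumed neutral tone.
    tone = TONE_BY_EMOTION.get(emotion, "neutral")
    return {"tone": tone, "message": TEMPLATES[tone]}
```

For example, `adjust_warning("anger")` selects the calm template, while an emotion with no mapping, such as joy, falls back to the neutral one.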

[0144] User response and system learning

[0145] Users view warnings on their devices and provide feedback. This feedback is recorded on the server side and used to update the emotion engine and warning generation algorithms, thereby improving the system's accuracy.

[0146] Specific example

[0147] Suppose a user makes the comment, "You seem grumpy today," during a video conference. The device captures this audio and converts it into text. The emotion engine detects that the user is somewhat upset and sends the data to the server. The server determines, through a database comparison, that this comment constitutes potential sexual harassment and generates a warning. This warning is delivered to the user in an emotionally sensitive manner, such as, "Such comments can be misinterpreted. Let's try to be a little more considerate." Upon receiving this warning, the user can reflect on their words and actions carefully and strive to improve their communication.

[0148] The following describes the processing flow.

[0149] Step 1:

[0150] The device captures audio data from the user's microphone. Once voice input is complete, it uses speech recognition technology to convert this data into text format. This converted text data is stored in a buffer for later analysis.

[0151] Step 2:

[0152] The device estimates the user's emotions by analyzing the sound quality and characteristics of their voice. An emotion engine analyzes tone, pitch, speed, etc., and prepares the results along with text data.

[0153] Step 3:

[0154] The device sends the converted and analyzed data to the server. This data includes the user's text communications and their emotional state at the time.

[0155] Step 4:

[0156] The server analyzes the received text data using a natural language processing engine. Here, it determines whether the data contains phrases related to sexual harassment based on specific phrases or keywords.

[0157] Step 5:

[0158] The server adjusts the analysis results based on the user's emotional state information. Information from the emotion engine is included as a factor that influences the tone and specific content of the warning generation.

[0159] Step 6:

[0160] The server uses a matching mechanism to compare text data with a case database. If a pattern similar to sexual harassment is detected, a warning generation mechanism is activated.

[0161] Step 7:

[0162] The server generates customized alerts based on detected problems and emotional states. For example, if anger is detected, it will issue a warning in a calm tone.

[0163] Step 8:

[0164] The terminal notifies the user of the warning sent from the server, presenting it through a pop-up message or other interface.

[0165] Step 9:

[0166] Users review the warnings presented and provide feedback. By reviewing their own words and actions as needed, users can improve their communication.

[0167] Step 10:

[0168] User feedback is sent from the device to the server and analyzed using a learning algorithm. This feedback is used to improve and adjust the system.
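
As one possible illustration of Step 3 in the flow above, the data sent from the device to the server can be modeled as a small record holding the text and the estimated emotional state. The class name `UtterancePayload`, its fields, and the use of JSON as the wire format are assumptions made for this sketch; the specification does not define a data format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class UtterancePayload:
    """Hypothetical record sent from the device to the server in Step 3."""
    text: str             # text converted from speech, or typed by the user
    emotion: str          # emotion category estimated on the device
    source: str = "voice" # assumed field: "voice" or "text"


def encode_payload(payload: UtterancePayload) -> str:
    """Serialize the payload to JSON on the device side."""
    return json.dumps(asdict(payload), ensure_ascii=False)


def decode_payload(raw: str) -> UtterancePayload:
    """Rebuild the payload from JSON on the server side."""
    return UtterancePayload(**json.loads(raw))
```

Keeping the emotional state next to the text in a single record lets the server use both in Steps 4 through 7 without a second request.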

[0169] (Example 2)

[0170] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0171] Conventional systems have difficulty analyzing voice or text data in a timely manner, making it particularly challenging to detect and respond appropriately to sexual harassment-related remarks in real time. Furthermore, they lack the functionality to adjust warnings based on the user's emotional state, which hinders their ability to effectively facilitate communication.

[0172] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0173] In this invention, the server includes data acquisition means, analysis means for processing acquired voice or text information to analyze emotions, and comparison means for comparing with an information database related to sexual harassment. This enables the generation of warnings that accurately reflect the user's emotions in real time, and allows for a rapid and appropriate response to sexual harassment.

[0174] "Data acquisition means" refers to a device or software function for collecting voice or text information from a user.

[0175] "Analysis means" refers to a device or software function for processing acquired audio or text information and analyzing the user's emotional state.

[0176] "Comparison means" refers to a device or software function that compares analyzed information with a database containing information related to sexual harassment.

[0177] "Warning generation means" refers to a device or software function that generates a warning to notify the user based on the results of the comparison means.

[0178] "Transmission means" refers to a device or function that transmits the generated warning to the user's information terminal.

[0179] "Improvement means" refers to a device or software function that receives feedback and opinions from users to improve the system's performance and responsiveness.

[0180] This invention is a system that analyzes voice and text data in real time on an information terminal used by a user and detects phrases that may constitute sexual harassment by comparing them with a specific database. This system can improve communication by generating warnings that take into account the user's emotional state and adjusting the content of the warnings according to the user's emotions when notifying them.

[0181] The terminal captures user voice and text information using data acquisition means equipped with a microphone and text input device. The voice is converted into text data using speech recognition software (e.g., a speech recognition engine). This is the process of transcribing the user's utterances into text. The converted text data is then analyzed to evaluate the user's emotional state. Specifically, it is categorized into emotional categories such as tension or anger based on voice tone and language choices.

[0182] After receiving this analysis data, the server uses a natural language processing engine (e.g., a natural language processing AI model) to analyze the text data and compare it against a database to determine if it contains phrases related to sexual harassment. Here, the case database is a collection of similar phrases from the past and is used to quickly evaluate the user's statements.
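
The comparison against the case database can be sketched with a string-similarity measure. The following is a minimal sketch that assumes the database is a plain list of past phrases and that similarity is measured with `difflib.SequenceMatcher` from the Python standard library; the specification names neither the matching method nor a threshold, so `CASE_DATABASE`, `best_match`, and the value 0.8 are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical case database; the single entry is the example phrase used in
# this document, stored in lower case.
CASE_DATABASE = ["you seem grumpy today"]


def best_match(text, cases=CASE_DATABASE, threshold=0.8):
    """Return (matched_case, score); matched_case is None below the threshold."""
    normalized = text.lower().strip()
    best_case, best_score = None, 0.0
    for case in cases:
        # ratio() is 2*M/T, where M is the number of matching characters
        # and T is the combined length of both strings.
        score = SequenceMatcher(None, normalized, case).ratio()
        if score > best_score:
            best_case, best_score = case, score
    if best_score >= threshold:
        return best_case, best_score
    return None, best_score
```

A character-level ratio tolerates small wording changes but knows nothing about meaning, so a practical system would likely combine it with the natural language processing engine described above.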

[0183] The warning generation mechanism generates an appropriate warning based on the matching results and the user's emotional data. The generated warning is then sent to the user's device via the transmission mechanism. The warning content is adjusted according to the user's emotions; for example, if the user is showing signs of tension, the warning will be changed to something that will alleviate that tension.

[0184] Users can view this warning on their device and consciously adjust their actions. They can also provide feedback on the warning, which is recorded on the server and used by the improvement means to improve the system's performance.

[0185] For example, if a user says "You seem grumpy today" during a video chat, the device converts this audio into text, and the emotion engine determines the emotion to be "anger." This data is then analyzed on the server, and if the phrase is determined to have the potential to be sexual harassment, a warning is generated and sent, such as "Such remarks can be misleading. Try to be a little more considerate."

[0186] An example of a prompt for a generative AI model is: "Generate a scenario that monitors phrases used by users in workplace conversations, detects potentially sexually harassing expressions, and issues appropriate warnings. Consider that the warnings should be adjusted according to the user's real-time emotional state."

[0187] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0188] Step 1:

[0189] The device captures user voice and text information in real time through data acquisition means. Voice data is acquired via a microphone, and text information is entered via keyboard or touch input. The acquired voice data is converted to text by a speech recognition engine. The input is raw voice data, and the output is text data.

[0190] Step 2:

[0191] The device sends the converted text data and audio analysis to an emotion analysis engine. Here, the tone of the audio and linguistic features in the text are analyzed to determine the user's emotional state. Specifically, the audio data is analyzed for tone, speed, and emphasis, while the frequency of positive / negative expressions in the text data is calculated. The input consists of audio tone and text data, and the output is metadata indicating the emotional state.

[0192] Step 3:

[0193] The device sends the analysis results (emotional state metadata) along with text data to the server.

[0194] Step 4:

[0195] The server feeds the received text data into a natural language processing engine, which compares it with a database to determine if it contains phrases related to sexual harassment. During this process, the natural language processing engine analyzes the context and structure of the text. The input is text data, and the output indicates whether or not suspicious phrases are present.

[0196] Step 5:

[0197] The server uses a warning generation mechanism to generate a warning message based on emotional state metadata and detected phrase information. The generated warning is adjusted according to the emotional state. For example, if the user is stressed, the message will be encouraging, and if they are angry, it will be calm. The input is the emotional state and phrase information, and the output is the adjusted warning message.

[0198] Step 6:

[0199] The server sends the generated warning message to the terminal, and the terminal notifies the user. The terminal uses notification methods to convey the warning to the user as a pop-up display or audio notification. The input is the warning message, and the output is the notification to the user.

[0200] Step 7:

[0201] Users review warnings and provide feedback on their devices. This feedback is sent to the server, recorded and analyzed by the improvement means, and used for future detection of problematic remarks and warning generation. The input is user feedback, and the output is information for system learning and improvement.
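
The text-side part of Step 2 above, which calculates the frequency of positive and negative expressions, can be sketched as follows. The word lists, the function name `emotion_metadata`, and the output keys are assumptions for illustration; the analysis of audio tone, speed, and emphasis is omitted from this sketch.

```python
import re

# Hypothetical lexicons; a real system would use a much larger resource.
POSITIVE_WORDS = {"great", "thanks", "good", "happy"}
NEGATIVE_WORDS = {"grumpy", "angry", "terrible", "annoying"}


def emotion_metadata(text: str) -> dict:
    """Return the share of positive and negative words and a coarse polarity."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1  # avoid division by zero for empty input
    pos = sum(token in POSITIVE_WORDS for token in tokens)
    neg = sum(token in NEGATIVE_WORDS for token in tokens)
    if neg > pos:
        polarity = "negative"
    elif pos > neg:
        polarity = "positive"
    else:
        polarity = "neutral"
    return {
        "positive_ratio": pos / total,
        "negative_ratio": neg / total,
        "polarity": polarity,
    }
```

The returned dictionary plays the role of the emotional state metadata that the device sends to the server in Step 3.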

[0202] (Application Example 2)

[0203] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0204] In workplace communication, detecting sexual harassment and inappropriate remarks in real time is difficult, and there is the challenge of needing to respond appropriately while considering the emotional state of the speaker. Furthermore, it is important to improve the accuracy of the system by taking user feedback into account.

[0205] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0206] In this invention, the server includes a data acquisition function, a processing function for processing and analyzing acquired voice or text information, and an emotion analysis function for analyzing emotions and adjusting warnings according to the emotional state. This makes it possible to detect inappropriate remarks in workplace communication in real time and respond appropriately according to the emotional state of the speaker.

[0207] The "data acquisition function" is a function that captures audio or text information and converts it into a format necessary for analysis within the system.

[0208] The "processing function" is a function that analyzes acquired audio or text information and uses that analysis to determine inappropriate expressions, etc.

[0209] The "comparison function" is a feature that compares the analyzed information with a database of examples of inappropriate expressions to check for any matches.

[0210] The "warning generation function" is a feature that creates appropriate warnings for users based on detected problematic expressions.

[0211] The "notification function" is a function that sends generated warnings to the user's device.

[0212] The "emotion analysis function" is a feature that analyzes the user's emotional state and selects and adjusts warnings according to the situation.

[0213] The "voice analysis function" is a function that acquires voice information and converts it into text.

[0214] The "learning function" is a feature that receives feedback from users and updates and improves the system's algorithms and database based on that feedback.

[0215] This system primarily consists of user terminals such as smartphones and smart glasses, and servers connected via communication.

[0216] The user's device is equipped with a data acquisition function that captures the user's voice in real time during conversations. The voice is converted into text through a voice analysis function. This converted text information is immediately analyzed by the device's processing functions.

[0217] The server includes a comparison function that matches processed text information against a database of cases related to inappropriate expressions. This database contains known inappropriate expressions and phrases related to sexual harassment.

[0218] Furthermore, the server-side has an emotion analysis function to analyze the user's emotional state. This function estimates the emotional state based on the user's voice tone and linguistic features contained in the text. By classifying the emotional state into categories such as tension, anger, and joy, the emotions behind the utterances are understood.

[0219] Based on this, the warning generation function is activated to create the most appropriate warning. The generated warning is sent to the user's device via the notification function. The content of the warning is adjusted according to the user's emotional state, and if inappropriate language is detected, the user is prompted to reconsider their statement.

[0220] Furthermore, it incorporates a learning function, allowing for improvements to the system's database and algorithms by incorporating user feedback. This process continuously improves the overall accuracy of the system.

[0221] For example, if someone says "You seem a little grumpy today" during a meeting, the system will detect this and, especially if the emotion analysis function identifies a tense situation, it will generate a warning prompting the use of more appropriate language.

[0222] An example of a prompt is: "Explain how to analyze user sentiment from conversations, detect inappropriate phrases, and generate warnings."

[0223] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0224] Step 1:

[0225] The user's device captures the audio. The audio is acquired via the data acquisition function and converted into text data by the voice analysis function. This converted text becomes the input for the next process.

[0226] Step 2:

[0227] The terminal's processing function analyzes the converted text data. During this process, it extracts linguistic features from the text data and generates information to estimate the user's emotional state. The output consists of the analyzed text data and metadata for emotion analysis.

[0228] Step 3:

[0229] The server activates a comparison function that compares the analyzed text data with the case database. It matches the text data against known inappropriate expressions in the database and detects any matches. This result becomes the input for the next process.

[0230] Step 4:

[0231] The server's emotion analysis function estimates emotions from the user's voice tone and linguistic features in the text. The input is the metadata and voice features from Step 2, and the output is data indicating the estimated emotional state.

[0232] Step 5:

[0233] The server's warning generation function generates the most appropriate warning for the user based on the detection results of inappropriate language and the estimated emotional state. The input is the detection results of inappropriate language and the emotional analysis results, and the output is a tailored warning message.

[0234] Step 6:

[0235] The generated warning message is sent to the user's device via the notification function. The user receives the warning message and has an opportunity to reconsider their actions. The output is the warning message displayed to the user.

[0236] Step 7:

[0237] Users provide feedback, which is recorded by the learning function. The server uses this feedback to update its database and algorithms, improving the system's accuracy. The input is user feedback, and the output is the updated system settings and database.
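
One way to picture the learning function in Step 7 is a per-phrase weight that rises when users mark a warning as helpful and falls when they mark it as unhelpful. This is a sketch under stated assumptions: the class `FeedbackLearner`, the step size, the weight bounds, and the rule of multiplying similarity by the weight are all hypothetical, since the specification does not describe a concrete learning algorithm.

```python
from collections import defaultdict


class FeedbackLearner:
    """Hypothetical learning function: adjusts a per-phrase weight from feedback."""

    def __init__(self, base_weight=1.0, step=0.1, floor=0.0, ceiling=2.0):
        self.step = step
        self.floor = floor
        self.ceiling = ceiling
        # Phrases without feedback start at the base weight.
        self.weights = defaultdict(lambda: base_weight)

    def record(self, phrase: str, helpful: bool) -> float:
        """Raise the weight for helpful warnings, lower it for unhelpful ones."""
        delta = self.step if helpful else -self.step
        updated = self.weights[phrase] + delta
        self.weights[phrase] = min(self.ceiling, max(self.floor, updated))
        return self.weights[phrase]

    def should_warn(self, phrase: str, similarity: float, threshold=0.8) -> bool:
        """Warn when the weighted similarity reaches the assumed threshold."""
        return similarity * self.weights[phrase] >= threshold
```

With the default settings, one unhelpful report lowers a phrase's weight to 0.9, so a match with similarity 0.85 no longer triggers a warning for that phrase.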

[0238] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0239] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.
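
The pairing of an instruction with inference data can be illustrated by a small helper that assembles a text prompt. The function `build_prompt` and the "Instruction / Utterance / Estimated emotion" layout are assumptions for this sketch; no call to any particular generative AI service is shown, because the specification does not define one.

```python
from typing import Optional


def build_prompt(instruction: str, utterance: str,
                 emotion: Optional[str] = None) -> str:
    """Combine an instruction with text inference data into one prompt string."""
    lines = [
        "Instruction: " + instruction.strip(),
        "Utterance: " + utterance.strip(),
    ]
    # The emotion line is optional because not every embodiment estimates emotion.
    if emotion:
        lines.append("Estimated emotion: " + emotion)
    return "\n".join(lines)
```

The resulting string would be passed to the data generation model 58 as the prompt, and the model's text output would serve as the inference result.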

[0240] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0241] [Second Embodiment]

[0242] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0243] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0244] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0245] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0246] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0247] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0248] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0249] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0250] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0251] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0252] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0253] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0254] This invention is a system aimed at preventing sexual harassment, and it primarily functions via user terminals, servers, and the communication network connecting them. This system is realized by combining various technologies such as speech recognition, natural language processing, database matching, warning generation, and user notifications.

[0255] Terminal processing

[0256] The device has the ability to capture the user's voice or text input in real time. On the device, speech recognition technology is used to convert the voice data into text, preparing it for processing as text data. This text data is sent to the server via a secure protocol.

[0257] Server Processing

[0258] The server analyzes the received text data using a natural language processing engine. The analyzed data is then compared against a database containing cases related to sexual harassment, and problematic phrases are identified. Based on the comparison results, the server generates a warning as needed. This warning includes the identified problematic expression and an explanation based on its context.

[0259] User notifications

[0260] The generated alert is sent to the device, which notifies the user. The device presents the alert to the user in the form of a pop-up message or other notification, giving the user an opportunity to reflect on their statements and messages. The user can then review the alert and correct their actions as needed.

[0261] Learning and Improvement

[0262] User feedback is sent to the server, and the system's algorithms are updated based on this feedback. The learning process on the server will improve the accuracy of future sexual harassment detection and enable more effective warnings.

[0263] Specific example

[0264] Suppose a user is having a video conference at work and makes a comment like, "You look cute today." The device captures this audio and sends it as text data to the server. The server compares this phrase to a database and determines that it may be similar to past cases of sexual harassment. The server then generates a warning and notifies the device. The user will see a warning on the screen that says, "This expression may be considered sexual harassment. Please be careful." In this way, the system identifies potential problems in real time and contributes to improving the workplace environment.

[0265] The following describes the processing flow.

[0266] Step 1:

[0267] The device captures audio data through the user's microphone input and also has the capability to collect text data from keyboard input. This prepares the device for real-time monitoring of user communication.

[0268] Step 2:

[0269] The device converts the captured audio data into text data using speech recognition technology. This converted text data is temporarily stored in a buffer for subsequent processing.

[0270] Step 3:

[0271] The terminal sends the converted text data to the server via a secure communication protocol. This data is treated as information necessary for detecting sexual harassment.

[0272] Step 4:

[0273] The server analyzes the received text data using a natural language processing engine. This analysis includes grammatical and semantic analysis, extracting important phrases and words.

[0274] Step 5:

[0275] The server matches the analyzed data with the case database related to sexual harassment. Here, it calculates the similarity with known cases in the database to determine whether there are potential problems.

[0276] Step 6:

[0277] If the server determines that there is a problem, it generates a warning. The warning includes the detected phrase, its context, and related advice.

[0278] Step 7:

[0279] The server sends the generated warning to the terminal. The terminal receives this warning and displays it in a format that is easy for the user to understand.

[0280] Step 8:

[0281] The user checks the warning on the terminal's screen. Based on the warning content, the user reviews what they communicated and has an opportunity to reconsider it if necessary.

[0282] Step 9:

[0283] If necessary, the user sends feedback on the warning to the server through the terminal. This feedback is used as part of the continuous improvement and learning of the system.

[0284] Step 10:

[0285] The server analyzes the received feedback and updates the system's algorithm. This improves the accuracy of future sexual harassment detection.
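
Steps 5 and 6 of the flow above (similarity calculation against known cases, then a warning containing the phrase, its context, and advice) can be sketched as follows. Jaccard similarity over word sets is used here only as a stand-in, since the specification does not say how similarity is calculated; `jaccard`, `make_warning`, the threshold 0.6, and the advice text are hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity: shared words divided by all distinct words."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def make_warning(text, cases, threshold=0.6):
    """Return a warning dict if the text resembles a known case, else None."""
    scored = [(jaccard(text, case), case) for case in cases]
    score, case = max(scored, default=(0.0, None))
    if case is None or score < threshold:
        return None
    return {
        "phrase": text,                                      # detected phrase
        "context": "Similar to a recorded case: '%s'" % case,  # why it was flagged
        "similarity": score,
        "advice": "This expression may be considered sexual harassment. "
                  "Please be careful.",
    }
```

This sketch splits on whitespace only, so punctuation attached to a word counts as part of that word; a fuller implementation would normalize the text first.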

[0286] (Example 1)

[0287] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0288] In today's workplace, remarks and communications that could lead to sexual harassment are sometimes overlooked, and there is a need for effective means to prevent this. In particular, it is important to improve the workplace environment and raise employee awareness by pointing out problematic expressions in real time and issuing immediate warnings to users.

[0289] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0290] In this invention, the server includes means for inputting voice or text information, means for processing and analyzing the input data, means for comparing it with a record storage device related to sexual harassment, means for detecting identified problematic expressions and generating notifications, and means for receiving user feedback and improving the system. This makes it possible to detect potentially problematic remarks in real time and immediately notify users, thereby deterring inappropriate communication in the workplace.

[0291] "Means for inputting voice or text information" refers to a device that has the function of acquiring voice or text from a user and supplying it to the system for processing.

[0292] "Means for processing and analyzing input data" refers to technologies that analyze acquired audio or text data and perform processing to understand its content.

[0293] A "record-keeping device related to sexual harassment" refers to data storage that accumulates past cases and standard data for use in comparison.

[0294] "Means for detecting identified problematic expressions and generating notifications" refers to a device or program that has the function of identifying inappropriate expressions in analyzed data and creating and sending warnings to the user.

[0295] "Means of receiving feedback from users to improve the system" refers to techniques that collect user feedback and incorporate it into learning algorithms to improve the system's recognition accuracy and functionality.

[0296] This system provides advanced monitoring and warning functions aimed at preventing sexual harassment. It primarily consists of user terminals, communication networks, and servers.

[0297] Terminal

[0298] The device captures the user's spoken or typed input in real time. For example, it collects audio data using a microphone built into a laptop or smartphone. This audio data is then converted into text using software installed on the device, such as a speech recognition API.

[0299] Server

[0300] The server receives text data sent from the terminal and analyzes it using natural language processing techniques. This process uses a natural language processing library and compares the analyzed text with existing examples of sexual harassment in the database. Based on the analysis results, the server generates a warning. For example, it analyzes various problematic expressions and creates an appropriate warning message.

[0301] User

[0302] The user receives the warning sent from the server on their device. The device typically displays the warning as a pop-up message on the screen, but it may be presented in other ways depending on the situation. The user reviews the warning and has an opportunity to reconsider their expression.

[0303] As a specific example, when a user makes a statement like "That outfit looks great today" during a video conference, the audio is converted into text and sent to the server. The server analyzes this statement and generates a warning when it determines that there is a risk of sexual harassment, and notifies the terminal with "Such an expression may be recognized as inappropriate by the recipient." With this system, the user can immediately review their own statements and contribute to improving the communication environment in the workplace.

[0304] By utilizing the generative AI model, the content of the warning can be refined more accurately, and the notification is made in a form that conforms to the user's actual statement and context. Examples of prompt texts include "Please inform me about expressions that may be regarded as sexual harassment in workplace conversations."

[0305] The flow of the specific process in Example 1 will be described using FIG. 11.

[0306] Step 1:

[0307] The terminal acquires the user's voice or text input. In the case of voice, the microphone installed in the terminal is used to capture the voice in real time. This voice data serves as the input. The voice data is converted into text using, for example, a "speech recognition API". The converted text data serves as the output.

[0308] Step 2:

[0309] The terminal sends the text data output in Step 1 to the server via a secure protocol. Using a protocol such as HTTPS helps prevent data tampering in transit. The transmitted text data serves as the new input on the server side, where preparations are made to pass it to the next step.

[0310] Step 3:

[0311] The server receives text data and performs analysis using natural language processing (NLP) techniques. This process takes text data as input, understands the meaning within the text, and performs data processing to break down the elements. A "natural language processing library" is used. The analyzed results are output as structured data.

[0312] Step 4:

[0313] The server compares the data output in Step 3 with a database containing sexual harassment cases. Taking the analyzed data as input, it compares the data with past cases. As a result of the comparison, a flag is output indicating whether similar expressions were found.

[0314] Step 5:

[0315] The server generates a warning if the matching results indicate a problem. Using the matching flag as input, it utilizes a "generative AI model" to create an appropriate warning message. The generated warning message becomes the output.

[0316] Step 6:

[0317] The server sends the generated warning message to the terminal. The warning message becomes input and is delivered to the terminal via network communication.

[0318] Step 7:

[0319] The device receives warning messages sent from the server and notifies the user. The warning message becomes input and is displayed as a pop-up message or push notification so that the user can easily check it. This gives the user an opportunity to reflect on and correct their statements.
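The server-side part of the seven steps above (Steps 3 to 5) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation described in this application: speech recognition is assumed to have already produced text, the case database is a hypothetical in-memory list (`CASE_DB`), a fixed template stands in for the generative AI model, and all function names are illustrative.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical case database; the phrases are taken from examples in this text.
CASE_DB = ["that outfit looks great", "you look great today"]


@dataclass
class Analysis:
    """Structured output of Step 3: normalized text plus word tokens."""
    normalized: str
    tokens: list


def analyze(text: str) -> Analysis:
    # Step 3: lowercase the text and break it into word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Analysis(normalized=" ".join(tokens), tokens=tokens)


def match_cases(analysis: Analysis, case_db=CASE_DB) -> bool:
    # Step 4: set the flag if any stored case phrase appears in the statement.
    return any(case in analysis.normalized for case in case_db)


def generate_warning(flag: bool) -> Optional[str]:
    # Step 5: a fixed template stands in for the generative AI model (assumption).
    if not flag:
        return None
    return "Such an expression may be recognized as inappropriate by the recipient."


def process_utterance(text: str) -> Optional[str]:
    # Steps 3 to 5 chained; capture, transport and display are omitted.
    return generate_warning(match_cases(analyze(text)))
```

Calling `process_utterance("That outfit looks great today")` returns the warning text, while a statement that matches no stored case returns `None`, so the terminal would display nothing.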

[0320] (Application Example 1)

[0321] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0322] Preventing sexual harassment is a critical issue in the workplace and public spaces. Traditional methods often rely on post-incident responses, making real-time prevention difficult. Therefore, there is a need for a method that effectively suppresses potential sexual harassment by analyzing audio signals in real time and generating immediate warnings.

[0323] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0324] In this invention, the server includes data collection means, analysis means for analyzing and evaluating collected audio signals or text information, and transmission means for transmitting generated alerts to the user device. This allows for real-time analysis of words and expressions spoken by the user, and enables immediate warnings if there are potential problems.

[0325] A "data collection means" is a device that has the function of acquiring audio signals and text information from the user's voice or text input.

[0326] "Analysis means" refers to a device that has the function of analyzing collected audio signals or text information and evaluating its content.

[0327] A "comparison means" is a means for determining whether or not a problem exists by comparing the analyzed information with a collection of cases related to sexual harassment.

[0328] An "alert generation means" is a device that generates a warning when it detects a phrase that may contain a potential problem and immediately communicates it to the user.

[0329] A "communication means" is a device that has the function of transmitting generated warnings to the user's device and notifying the user in real time.

[0330] A "portable information device" refers to a compact electronic device that users carry with them on a daily basis and that has the function of collecting and processing voice and text data.

[0331] "Speech recognition technology" is a technology that converts speech signals into text information, enabling subsequent processing.

[0332] "Learning technology" refers to technology that improves and optimizes the system based on user responses, enabling the generation of more accurate warnings.

[0333] The system realizing this invention aims to prevent sexual harassment by utilizing portable information devices and servers to acquire and analyze voice signals and text information in real time. Its main components include data collection means, analysis means, comparison means, alert generation means, communication means, and learning technology.

[0334] The server receives audio signals or text information from the user's portable information device and first analyzes the collected data using speech recognition technology. The analyzed information is then compared with a collection of sexual harassment-related case studies. The comparison means identifies problematic phrases, and the alert generation means immediately generates a warning. The generated warning is sent to the user's device via the communication means, notifying the user in real time. This gives the user an immediate opportunity to address potential problems with their statements.

[0335] As a concrete example, consider a scenario where a user is in an office meeting and unconsciously says, "You look great today." This system captures the statement, analyzes it immediately, and evaluates its potential for problematic behavior by comparing it to past examples. If a problem is identified, a warning message appears to the user stating, "This expression may be misinterpreted." In this way, the system identifies potential problems in real time and supports appropriate social behavior.

[0336] An example of a prompt for a generative AI model would be: "Use natural language processing to analyze the following statement and determine if it may constitute sexual harassment: 'You look lovely today.'"
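As a minimal sketch of how such a prompt might be assembled for an arbitrary statement, the template below reuses the wording of the example prompt. The names `PROMPT_TEMPLATE` and `build_prompt` are hypothetical, and the call to the generative AI model itself is intentionally left out.

```python
# Template wording taken from the example prompt in this text.
PROMPT_TEMPLATE = (
    "Use natural language processing to analyze the following statement "
    "and determine if it may constitute sexual harassment: '{statement}'"
)


def build_prompt(statement: str) -> str:
    # Collapse runs of whitespace so transcription artifacts do not leak into the prompt.
    cleaned = " ".join(statement.split())
    return PROMPT_TEMPLATE.format(statement=cleaned)
```

The returned string would then be passed to whichever generative AI model the server uses.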

[0337] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0338] Step 1:

[0339] The device uses its built-in microphone to capture the user's voice signal in real time. It acquires the voice signal as input and temporarily stores it as preparation for processing.

[0340] Step 2:

[0341] The device uses speech recognition technology to convert the captured audio signal into text. In this step, the audio signal is used as input and text information is generated as output. A speech recognition system (e.g., Google's API) is used to analyze the audio data and convert spoken expressions into text.

[0342] Step 3:

[0343] The text data is sent to the server using a secure protocol. This step passes the text data to the server as input in preparation for analysis.

[0344] Step 4:

[0345] The server processes the received text data through a natural language processing engine to perform language analysis. It processes the text data as input and outputs the analysis results. The analysis includes examining grammar, context, and word meanings.

[0346] Step 5:

[0347] The server uses the analysis results to compare text data with a collection of sexual harassment case studies. The analysis results are fed into a matching mechanism as input, and a list of problematic phrases is obtained as output. This process uses database queries to identify expressions that match past cases.

[0348] Step 6:

[0349] Based on the matching results, the server immediately generates an alert if a problematic phrase is found. This step takes the matching results as input and produces an alert message as output. The content of the alert is determined using an alert generation algorithm.

[0350] Step 7:

[0351] The server sends the generated warning to the terminal. This process uses the warning message as input to notify the user in real time. As a result, the warning is displayed on the terminal, prompting the user to pay attention.

[0352] Step 8:

[0353] Users review the warnings they receive and modify their statements and actions as needed. This allows users to consciously take socially appropriate actions based on the warnings.
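Steps 5 and 6 above describe a database query that yields a list of problematic phrases, followed by alert generation. One possible sketch, assuming an in-memory SQLite table with a hypothetical schema (`cases(phrase)`) and simple substring matching, neither of which is specified in this text:

```python
import sqlite3


def build_case_db(phrases):
    # Hypothetical schema: one lowercase case phrase per row.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cases (phrase TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO cases (phrase) VALUES (?)",
        [(p.lower(),) for p in phrases],
    )
    conn.commit()
    return conn


def find_problem_phrases(conn, text):
    # Step 5: return every stored phrase that occurs inside the statement.
    rows = conn.execute(
        "SELECT phrase FROM cases WHERE instr(?, phrase) > 0 ORDER BY phrase",
        (text.lower(),),
    )
    return [row[0] for row in rows]


def generate_alert(problem_phrases):
    # Step 6: produce an alert message only when the list is non-empty.
    if not problem_phrases:
        return None
    return "This expression may be misinterpreted."
```

Substring matching is the simplest choice here; a production system would more likely match on the output of the language analysis in Step 4 rather than on raw text.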

[0354] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0355] This invention incorporates an emotion engine that analyzes the user's emotions into a system that monitors a user's voice or text data in real time, detects phrases that may constitute sexual harassment, and issues warnings. The system can analyze the user's emotional state during communication in real time and adjust the warning content according to that emotion.

[0356] Terminal processing

[0357] The terminal captures the user's voice and text data through data input means. The voice data is converted to text using speech recognition means, and the series of text data is analyzed by an emotion engine before being sent to the server for processing.

[0358] Analysis of the Emotion Engine

[0359] The emotion engine analyzes the user's voice tone and linguistic features in their text to estimate their current emotional state. For example, it categorizes the user into an emotion category such as tension, anger, or joy, and this information is sent to the server.

[0360] Server Processing

[0361] The server analyzes the received text data using a natural language processing engine and compares phrases related to sexual harassment with a case database using a matching mechanism. Taking into account emotional information obtained from an emotion engine, the warning generation mechanism generates the most appropriate warning based on the results.

[0362] Warning adjustment

[0363] The generated warnings are sent to the user's terminal via a notification system. The warnings are adjusted according to the user's emotional state; for example, if anger is detected, the warning will be in a calmer tone, and if tension is recognized, it will be in an encouraging tone.
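The tone adjustment described above can be illustrated with a simple lookup from emotion category to message template. The template wording and the names `TONE_TEMPLATES` and `adjust_warning` are assumptions made for this sketch; the text only states that anger leads to a calmer tone and tension to an encouraging one.

```python
# Hypothetical templates keyed by the emotion categories named in this text.
TONE_TEMPLATES = {
    "anger": "Let's take a moment. {core}",                               # calmer tone
    "tension": "You're doing fine. {core} A small rewording is enough.",  # encouraging tone
}
DEFAULT_TEMPLATE = "{core}"


def adjust_warning(core: str, emotion: str) -> str:
    # Fall back to the unmodified warning for emotions without a dedicated template.
    template = TONE_TEMPLATES.get(emotion, DEFAULT_TEMPLATE)
    return template.format(core=core)
```

For example, `adjust_warning("Such comments can be misinterpreted.", "anger")` prefixes the core warning with a calming sentence, while an emotion such as "joy" leaves it unchanged.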

[0364] User response and system learning

[0365] Users view warnings on their devices and provide feedback. This feedback is recorded on the server side and used to update the emotion engine and warning generation algorithms, thereby improving the system's accuracy.

[0366] Specific example

[0367] Suppose a user makes the comment, "You seem grumpy today," during a video conference. The device captures this audio and converts it into text. The emotion engine detects that the user is somewhat upset and sends the data to the server. The server determines, through a database comparison, that this comment constitutes potential sexual harassment and generates a warning. This warning is delivered to the user in an emotionally sensitive manner, such as, "Such comments can be misinterpreted. Let's try to be a little more considerate." Upon receiving this warning, the user can reflect on their words and actions carefully and strive to improve their communication.

[0368] The following describes the processing flow.

[0369] Step 1:

[0370] The device captures audio data from the user's microphone. Once voice input is complete, it uses speech recognition technology to convert this data into text format. This converted text data is stored in a buffer for later analysis.

[0371] Step 2:

[0372] The device estimates the user's emotions by analyzing the sound quality and characteristics of their voice. An emotion engine analyzes tone, pitch, speed, etc., and prepares the results along with text data.

[0373] Step 3:

[0374] The device sends the converted and analyzed data to the server. This data includes the user's text communications and their emotional state at the time.

[0375] Step 4:

[0376] The server analyzes the received text data using a natural language processing engine. Here, it determines whether the data contains phrases related to sexual harassment based on specific phrases or keywords.

[0377] Step 5:

[0378] The server adjusts the analysis results based on the user's emotional state information. Information from the emotion engine is included as a factor that influences the tone and specific content of the warning generation.

[0379] Step 6:

[0380] The server uses a matching mechanism to compare text data with a case database. If a pattern similar to sexual harassment is detected, a warning generation mechanism is activated.

[0381] Step 7:

[0382] The server generates customized alerts based on detected problems and emotional states. For example, if anger is detected, it will issue a warning in a calm tone.

[0383] Step 8:

[0384] The terminal notifies the user of the warning sent from the server. The terminal presents this notification to the user through a pop-up message or other interface.

[0385] Step 9:

[0386] Users review the warnings presented and provide feedback. Reviewing their own words and actions as needed helps users improve their communication.

[0387] Step 10:

[0388] User feedback is sent from the device to the server and analyzed using a learning algorithm. This feedback is used to improve and adjust the system.
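The feedback loop in Steps 9 and 10 is not tied to a particular learning algorithm in this text. As one hedged illustration, the sketch below keeps a per-phrase score that rises when users mark a warning as helpful and falls otherwise; the class name, the score range, the step size and the threshold are all assumptions.

```python
from collections import defaultdict


class FeedbackLearner:
    """Keeps a per-phrase score and updates it from user feedback."""

    def __init__(self, initial=0.5, step=0.1, threshold=0.3):
        self.step = step
        self.threshold = threshold
        # Phrases never seen before start at the initial score.
        self.scores = defaultdict(lambda: initial)

    def record(self, phrase: str, helpful: bool) -> float:
        # Move the score up for helpful warnings, down otherwise, clamped to [0, 1].
        delta = self.step if helpful else -self.step
        self.scores[phrase] = min(1.0, max(0.0, self.scores[phrase] + delta))
        return self.scores[phrase]

    def should_warn(self, phrase: str) -> bool:
        # A phrase keeps triggering warnings while its score is at or above the threshold.
        return self.scores[phrase] >= self.threshold
```

With these defaults, a phrase whose warnings are repeatedly marked unhelpful eventually drops below the threshold and stops triggering warnings, which is one simple way feedback could "improve and adjust the system".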

[0389] (Example 2)

[0390] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0391] Conventional systems have difficulty analyzing voice or text data in a timely manner, making it particularly challenging to detect and respond appropriately to sexual harassment-related remarks in real time. Furthermore, they lack the functionality to adjust warnings based on the user's emotional state, which hinders their ability to effectively facilitate communication.

[0392] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0393] In this invention, the server includes data acquisition means, analysis means for processing acquired voice or text information to analyze emotions, and comparison means for comparing with an information database related to sexual harassment. This enables the generation of warnings that accurately reflect the user's emotions in real time, and allows for a rapid and appropriate response to sexual harassment.

[0394] "Data acquisition means" refers to a device or software function for collecting voice or text information from a user.

[0395] "Analysis means" refers to a device or software function for processing acquired audio or text information and analyzing the user's emotional state.

[0396] "Comparison means" refers to a device or software function that compares analyzed information with a database containing information related to sexual harassment.

[0397] "Warning generation means" refers to a device or software function that generates a warning to notify the user based on the results of the comparison means.

[0398] "Transmission means" refers to a device or function that transmits the generated warning to the user's information terminal.

[0399] "Improvement measures" refer to devices or software functions that receive feedback and opinions from users to improve the system's performance and responsiveness.

[0400] This invention is a system that analyzes voice and text data in real time on an information terminal used by a user and detects phrases that may constitute sexual harassment by comparing them with a specific database. This system can improve communication by generating warnings that take into account the user's emotional state and adjusting the content of the warnings according to the user's emotions when notifying them.

[0401] The terminal captures user voice and text information using data acquisition means equipped with a microphone and text input device. The voice is converted into text data using speech recognition software (e.g., a speech recognition engine). This is the process of transcribing the user's utterances into text. The converted text data is then analyzed to evaluate the user's emotional state. Specifically, it is categorized into emotional categories such as tension or anger based on voice tone and language choices.

[0402] After receiving this analysis data, the server uses a natural language processing engine (e.g., a natural language processing AI model) to analyze the text data and compare it against a database to determine if it contains phrases related to sexual harassment. Here, the case database is a collection of similar phrases from the past and is used to quickly evaluate the user's statements.

[0403] The warning generation mechanism generates an appropriate warning based on the matching results and the user's emotional data. The generated warning is then sent to the user's device via the transmission mechanism. The warning content is adjusted according to the user's emotions; for example, if the user is showing signs of tension, the warning will be changed to something that will alleviate that tension.

[0404] Users can view this warning on their device and consciously adjust their actions. They can also provide feedback on the warning, which is recorded on the server and used to improve the system's performance through various enhancement methods.

[0405] For example, if a user says "You seem grumpy today" during a video chat, the device converts this audio into text, and the emotion engine determines the emotion to be "anger." This data is then analyzed on the server, and if the phrase is determined to have the potential to be sexual harassment, a warning is generated and sent, such as "Such remarks can be misleading. Try to be a little more considerate."

[0406] An example of a prompt for a generative AI model is: "Generate a scenario that monitors phrases used by users in workplace conversations, detects potentially sexually harassing expressions, and issues appropriate warnings. Consider that the warnings should be adjusted according to the user's real-time emotional state."

[0407] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0408] Step 1:

[0409] The device captures user voice and text information in real time through data acquisition means. Voice data is acquired via a microphone, and text information is entered via keyboard or touch input. The acquired voice data is converted to text by a speech recognition engine. Input is raw voice data, and output is text data.

[0410] Step 2:

[0411] The device sends the converted text data and audio analysis to an emotion analysis engine. Here, the tone of the audio and linguistic features in the text are analyzed to determine the user's emotional state. Specifically, the audio data is analyzed for tone, speed, and emphasis, while the frequency of positive / negative expressions in the text data is calculated. The input consists of audio tone and text data, and the output is metadata indicating the emotional state.

[0412] Step 3:

[0413] The device sends the analysis results (emotional state metadata) along with text data to the server.

[0414] Step 4:

[0415] The server feeds the received text data into a natural language processing engine, which compares it with a database to determine if it contains phrases related to sexual harassment. During this process, the natural language processing engine analyzes the context and structure of the text. The input is text data, and the output indicates whether or not suspicious phrases are present.

[0416] Step 5:

[0417] The server uses a warning generation mechanism to generate a warning message based on emotional state metadata and detected phrase information. The generated warning is adjusted according to the emotional state. For example, if the user is stressed, the message will be encouraging, and if they are angry, it will be calm. The input is the emotional state and phrase information, and the output is the adjusted warning message.

[0418] Step 6:

[0419] The server sends the generated warning message to the terminal, and the terminal notifies the user. The terminal uses notification methods to convey the warning to the user as a pop-up display or audio notification. The input is the warning message, and the output is the notification to the user.

[0420] Step 7:

[0421] Users review warnings and provide feedback on their devices. This feedback is sent digitally to a server, recorded and analyzed by system improvement tools, and used for future speech detection and warning generation. The input is user feedback, and the output is information for system learning and improvement.
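The text side of Step 2 (frequency of positive/negative expressions) and the payload of Step 3 can be sketched as follows. The word lists, the field names and the JSON layout are hypothetical, and the voice-side analysis (tone, speed, emphasis) is assumed to be supplied separately as a label.

```python
import json
import re

# Hypothetical lexicons; a real system would rely on a much larger resource.
POSITIVE = {"great", "thanks", "good", "glad"}
NEGATIVE = {"grumpy", "bad", "annoying", "terrible"}


def text_polarity(text: str) -> dict:
    # Step 2 (text side): count positive and negative words and their share of all tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {
        "positive": pos,
        "negative": neg,
        "positive_ratio": pos / total,
        "negative_ratio": neg / total,
    }


def build_payload(text: str, voice_emotion: str) -> str:
    # Step 3: bundle the text with emotional-state metadata for the server.
    metadata = {"voice_emotion": voice_emotion, "polarity": text_polarity(text)}
    return json.dumps({"text": text, "emotion": metadata}, sort_keys=True)
```

The server in Step 4 would parse this payload, pass the `text` field to the natural language processing engine, and keep the `emotion` field for warning adjustment in Step 5.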

[0422] (Application Example 2)

[0423] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0424] In workplace communication, it is difficult to detect sexual harassment and inappropriate remarks in real time, and any response must also take the speaker's emotional state into account. Furthermore, it is important to improve the accuracy of the system by taking user feedback into account.

[0425] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0426] In this invention, the server includes a data acquisition function, a processing function for processing and analyzing acquired voice or text information, and an emotion analysis function for analyzing emotions and adjusting warnings according to the emotional state. This makes it possible to detect inappropriate remarks in workplace communication in real time and respond appropriately according to the emotional state of the speaker.

[0427] The "data acquisition function" is a function that captures audio or text information and converts it into a format necessary for analysis within the system.

[0428] The "processing function" is a function that analyzes acquired audio or text information and uses that analysis to determine inappropriate expressions, etc.

[0429] The "comparison function" is a feature that compares the analyzed information with a database of examples of inappropriate expressions to check for any matches.

[0430] The "warning generation function" is a feature that creates appropriate warnings for users based on detected problematic expressions.

[0431] The "notification function" is a function that sends generated warnings to the user's device.

[0432] The "emotion analysis function" is a feature that analyzes the user's emotional state and selects and adjusts warnings according to the situation.

[0433] The "voice analysis function" is a function that acquires voice information and converts it into text.

[0434] The "learning function" is a feature that receives feedback from users and updates and improves the system's algorithms and database based on that feedback.

[0435] This system primarily consists of user terminals such as smartphones and smart glasses, and servers connected via communication.

[0436] The user's device is equipped with a data acquisition function that captures the user's voice in real time during conversations. The voice is converted into text through a voice analysis function. This converted text information is immediately analyzed by the device's processing functions.

[0437] The server includes a comparison function that matches processed text information against a database of cases related to inappropriate expressions. This database contains known inappropriate expressions and phrases related to sexual harassment.

[0438] Furthermore, the server-side has an emotion analysis function to analyze the user's emotional state. This function estimates the emotional state based on the user's voice tone and linguistic features contained in the text. By classifying the emotional state into categories such as tension, anger, and joy, the emotions behind the utterances are understood.

[0439] Based on this, the warning generation function is activated to create the most appropriate warning. The generated warning is sent to the user's device via the notification function. The content of the warning is adjusted according to the user's emotional state, and if inappropriate language is detected, the user is prompted to reconsider their statement.

[0440] Furthermore, it incorporates a learning function, allowing for improvements to the system's database and algorithms by incorporating user feedback. This process continuously improves the overall accuracy of the system.

[0441] For example, if someone says "You seem a little grumpy today" during a meeting, the system will detect this and, especially if the emotion analysis function identifies a tense situation, it will generate a warning prompting the use of more appropriate language.

[0442] An example of a prompt is: "Explain how to analyze user sentiment from conversations, detect inappropriate phrases, and generate warnings."

[0443] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0444] Step 1:

[0445] The user's device captures the audio. The audio is acquired via the data acquisition function and converted into text data by the voice analysis function. This converted text becomes the input for the next process.

[0446] Step 2:

[0447] The terminal's processing function analyzes the converted text data. During this process, it extracts linguistic features from the text data and generates information to estimate the user's emotional state. The output consists of the analyzed text data and metadata for emotion analysis.

[0448] Step 3:

[0449] The server activates a comparison function that compares the analyzed text data with the case database. It matches the text data against known inappropriate expressions in the database and detects any matches. This result becomes the input for the next process.

[0450] Step 4:

[0451] The server's emotion analysis function estimates emotions from the user's voice tone and linguistic features in the text. The input is the metadata and voice features from Step 2, and the output is data indicating the estimated emotional state.

[0452] Step 5:

[0453] The server's warning generation function generates the most appropriate warning for the user based on the detection results of inappropriate language and the estimated emotional state. The input is the detection results of inappropriate language and the emotional analysis results, and the output is a tailored warning message.

[0454] Step 6:

[0455] The generated warning message is sent to the user's device via the notification function. The user receives the warning message and has an opportunity to reconsider their actions. The output is the warning message displayed to the user.

[0456] Step 7:

[0457] Users provide feedback, which is recorded by the learning function. The server uses this feedback to update its database and algorithms, improving the system's accuracy. The input is user feedback, and the output is the updated system settings and database.
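The comparison in Step 3 has to cope with wording that differs slightly from stored cases, such as "You seem a little grumpy today" versus a stored "You seem grumpy today". This text does not name a similarity measure; the sketch below uses token-set Jaccard similarity with a hypothetical threshold of 0.6 purely as an illustration.

```python
import re


def tokenize(text: str) -> set:
    # Lowercase word tokens, ignoring punctuation.
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: set, b: set) -> float:
    # Size of the intersection divided by size of the union; 0.0 for two empty sets.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(statement: str, case_db, threshold: float = 0.6):
    # Return (case, score) for the most similar stored case, or None below the threshold.
    tokens = tokenize(statement)
    scored = [(case, jaccard(tokens, tokenize(case))) for case in case_db]
    if not scored:
        return None
    case, score = max(scored, key=lambda pair: pair[1])
    return (case, score) if score >= threshold else None
```

With the stored case "You seem grumpy today", the statement "You seem a little grumpy today" shares four of six distinct tokens (a score of about 0.67) and is matched, while an unrelated sentence is not. The matched case and the estimated emotional state from Step 4 would then feed the warning generation in Step 5.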

[0458] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0459] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0460] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0461] [Third Embodiment]

[0462] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0463] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0464] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0465] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0466] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0467] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0468] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
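How the exchange is secured is not detailed here. As one illustration, assuming HTTPS with a JSON body (the endpoint URL, the field name `text` and the function name are hypothetical), the terminal side could build its request like this using only the Python standard library:

```python
import json
import urllib.request
from urllib.parse import urlparse


def build_secure_request(url: str, text: str) -> urllib.request.Request:
    # Refuse any endpoint that is not HTTPS so the payload is never sent in clear text.
    if urlparse(url).scheme != "https":
        raise ValueError("endpoint must use HTTPS")
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The function only constructs the request; passing the result to `urllib.request.urlopen` would actually send it, with certificate verification handled by the default SSL context.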

[0469] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0470] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0471] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0472] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0473] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0474] This invention is a system aimed at preventing sexual harassment, and it primarily functions via user terminals, servers, and the communication network connecting them. This system is realized by combining various technologies such as speech recognition, natural language processing, database matching, warning generation, and user notifications.

[0475] Terminal processing

[0476] The device has the ability to capture the user's voice or text input in real time. On the device, speech recognition technology is used to convert the voice data into text, preparing it for processing as text data. This text data is sent to the server via a secure protocol.
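As a minimal sketch of the transmission step described above, the following shows how a terminal might package recognized text for delivery over a secure channel. It assumes HTTPS as the secure protocol and JSON as the payload format; the endpoint URL, the field names, and the `build_request` helper are hypothetical and do not come from this disclosure.

```python
import json
import urllib.request

# Hypothetical endpoint; the text above only says "a secure protocol".
SERVER_URL = "https://server.example/analyze"


def build_request(text: str, user_id: str) -> urllib.request.Request:
    """Package recognized text as a JSON POST request addressed to an HTTPS URL."""
    if not SERVER_URL.startswith("https://"):
        raise ValueError("text data must be sent over a secure channel")
    body = json.dumps({"user_id": user_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        SERVER_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request("You look cute today.", user_id="user-20")
# urllib.request.urlopen(request) would transmit it; omitted in this sketch.
```

Building the request separately from sending it keeps the payload easy to inspect before anything leaves the terminal.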

[0477] Server Processing

[0478] The server analyzes the received text data using a natural language processing engine. The analyzed data is then compared against a database containing cases related to sexual harassment, and problematic phrases are identified. Based on the comparison results, the server generates a warning as needed. This warning includes the identified problematic expression and an explanation based on its context.

[0479] User notifications

[0480] The generated alerts are sent to the device, which notifies the user. The device presents each alert as a pop-up message or other notification, giving the user an opportunity to reflect on their statements and messages. The user can then review the alert and correct their behavior as needed.

[0481] Learning and Improvement

[0482] User feedback is sent to the server, and the system's algorithms are updated based on this feedback. The learning process on the server will improve the accuracy of future sexual harassment detection and enable more effective warnings.

[0483] Specific example

[0484] Suppose a user is having a video conference at work and makes a comment like, "You look cute today." The device captures this audio and sends it as text data to the server. The server compares this phrase to a database and determines that it may be similar to past cases of sexual harassment. The server then generates a warning and notifies the device. The user will see a warning on the screen that says, "This expression may be considered sexual harassment. Please be careful." In this way, the system identifies potential problems in real time and contributes to improving the workplace environment.

[0485] The following describes the processing flow.

[0486] Step 1:

[0487] The device captures audio data through the user's microphone input and also has the capability to collect text data from keyboard input. This prepares the device for real-time monitoring of user communication.

[0488] Step 2:

[0489] The device converts the captured audio data into text data using speech recognition technology. This converted text data is temporarily stored in a buffer for subsequent processing.

[0490] Step 3:

[0491] The terminal sends the converted text data to the server via a secure communication protocol. This data is treated as information necessary for detecting sexual harassment.

[0492] Step 4:

[0493] The server analyzes the received text data using a natural language processing engine. This analysis includes grammatical and semantic analysis, extracting important phrases and words.

[0494] Step 5:

[0495] The server compares the analyzed data with a database of sexual harassment-related cases. Here, it calculates the similarity to known cases in the database to determine if there is a potential problem.

[0496] Step 6:

[0497] The server generates an alert if it determines there is a problem. The alert includes the detected phrase, its context, and relevant advice.

[0498] Step 7:

[0499] The server sends the generated warning to the terminal. The terminal receives this warning and displays it in a user-friendly format.

[0500] Step 8:

[0501] Users view warnings through the screen on their device. Based on the warnings, they have the opportunity to review their communication and change their approach if necessary.

[0502] Step 9:

[0503] Users can send feedback on warnings to the server via their device as needed. This feedback is used as part of the system's continuous improvement and learning process.

[0504] Step 10:

[0505] The server analyzes the feedback received and updates the system's algorithms. This improves the accuracy of future sexual harassment detection.
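Steps 4 to 6 of this flow (analysis, similarity calculation against known cases, and alert generation) can be sketched as follows. This is an illustration under stated assumptions, not the method of this disclosure: the similarity measure is a simple token-set Jaccard score, and the case list, the threshold value, and the names `check` and `Alert` are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical case database; a real one would hold curated past cases.
CASE_DATABASE = [
    "you look cute today",
    "you would be prettier if you smiled",
]
SIMILARITY_THRESHOLD = 0.6  # assumed cut-off, not taken from the source


@dataclass
class Alert:
    phrase: str        # the detected phrase
    matched_case: str  # the most similar known case
    similarity: float
    advice: str


def tokenize(text: str) -> set:
    """Lower-case the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Token-set similarity in the range 0..1."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def check(text: str) -> Optional[Alert]:
    """Return an Alert if the text is similar enough to a known case."""
    tokens = tokenize(text)
    best_case = max(CASE_DATABASE, key=lambda c: jaccard(tokens, tokenize(c)))
    score = jaccard(tokens, tokenize(best_case))
    if score < SIMILARITY_THRESHOLD:
        return None
    advice = "This expression may be considered sexual harassment. Please be careful."
    return Alert(text, best_case, score, advice)


alert = check("You look cute today.")
no_alert = check("Please send the report by Friday.")
```

A token-overlap score is only a stand-in here; the grammatical and semantic analysis mentioned in Step 4 would normally feed a richer comparison.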

[0506] (Example 1)

[0507] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0508] In today's workplace, remarks and communications that could lead to sexual harassment are sometimes overlooked, and there is a need for effective means to prevent this. In particular, it is important to improve the workplace environment and raise employer awareness by pointing out problematic expressions in real time and issuing immediate warnings to users.

[0509] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0510] In this invention, the server includes means for inputting voice or text information, means for processing and analyzing the input data, means for comparing it with a record storage device related to sexual harassment, means for detecting identified problematic expressions and generating notifications, and means for receiving user feedback and improving the system. This makes it possible to detect potentially problematic remarks in real time and immediately notify users, thereby deterring inappropriate communication in the workplace.

[0511] "Means for inputting voice or text information" refers to a device that has the function of acquiring voice or text from a user and supplying it to the system for processing.

[0512] "Means for processing and analyzing input data" refers to technologies that analyze acquired audio or text data and perform processing to understand its content.

[0513] A "record-keeping device related to sexual harassment" refers to data storage that accumulates past cases and standard data for use in comparison.

[0514] "Means for detecting identified problematic expressions and generating notifications" refers to a device or program that has the function of identifying inappropriate expressions in analyzed data and creating and sending warnings to the user.

[0515] "Means of receiving feedback from users to improve the system" refers to techniques that collect user feedback and incorporate it into learning algorithms to improve the system's recognition accuracy and functionality.

[0516] This system provides advanced monitoring and warning functions aimed at preventing sexual harassment. It primarily consists of user terminals, communication networks, and servers.

[0517] Terminal

[0518] The device captures the user's spoken or typed input in real time. For example, it collects audio data using a microphone built into a laptop or smartphone. This audio data is then converted into text using software such as a "speech recognition API" installed on the device.

[0519] Server

[0520] The server receives text data sent from the terminal and analyzes it using natural language processing techniques. This process utilizes a "natural language processing library", and the analyzed text is compared with existing examples of sexual harassment in the database. Based on the analysis results, the server generates a warning. For example, it analyzes various problematic expressions and creates an appropriate warning message.

[0521] User

[0522] The user receives the warning sent from the server on their device. The device typically displays the warning as a pop-up message on the screen, but it may be presented in other ways depending on the situation. The user reviews the warning and has an opportunity to reconsider their expression.

[0523] As a concrete example, if a user says "Your outfit looks great today" during a video conference, the audio is converted to text and sent to the server. The server analyzes this statement and, if it determines there is a risk of sexual harassment, generates a warning and notifies the user on their device that "This expression may be perceived as inappropriate by the recipient." This system allows users to immediately review their own statements and contribute to improving the workplace communication environment.

[0524] By utilizing generative AI models, warnings can be made more precise and refined, and notifications can be delivered in a way that is relevant to the user's actual statements and context. An example of a prompt might be, "Please tell me about expressions that may be considered sexual harassment in workplace conversations."

[0525] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0526] Step 1:

[0527] The device acquires either user voice or text input. In the case of voice input, it captures the audio in real time using the built-in microphone. This voice data becomes the input. The voice data is then converted to text using a "speech recognition API" or similar. This converted text data becomes the output.

[0528] Step 2:

[0529] The terminal sends the text data output in Step 1 to the server via a secure protocol. This transmitted text data becomes the new input. Using HTTPS or similar protocols can prevent data tampering. The server then prepares this text data for the next step.

[0530] Step 3:

[0531] The server receives text data and performs analysis using natural language processing (NLP) techniques. This process takes text data as input, understands the meaning within the text, and performs data processing to break down the elements. A "natural language processing library" is used. The analyzed results are output as structured data.

[0532] Step 4:

[0533] The server compares the data output in step 3 with a database containing sexual harassment cases. It uses the analyzed data as input to perform data calculations that compare it with past cases. A flag is output indicating that similar expressions were found as a result of the comparison.

[0534] Step 5:

[0535] The server generates a warning if the matching results indicate a problem. Using the matching flag as input, it utilizes a "generative AI model" to create an appropriate warning message. The generated warning message becomes the output.

[0536] Step 6:

[0537] The server sends the generated warning message to the terminal. The warning message becomes input and is delivered to the terminal via network communication.

[0538] Step 7:

[0539] The device receives warning messages sent from the server and notifies the user. The warning message becomes input and is displayed as a pop-up message or push notification so that the user can easily check it. This gives the user an opportunity to reflect on and correct their statements.
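The hand-off from the matching flag to the generative AI model in Step 5 can be sketched as below. The prompt wording, the function names, and the `fake_model` stand-in are assumptions made for illustration; the disclosure does not specify a prompt template or a model interface for this step.

```python
from typing import Callable, Optional


def build_warning_prompt(utterance: str, flagged: bool) -> Optional[str]:
    """Return a prompt for the generative model, or None if nothing was flagged."""
    if not flagged:
        return None
    return (
        "The following workplace remark resembles past sexual harassment cases.\n"
        f'Remark: "{utterance}"\n'
        "Write a short, polite warning explaining why the remark may be "
        "perceived as inappropriate by the recipient."
    )


def generate_warning(
    utterance: str, flagged: bool, model: Callable[[str], str]
) -> Optional[str]:
    """Call the model only when the matching step raised its flag."""
    prompt = build_warning_prompt(utterance, flagged)
    return model(prompt) if prompt is not None else None


def fake_model(prompt: str) -> str:
    """Stand-in for a real generative model call (assumption for illustration)."""
    return "This expression may be perceived as inappropriate by the recipient."


warning = generate_warning("Your outfit looks great today", True, fake_model)
```

Passing the model in as a callable keeps the sketch independent of any particular generative AI service.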

[0540] (Application Example 1)

[0541] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0542] Preventing sexual harassment is a critical issue in the workplace and public spaces. Traditional methods often rely on post-incident responses, making real-time prevention difficult. Therefore, there is a need for a method that effectively suppresses potential sexual harassment by analyzing audio signals in real time and generating immediate warnings.

[0543] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0544] In this invention, the server includes data collection means, analysis means for analyzing and evaluating collected audio signals or text information, and transmission means for transmitting generated alerts to the user device. This allows for real-time analysis of words and expressions spoken by the user, and enables immediate warnings if there are potential problems.

[0545] A "data collection means" is a device that has the function of acquiring audio signals and text information from the user's voice or text input.

[0546] "Analysis means" refers to a device that has the function of analyzing collected audio signals or text information and evaluating its content.

[0547] A "comparison means" is a tool that determines whether or not a problem exists by comparing the analyzed information with a collection of cases related to sexual harassment.

[0548] An "alert generation means" is a device that generates a warning when it detects a phrase that may contain a potential problem and immediately communicates it to the user.

[0549] A "communication means" is a device that has the function of transmitting generated warnings to the user's device and notifying the user in real time.

[0550] A "portable information device" refers to a compact electronic device that users carry with them on a daily basis and that has the function of collecting and processing voice and text data.

[0551] "Speech recognition technology" is a technology that converts speech signals into text information, enabling subsequent processing.

[0552] "Learning technology" refers to technology that improves and optimizes the system based on user responses, enabling the generation of more accurate warnings.

[0553] The system realizing this invention aims to prevent sexual harassment by utilizing portable information devices and servers to acquire and analyze voice signals and text information in real time. Its main components include data collection means, analysis means, comparison means, alert generation means, communication means, and learning technology.

[0554] The server receives audio signals or text information from the user's portable information device and first analyzes the collected data using speech recognition technology. The analyzed information is then compared with a collection of sexual harassment-related case studies. The comparison means identifies problematic phrases, and the alert generation means immediately generates a warning. The generated warning is sent to the user's device via the communication means, notifying the user in real time. This gives the user an immediate opportunity to address potential problems with their statements.

[0555] As a concrete example, consider a scenario where a user is in an office meeting and unconsciously says, "You look great today." This system captures the statement, analyzes it immediately, and evaluates its potential for problematic behavior by comparing it to past examples. If a problem is identified, a warning message appears to the user stating, "This expression may be misinterpreted." In this way, the system identifies potential problems in real time and supports appropriate social behavior.

[0556] An example of a prompt for a generative AI model would be: "Use natural language processing to analyze the following statement and determine if it may constitute sexual harassment: 'You look lovely today.'"

[0557] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0558] Step 1:

[0559] The device uses its built-in microphone to capture the user's voice signal in real time. It acquires the voice signal as input and temporarily stores it as preparation for processing.

[0560] Step 2:

[0561] The device uses speech recognition technology to convert the captured audio signal into text. In this step, the audio signal is used as input and text information is generated as output. A speech recognition system (e.g., Google's API) is used to analyze the audio data and convert spoken language into text.

[0562] Step 3:

[0563] The text data is sent to the server using a secure protocol. This is the process of preparing the server for analysis by passing the text data as input.

[0564] Step 4:

[0565] The server processes the received text data through a natural language processing engine to perform language analysis. It processes the text data as input and outputs the analysis results. The analysis includes examining grammar, context, and word meanings.

[0566] Step 5:

[0567] The server uses the analysis results to compare text data with a collection of sexual harassment case studies. The analysis results are fed into a matching mechanism as input, and a list of problematic phrases is obtained as output. This process uses database queries to identify expressions that match past cases.

[0568] Step 6:

[0569] Based on the matching results, the server immediately generates an alert if the problematic phrase is found. The generated alert uses the matching results as input and produces an alert message as output. The content of the alert is determined using an alert generation algorithm.

[0570] Step 7:

[0571] The server sends the generated warning to the terminal. This process uses the warning message as input to notify the user in real time. As a result, the warning is displayed on the terminal, prompting the user to pay attention.

[0572] Step 8:

[0573] Users review the warnings they receive and modify their statements and actions as needed. This allows users to consciously take socially appropriate actions based on the warnings.
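Step 5 of this flow mentions database queries that identify expressions matching past cases. A minimal sketch of such a query is shown below, assuming an in-memory SQLite database and plain substring matching; the table name, the stored phrases, and the helper functions are hypothetical.

```python
import sqlite3


def build_case_db() -> sqlite3.Connection:
    """Create an in-memory case collection with two illustrative phrases."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cases (phrase TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO cases (phrase) VALUES (?)",
        [("you look great",), ("you look lovely",)],
    )
    return conn


def find_problem_phrases(conn: sqlite3.Connection, text: str) -> list:
    """Return stored phrases that occur in the text (case-insensitive for ASCII)."""
    rows = conn.execute(
        "SELECT phrase FROM cases "
        "WHERE instr(lower(?), phrase) > 0 ORDER BY phrase",
        (text,),
    )
    return [row[0] for row in rows]


conn = build_case_db()
matches = find_problem_phrases(conn, "You look great today.")
```

The returned list corresponds to the "list of problematic phrases" that Step 5 passes on to alert generation in Step 6.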

[0574] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0575] This invention incorporates an emotion engine that analyzes the user's emotions into a system that monitors a user's voice or text data in real time, detects phrases that may constitute sexual harassment, and issues warnings. This system can analyze the user's emotional state in real time from their communication and adjust the warning content according to that emotion.

[0576] Terminal processing

[0577] The terminal captures the user's voice and text data through data input means. The voice data is converted to text using speech recognition means, and the series of text data is analyzed by an emotion engine before being sent to the server for processing.

[0578] Analysis of the Emotion Engine

[0579] The emotion engine analyzes the user's voice tone and linguistic features in their text to estimate their current emotional state. For example, it categorizes the user into an emotion category such as tension, anger, or joy, and this information is sent to the server.

[0580] Server Processing

[0581] The server analyzes the received text data using a natural language processing engine and compares phrases related to sexual harassment with a case database using a matching mechanism. Taking into account emotional information obtained from an emotion engine, the warning generation mechanism generates the most appropriate warning based on the results.

[0582] Warning adjustment

[0583] The generated warnings are sent to the user's terminal via a notification system. The warnings are adjusted according to the user's emotional state; for example, if anger is detected, the warning will be in a calmer tone, and if tension is recognized, it will be in an encouraging tone.
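The tone adjustment described above can be sketched as a simple lookup from the estimated emotion to a wording variant. The emotion labels follow the text (anger, tension), but the specific sentences and the `adjust_warning` helper are illustrative assumptions.

```python
BASE_WARNING = "Such comments can be misinterpreted."

# Tone-specific wording is illustrative; the text above only names the tones.
TONE_SUFFIX = {
    # Calmer tone when anger is detected.
    "anger": "Let's take a moment and try to be a little more considerate.",
    # Encouraging tone when tension is recognized.
    "tension": "You're doing fine; a small change in wording will help.",
}
DEFAULT_SUFFIX = "Please be careful."


def adjust_warning(emotion: str) -> str:
    """Append a tone-appropriate sentence to the base warning."""
    suffix = TONE_SUFFIX.get(emotion, DEFAULT_SUFFIX)
    return f"{BASE_WARNING} {suffix}"


calm_warning = adjust_warning("anger")
default_warning = adjust_warning("joy")
```

Emotions without a dedicated entry fall back to a neutral wording, so an unexpected label from the emotion engine does not block the warning.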

[0584] User response and system learning

[0585] Users view warnings on their devices and provide feedback. This feedback is recorded on the server side and used to update the emotion engine and warning generation algorithms, thereby improving the system's accuracy.

[0586] Specific example

[0587] Suppose a user makes the comment, "You seem grumpy today," during a video conference. The device captures this audio and converts it into text. The emotion engine detects that the user is somewhat upset and sends the data to the server. The server determines, through a database comparison, that this comment constitutes potential sexual harassment and generates a warning. This warning is delivered to the user in an emotionally sensitive manner, such as, "Such comments can be misinterpreted. Let's try to be a little more considerate." Upon receiving this warning, the user can reflect on their words and actions carefully and strive to improve their communication.

[0588] The following describes the processing flow.

[0589] Step 1:

[0590] The device captures audio data from the user's microphone. Once voice input is complete, it uses speech recognition technology to convert this data into text format. This converted text data is stored in a buffer for later analysis.

[0591] Step 2:

[0592] The device estimates the user's emotions by analyzing the sound quality and characteristics of their voice. An emotion engine analyzes tone, pitch, speed, etc., and prepares the results along with text data.

[0593] Step 3:

[0594] The device sends the converted and analyzed data to the server. This data includes the user's text communications and their emotional state at the time.

[0595] Step 4:

[0596] The server analyzes the received text data using a natural language processing engine. Here, it determines whether the data contains phrases related to sexual harassment based on specific phrases or keywords.

[0597] Step 5:

[0598] The server adjusts the analysis results based on the user's emotional state information. Information from the emotion engine is included as a factor that influences the tone and specific content of the warning generation.

[0599] Step 6:

[0600] The server uses a matching mechanism to compare text data with a case database. If a pattern similar to sexual harassment is detected, a warning generation mechanism is activated.

[0601] Step 7:

[0602] The server generates customized alerts based on detected problems and emotional states. For example, if anger is detected, it will issue a warning in a calm tone.

[0603] Step 8:

[0604] The terminal notifies the user of warnings sent from the server. It presents each notification to the user through a pop-up message or other interface.

[0605] Step 9:

[0606] Users review the warnings presented and provide feedback. By reviewing their own words and actions as needed, users can improve their communication.

[0607] Step 10:

[0608] User feedback is sent from the device to the server and analyzed using a learning algorithm. This feedback is used to improve and adjust the system.
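One simple way to picture the feedback loop in Steps 9 and 10 is a detection threshold that moves in response to user feedback. The sketch below is an assumption for illustration only: the disclosure does not specify the learning algorithm, and the feedback labels, step size, and bounds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DetectionThreshold:
    value: float = 0.60  # assumed starting cut-off on a 0..1 similarity score
    step: float = 0.05   # assumed adjustment per feedback item
    lower: float = 0.30  # assumed bounds that keep the detector usable
    upper: float = 0.90

    def apply_feedback(self, label: str) -> float:
        """Raise the cut-off after a false alarm, lower it after a missed case."""
        if label == "false_alarm":
            self.value = min(self.upper, self.value + self.step)
        elif label == "missed":
            self.value = max(self.lower, self.value - self.step)
        elif label != "correct":
            raise ValueError(f"unknown feedback label: {label}")
        # Round to avoid accumulating floating-point drift.
        self.value = round(self.value, 2)
        return self.value


threshold = DetectionThreshold()
for label in ["false_alarm", "false_alarm", "correct", "missed"]:
    threshold.apply_feedback(label)
```

A production system would more likely retrain the emotion engine and warning generation models on the collected feedback; a single scalar threshold is used here only to make the loop concrete.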

[0609] (Example 2)

[0610] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0611] Conventional systems have difficulty analyzing voice or text data in a timely manner, making it particularly challenging to detect and respond appropriately to sexual harassment-related remarks in real time. Furthermore, they lack the functionality to adjust warnings based on the user's emotional state, which hinders their ability to effectively facilitate communication.

[0612] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0613] In this invention, the server includes data acquisition means, analysis means for processing acquired voice or text information to analyze emotions, and comparison means for comparing with an information database related to sexual harassment. This enables the generation of warnings that accurately reflect the user's emotions in real time, and allows for a rapid and appropriate response to sexual harassment.

[0614] "Data acquisition means" refers to a device or software function for collecting voice or text information from a user.

[0615] "Analysis means" refers to a device or software function for processing acquired audio or text information and analyzing the user's emotional state.

[0616] "Comparison means" refers to a device or software function that compares analyzed information with a database containing information related to sexual harassment.

[0617] "Warning generation means" refers to a device or software function that generates a warning to notify the user based on the results of the comparison means.

[0618] "Transmission means" refers to a device or function that transmits the generated warning to the user's information terminal.

[0619] "Improvement means" refers to a device or software function that receives feedback and opinions from users to improve the system's performance and responsiveness.

[0620] This invention is a system that analyzes voice and text data in real time on an information terminal used by a user and detects phrases that may constitute sexual harassment by comparing them with a specific database. This system can improve communication by generating warnings that take into account the user's emotional state and adjusting the content of the warnings according to the user's emotions when notifying them.

[0621] The terminal captures user voice and text information using data acquisition means equipped with a microphone and text input device. The voice is converted into text data using speech recognition software (e.g., a speech recognition engine). This is the process of transcribing the user's utterances into text. The converted text data is then analyzed to evaluate the user's emotional state. Specifically, it is categorized into emotional categories such as tension or anger based on voice tone and language choices.

[0622] After receiving this analysis data, the server uses a natural language processing engine (e.g., a natural language processing AI model) to analyze the text data and compare it against a database to determine if it contains phrases related to sexual harassment. Here, the case database is a collection of similar phrases from the past and is used to quickly evaluate the user's statements.

[0623] The warning generation mechanism generates an appropriate warning based on the matching results and the user's emotional data. The generated warning is then sent to the user's device via the transmission mechanism. The warning content is adjusted according to the user's emotions; for example, if the user is showing signs of tension, the warning will be changed to something that will alleviate that tension.

[0624] Users can view this warning on their device and consciously adjust their actions. They can also provide feedback on the warning, which is recorded on the server and used to improve the system's performance through various enhancement methods.

[0625] For example, if a user says "You seem grumpy today" during a video chat, the device converts this audio into text, and the emotion engine determines the emotion to be "anger." This data is then analyzed on the server, and if the phrase is determined to have the potential to be sexual harassment, a warning is generated and sent, such as "Such remarks can be misleading. Try to be a little more considerate."

[0626] An example of a prompt for a generative AI model is: "Generate a scenario that monitors phrases used by users in workplace conversations, detects potentially sexually harassing expressions, and issues appropriate warnings. Consider that the warnings should be adjusted according to the user's real-time emotional state."

[0627] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0628] Step 1:

[0629] The device captures user voice and text information in real time through data acquisition means. Voice data is acquired via a microphone, and text information is entered via keyboard or touch input. The acquired voice data is converted to text by a speech recognition engine. Input is raw voice data, and output is text data.

[0630] Step 2:

[0631] The device sends the converted text data and audio analysis to an emotion analysis engine. Here, the tone of the audio and linguistic features in the text are analyzed to determine the user's emotional state. Specifically, the audio data is analyzed for tone, speed, and emphasis, while the frequency of positive / negative expressions in the text data is calculated. The input consists of audio tone and text data, and the output is metadata indicating the emotional state.

[0632] Step 3:

[0633] The device sends the analysis results (emotional state metadata) along with text data to the server.

[0634] Step 4:

[0635] The server feeds the received text data into a natural language processing engine, which compares it with a database to determine if it contains phrases related to sexual harassment. During this process, the natural language processing engine analyzes the context and structure of the text. The input is text data, and the output indicates whether or not suspicious phrases are present.

[0636] Step 5:

[0637] The server uses a warning generation mechanism to generate a warning message based on emotional state metadata and detected phrase information. The generated warning is adjusted according to the emotional state. For example, if the user is stressed, the message will be encouraging, and if they are angry, it will be calm. The input is the emotional state and phrase information, and the output is the adjusted warning message.

[0638] Step 6:

[0639] The server sends the generated warning message to the terminal, and the terminal notifies the user. The terminal uses notification methods to convey the warning to the user as a pop-up display or audio notification. The input is the warning message, and the output is the notification to the user.

[0640] Step 7:

[0641] Users review warnings and provide feedback on their devices. This feedback is sent digitally to a server, recorded and analyzed by system improvement tools, and used for future speech detection and warning generation. The input is user feedback, and the output is information for system learning and improvement.
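Step 2 of this flow combines voice features with the frequency of positive and negative expressions in the text to produce emotional state metadata. The following sketch shows one way this could look, assuming tiny hand-written word lists and two voice features (speech rate and loudness) with arbitrary thresholds; none of these specifics come from the disclosure, and a real system would use the emotion identification model instead.

```python
import re

# Tiny illustrative lexicons; a real system would use a trained model.
POSITIVE = {"great", "thanks", "happy", "glad"}
NEGATIVE = {"grumpy", "annoying", "terrible", "angry"}


def text_polarity(text: str) -> dict:
    """Count positive and negative expressions in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return {"positive": pos, "negative": neg, "words": len(words)}


def emotion_metadata(text: str, speech_rate_wps: float, loudness_db: float) -> dict:
    """Combine text polarity with two assumed voice features into a coarse label."""
    polarity = text_polarity(text)
    if polarity["negative"] > polarity["positive"] and loudness_db > 70:
        label = "anger"
    elif speech_rate_wps > 3.5:
        label = "tension"
    elif polarity["positive"] > polarity["negative"]:
        label = "joy"
    else:
        label = "neutral"
    return {"emotion": label, **polarity}


metadata = emotion_metadata("You seem grumpy today", speech_rate_wps=2.5, loudness_db=75.0)
```

The resulting dictionary plays the role of the metadata that Step 3 sends to the server together with the text data.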

[0642] (Application Example 2)

[0643] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0644] In workplace communication, detecting sexual harassment and inappropriate remarks in real time is difficult, and there is the challenge of needing to respond appropriately while considering the emotional state of the speaker. Furthermore, it is important to improve the accuracy of the system by taking user feedback into account.

[0645] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0646] In this invention, the server includes a data acquisition function, a processing function for processing and analyzing acquired voice or text information, and an emotion analysis function for analyzing emotions and adjusting warnings according to the emotional state. This makes it possible to detect inappropriate remarks in workplace communication in real time and respond appropriately according to the emotional state of the speaker.

[0647] The "data acquisition function" is a function that captures audio or text information and converts it into a format necessary for analysis within the system.

[0648] The "processing function" is a function that analyzes acquired audio or text information and uses that analysis to determine inappropriate expressions, etc.

[0649] The "comparison function" is a feature that compares the analyzed information with a database of examples of inappropriate expressions to check for any matches.

[0650] The "warning generation function" is a feature that creates appropriate warnings for users based on detected problematic expressions.

[0651] The "notification function" is a function that sends generated warnings to the user's device.

[0652] The "emotion analysis function" is a feature that analyzes the user's emotional state and selects and adjusts warnings according to the situation.

[0653] The "voice analysis function" is a function that acquires voice information and converts it into text.

[0654] The "learning function" is a feature that receives feedback from users and updates and improves the system's algorithms and database based on that feedback.

[0655] This system primarily consists of user terminals, such as smartphones and smart glasses, and servers connected to them via a communication network.

[0656] The user's device is equipped with a data acquisition function that captures the user's voice in real time during conversations. The voice is converted into text through a voice analysis function. This converted text information is immediately analyzed by the device's processing functions.

[0657] The server includes a comparison function that matches processed text information against a database of cases related to inappropriate expressions. This database contains known inappropriate expressions and phrases related to sexual harassment.

[0658] Furthermore, the server-side has an emotion analysis function to analyze the user's emotional state. This function estimates the emotional state based on the user's voice tone and linguistic features contained in the text. By classifying the emotional state into categories such as tension, anger, and joy, the emotions behind the utterances are understood.

[0659] Based on this, the warning generation function is activated to create the most appropriate warning. The generated warning is sent to the user's device via the notification function. The content of the warning is adjusted according to the user's emotional state, and if inappropriate language is detected, the user is prompted to reconsider their statement.

[0660] Furthermore, it incorporates a learning function, allowing for improvements to the system's database and algorithms by incorporating user feedback. This process continuously improves the overall accuracy of the system.

[0661] For example, if someone says "You seem a little grumpy today" during a meeting, the system will detect this and, especially if the emotion analysis function identifies a tense situation, it will generate a warning prompting the use of more appropriate language.

[0662] An example of a prompt is: "Explain how to analyze user sentiment from conversations, detect inappropriate phrases, and generate warnings."

[0663] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0664] Step 1:

[0665] The user's device captures the audio. The audio is acquired via the data acquisition function and converted into text data by the voice analysis function. This converted text becomes the input for the next process.

[0666] Step 2:

[0667] The terminal's processing function analyzes the converted text data. During this process, it extracts linguistic features from the text data and generates information for estimating the user's emotional state. The output consists of the analyzed text data and metadata for emotion analysis.

[0668] Step 3:

[0669] The server activates a comparison function that compares the analyzed text data with the case database. It matches the text data against known inappropriate expressions in the database and detects any matches. This result becomes the input for the next process.

[0670] Step 4:

[0671] The server's emotion analysis function estimates emotions from the user's voice tone and the linguistic features in the text. The input is the metadata and voice features from Step 2, and the output is data indicating the estimated emotional state.

[0672] Step 5:

[0673] The server's warning generation function generates the most appropriate warning for the user based on the detection results of inappropriate language and the estimated emotional state. The input is the detection results of inappropriate language and the emotional analysis results, and the output is a tailored warning message.

[0674] Step 6:

[0675] The generated warning message is sent to the user's device via the notification function. The user receives the warning message and has an opportunity to reconsider their actions. The output is the warning message displayed to the user.

[0676] Step 7:

[0677] Users provide feedback, which is recorded by the learning function. The server uses this feedback to update its database and algorithms, improving the system's accuracy. The input is user feedback, and the output is the updated system settings and database.
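
As a non-limiting illustration, the data flow of Steps 3 to 5 above can be sketched as follows. The names `run_pipeline`, `PipelineResult`, `match`, `estimate_emotion`, and `make_warning` are hypothetical and do not appear in the embodiment. The three callables merely stand in for the comparison function, the emotion analysis function, and the warning generation function, whose internals are not specified here.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class PipelineResult:
    matches: List[str]        # inappropriate expressions found in Step 3
    emotion: str              # emotional state estimated in Step 4
    warning: Optional[str]    # tailored warning from Step 5, or None


def run_pipeline(
    text: str,
    voice_features: Any,
    match: Callable[[str], List[str]],
    estimate_emotion: Callable[[str, Any], str],
    make_warning: Callable[[List[str], str], str],
) -> PipelineResult:
    """Chain the server-side steps; each callable is an assumed placeholder."""
    matches = match(text)                             # Step 3: database comparison
    emotion = estimate_emotion(text, voice_features)  # Step 4: emotion estimation
    # Step 5: a warning is generated only when something was detected.
    warning = make_warning(matches, emotion) if matches else None
    return PipelineResult(matches=matches, emotion=emotion, warning=warning)
```

Passing the components in as callables keeps the sketch independent of any particular matching or emotion estimation technique.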

[0678] The specific processing unit 290 transmits the result of the specific processing to the headset-type terminal 314. In the headset-type terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0679] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in a data format such as audio data or text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0680] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the headset-type terminal 314.

[0681] [Fourth Embodiment]

[0682] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0683] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0684] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0685] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0686] The microphone 238 receives instructions from the user 20 by picking up the user's voice. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0687] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0688] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0689] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0690] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0691] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0692] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0693] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0694] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0695] This invention is a system aimed at preventing sexual harassment, and it primarily functions via user terminals, servers, and the communication network connecting them. This system is realized by combining various technologies such as speech recognition, natural language processing, database matching, warning generation, and user notifications.

[0696] Terminal processing

[0697] The device has the ability to capture the user's voice or text input in real time. On the device, speech recognition technology is used to convert the voice data into text, preparing it for processing as text data. This text data is sent to the server via a secure protocol.

[0698] Server Processing

[0699] The server analyzes the received text data using a natural language processing engine. The analyzed data is then compared against a database containing cases related to sexual harassment, and problematic phrases are identified. Based on the comparison results, the server generates a warning as needed. This warning includes the identified problematic expression and an explanation based on its context.

[0700] User notifications

[0701] The generated alerts are sent to the device, which notifies the user. The device presents the alert to the user in the form of a pop-up message or other notification, giving the user an opportunity to reflect on their statements and messages. The user can then review the alert and correct their actions as needed.

[0702] Learning and Improvement

[0703] User feedback is sent to the server, and the system's algorithms are updated based on this feedback. The learning process on the server will improve the accuracy of future sexual harassment detection and enable more effective warnings.

[0704] Specific example

[0705] Suppose a user is having a video conference at work and makes a comment like, "You look cute today." The device captures this audio and sends it as text data to the server. The server compares this phrase to a database and determines that it may be similar to past cases of sexual harassment. The server then generates a warning and notifies the device. The user will see a warning on the screen that says, "This expression may be considered sexual harassment. Please be careful." In this way, the system identifies potential problems in real time and contributes to improving the workplace environment.

[0706] The following describes the processing flow.

[0707] Step 1:

[0708] The device captures audio data through the user's microphone input and also has the capability to collect text data from keyboard input. This prepares the device for real-time monitoring of user communication.

[0709] Step 2:

[0710] The device converts the captured audio data into text data using speech recognition technology. This converted text data is temporarily stored in a buffer for subsequent processing.
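
The temporary buffer mentioned in Step 2 could, for example, be a bounded queue that keeps only the most recent utterances. The following is a minimal sketch under that assumption. The class name `TranscriptBuffer` and the capacity of 100 items are illustrative choices, not values given in the embodiment.

```python
from collections import deque
from typing import List


class TranscriptBuffer:
    """Bounded FIFO buffer for recognized text awaiting transmission."""

    def __init__(self, max_items: int = 100) -> None:
        # A deque with maxlen silently drops the oldest entry when full.
        self._items = deque(maxlen=max_items)

    def push(self, text: str) -> None:
        """Store one recognized utterance, ignoring empty results."""
        text = text.strip()
        if text:
            self._items.append(text)

    def drain(self) -> List[str]:
        """Return all buffered utterances in order and empty the buffer."""
        items = list(self._items)
        self._items.clear()
        return items
```

Bounding the buffer is one way to limit how much conversation text is retained on the terminal at any time.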

[0711] Step 3:

[0712] The terminal sends the converted text data to the server via a secure communication protocol. This data is treated as information necessary for detecting sexual harassment.

[0713] Step 4:

[0714] The server analyzes the received text data using a natural language processing engine. This analysis includes grammatical and semantic analysis, extracting important phrases and words.

[0715] Step 5:

[0716] The server compares the analyzed data with a database of sexual harassment-related cases. Here, it calculates the similarity to known cases in the database to determine if there is a potential problem.
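
The similarity calculation in Step 5 is not specified further. One simple possibility, shown here only as a sketch, is a character-level similarity ratio computed with Python's standard `difflib.SequenceMatcher`. The list `KNOWN_CASES`, the threshold of 0.8, and the function names are assumptions introduced for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical stand-in for the database of known cases.
KNOWN_CASES = [
    "you look cute today",
    "your outfit looks great today",
]

# Assumed cut-off; the embodiment does not state a threshold.
SIMILARITY_THRESHOLD = 0.8


def normalize(text: str) -> str:
    """Lower-case the text and keep only letters, digits, and single spaces."""
    kept = [ch for ch in text.lower() if ch.isalnum() or ch.isspace()]
    return " ".join("".join(kept).split())


def best_match(utterance: str, cases: List[str] = KNOWN_CASES) -> Tuple[str, float]:
    """Return the most similar known case and its ratio in the range 0 to 1."""
    target = normalize(utterance)
    scored = [
        (case, SequenceMatcher(None, target, normalize(case)).ratio())
        for case in cases
    ]
    return max(scored, key=lambda pair: pair[1])


def is_potential_problem(utterance: str, threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """Flag the utterance when it is close enough to any known case."""
    return best_match(utterance)[1] >= threshold
```

A character-level ratio only captures surface similarity. A production system would more likely rely on the semantic analysis of Step 4.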

[0717] Step 6:

[0718] The server generates an alert if it determines there is a problem. The alert includes the detected phrase, its context, and relevant advice.

[0719] Step 7:

[0720] The server sends the generated warning to the terminal. The terminal receives this warning and displays it in a user-friendly format.

[0721] Step 8:

[0722] Users view warnings through the screen on their device. Based on the warnings, they have the opportunity to review their communication and change their approach if necessary.

[0723] Step 9:

[0724] Users can send feedback on warnings to the server via their device as needed. This feedback is used as part of the system's continuous improvement and learning process.

[0725] Step 10:

[0726] The server analyzes the feedback received and updates the system's algorithms. This improves the accuracy of future sexual harassment detection.
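
How the algorithms are updated from feedback is left open. As one hedged illustration, feedback could nudge a detection threshold: reports of false alarms raise it and reports of missed remarks lower it. The class `Detector`, the feedback labels, and all numeric values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Detector:
    threshold: float = 0.8       # assumed starting similarity threshold
    step: float = 0.02           # assumed adjustment per feedback item
    min_threshold: float = 0.5   # assumed lower bound
    max_threshold: float = 0.95  # assumed upper bound

    def apply_feedback(self, feedback: str) -> float:
        """Adjust the threshold from one feedback label and return the new value."""
        if feedback == "false_alarm":
            # The warning was unnecessary, so make detection stricter.
            self.threshold = round(min(self.max_threshold, self.threshold + self.step), 4)
        elif feedback == "missed":
            # A problematic remark went undetected, so make detection more sensitive.
            self.threshold = round(max(self.min_threshold, self.threshold - self.step), 4)
        elif feedback != "correct":
            raise ValueError(f"unknown feedback label: {feedback!r}")
        return self.threshold
```

Clamping the threshold between fixed bounds prevents a burst of one-sided feedback from disabling detection altogether.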

[0727] (Example 1)

[0728] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0729] In today's workplace, remarks and communications that could lead to sexual harassment are sometimes overlooked, and there is a need for effective means to prevent this. In particular, it is important to improve the workplace environment and raise employee awareness by pointing out problematic expressions in real time and issuing immediate warnings to users.

[0730] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0731] In this invention, the server includes means for inputting voice or text information, means for processing and analyzing the input data, means for comparing it with a record storage device related to sexual harassment, means for detecting identified problematic expressions and generating notifications, and means for receiving user feedback and improving the system. This makes it possible to detect potentially problematic remarks in real time and immediately notify users, thereby deterring inappropriate communication in the workplace.

[0732] "Means for inputting voice or text information" refers to a device that has the function of acquiring voice or text from a user and supplying it to the system for processing.

[0733] "Means for processing and analyzing input data" refers to technologies that analyze acquired audio or text data and perform processing to understand its content.

[0734] A "record storage device related to sexual harassment" refers to data storage that accumulates past cases and standard data for use in comparison.

[0735] "Means for detecting identified problematic expressions and generating notifications" refers to a device or program that has the function of identifying inappropriate expressions in analyzed data and creating and sending warnings to the user.

[0736] "Means for receiving user feedback and improving the system" refers to techniques that collect user feedback and incorporate it into learning algorithms to improve the system's recognition accuracy and functionality.

[0737] This system provides advanced monitoring and warning functions aimed at preventing sexual harassment. It primarily consists of user terminals, communication networks, and servers.

[0738] Terminal

[0739] The device captures the user's spoken or typed input in real time. For example, it collects audio data using a microphone built into a laptop or smartphone. This audio data is then converted into text using software such as a "speech recognition API" installed on the device.

[0740] Server

[0741] The server receives the text data sent from the terminal and analyzes it using natural language processing techniques. This process uses a "natural language processing library" and compares the analyzed text with existing examples of sexual harassment in the database. Based on the analysis results, the server generates a warning. For example, it analyzes various problematic expressions and creates an appropriate warning message.

[0742] User

[0743] The user receives the warning sent from the server on their device. The device typically displays the warning as a pop-up message on the screen, but it may be presented in other ways depending on the situation. The user reviews the warning and has an opportunity to reconsider their expression.

[0744] As a concrete example, if a user says "Your outfit looks great today" during a video conference, the audio is converted to text and sent to the server. The server analyzes this statement and, if it determines there is a risk of sexual harassment, generates a warning and notifies the user on their device that "This expression may be perceived as inappropriate by the recipient." This system allows users to immediately review their own statements and contribute to improving the workplace communication environment.

[0745] By utilizing generative AI models, warnings can be made more precise and refined, and notifications can be delivered in a way that is relevant to the user's actual statements and context. An example of a prompt might be, "Please tell me about expressions that may be considered sexual harassment in workplace conversations."

[0746] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0747] Step 1:

[0748] The device acquires either user voice or text input. In the case of voice input, it captures the audio in real time using the built-in microphone. This voice data becomes the input. The voice data is then converted to text using a "speech recognition API" or similar. This converted text data becomes the output.

[0749] Step 2:

[0750] The terminal sends the text data output in Step 1 to the server via a secure protocol. This transmitted text data becomes the new input. Using HTTPS or similar protocols can prevent data tampering. The server then prepares this text data for the next step.
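
The transmission in Step 2 could, for instance, be prepared as an HTTPS POST request with a JSON body. The sketch below uses only Python's standard library and builds the request without sending it. The endpoint `https://server.example/api/analyze` and the JSON field name `text` are placeholders, not details from the embodiment.

```python
import json
import urllib.request

# Hypothetical endpoint; the embodiment names no URL.
SERVER_URL = "https://server.example/api/analyze"


def build_request(text: str, url: str = SERVER_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the recognized text."""
    # Refuse plain HTTP so the text is only ever sent over TLS.
    if not url.lower().startswith("https://"):
        raise ValueError("refusing to send text over a non-HTTPS URL")
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request would then be sent with `urllib.request.urlopen(build_request(text))`. The scheme check simply enforces the secure protocol that Step 2 calls for.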

[0751] Step 3:

[0752] The server receives text data and performs analysis using natural language processing (NLP) techniques. This process takes text data as input, understands the meaning within the text, and performs data processing to break down the elements. A "natural language processing library" is used. The analyzed results are output as structured data.
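
The "structured data" output of Step 3 is not defined in detail. As a rough sketch, it might hold the original text together with its sentences and lower-cased word tokens. The following example uses regular expressions in place of a full natural language processing library. The class `AnalyzedText` and the function `analyze` are illustrative names only.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class AnalyzedText:
    original: str
    sentences: List[str] = field(default_factory=list)
    tokens: List[str] = field(default_factory=list)


def analyze(text: str) -> AnalyzedText:
    """Break the text into sentences and lower-cased word tokens."""
    # Split after sentence-ending punctuation followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    sentences = [p.strip() for p in parts if p.strip()]
    # Keep runs of letters, digits, and apostrophes as tokens.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return AnalyzedText(original=text, sentences=sentences, tokens=tokens)
```

This captures only the element breakdown. Understanding the meaning of the text, as Step 3 requires, would need a real NLP library or model.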

[0753] Step 4:

[0754] The server compares the data output in Step 3 with a database containing sexual harassment cases. It uses the analyzed data as input and performs calculations that compare it with past cases. As a result of the comparison, a flag is output indicating whether similar expressions were found.

[0755] Step 5:

[0756] The server generates a warning if the matching results indicate a problem. Using the matching flag as input, it uses a "generative AI model" to create an appropriate warning message. The generated warning message becomes the output.

[0757] Step 6:

[0758] The server sends the generated warning message to the terminal. The warning message becomes input and is delivered to the terminal via network communication.

[0759] Step 7:

[0760] The device receives warning messages sent from the server and notifies the user. The warning message becomes input and is displayed as a pop-up message or push notification so that the user can easily check it. This gives the user an opportunity to reflect on and correct their statements.

[0761] (Application Example 1)

[0762] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0763] Preventing sexual harassment is a critical issue in the workplace and public spaces. Traditional methods often rely on post-incident responses, making real-time prevention difficult. Therefore, there is a need for a method that effectively suppresses potential sexual harassment by analyzing audio signals in real time and generating immediate warnings.

[0764] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0765] In this invention, the server includes data collection means, analysis means for analyzing and evaluating collected audio signals or text information, and transmission means for transmitting generated alerts to the user device. This allows for real-time analysis of words and expressions spoken by the user, and enables immediate warnings if there are potential problems.

[0766] A "data collection means" is a device that has the function of acquiring audio signals and text information from the user's voice or text input.

[0767] "Analysis means" refers to a device that has the function of analyzing collected audio signals or text information and evaluating its content.

[0768] A "comparison means" is a tool that determines whether or not a problem exists by comparing the analyzed information with a collection of cases related to sexual harassment.

[0769] An "alert generation means" is a device that generates a warning when it detects a phrase that may be problematic and immediately communicates it to the user.

[0770] A "communication means" is a device that has the function of transmitting generated warnings to the user's device and notifying the user in real time.

[0771] A "portable information device" refers to a compact electronic device that users carry with them on a daily basis and that has the function of collecting and processing voice and text data.

[0772] "Speech recognition technology" is a technology that converts speech signals into text information, enabling subsequent processing.

[0773] "Learning technology" refers to technology that improves and optimizes the system based on user responses, enabling the generation of more accurate warnings.

[0774] The system realizing this invention aims to prevent sexual harassment by utilizing portable information devices and servers to acquire and analyze voice signals and text information in real time. Its main components include data collection means, analysis means, comparison means, alert generation means, communication means, and learning technology.

[0775] The server receives audio signals or text information from the user's portable information device and first analyzes the collected data using speech recognition technology. The analyzed information is then compared with a collection of sexual harassment-related case studies. The comparison means identifies problematic phrases, and the alert generation means immediately generates a warning. The generated warning is sent to the user's device via the communication means, notifying the user in real time. This gives the user an immediate opportunity to address potential problems with their statements.

[0776] As a concrete example, consider a scenario where a user is in an office meeting and unconsciously says, "You look great today." This system captures the statement, analyzes it immediately, and evaluates whether it may be problematic by comparing it to past examples. If a problem is identified, a warning message appears to the user stating, "This expression may be misinterpreted." In this way, the system identifies potential problems in real time and supports appropriate social behavior.

[0777] An example of a prompt for a generative AI model would be: "Use natural language processing to analyze the following statement and determine if it may constitute sexual harassment: 'You look lovely today.'"
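
A prompt of this form could be assembled from a template as sketched below. The template text follows the example above. The function names `build_prompt` and `ask_model` are hypothetical, and `model` is an assumed placeholder for any text-in, text-out generative AI interface, not an actual vendor API.

```python
from typing import Callable

PROMPT_TEMPLATE = (
    "Use natural language processing to analyze the following statement and "
    "determine if it may constitute sexual harassment: '{statement}'"
)


def build_prompt(statement: str) -> str:
    """Insert the statement into the template after collapsing whitespace."""
    return PROMPT_TEMPLATE.format(statement=" ".join(statement.split()))


def ask_model(statement: str, model: Callable[[str], str]) -> str:
    """Send the prompt to the supplied model callable and return its reply."""
    return model(build_prompt(statement))
```

For instance, `build_prompt("You look lovely today.")` reproduces the example prompt given above.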

[0778] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0779] Step 1:

[0780] The device uses its built-in microphone to capture the user's voice signal in real time. It acquires the voice signal as input and temporarily stores it as preparation for processing.

[0781] Step 2:

[0782] The device uses speech recognition technology to convert the captured audio signal into text. In this step, the audio signal is used as input and text information is generated as output. A speech recognition system (e.g., Google's API) is used to analyze the audio data and convert colloquial expressions into text.

[0783] Step 3:

[0784] The text data is sent to the server using a secure protocol. The transmitted text data serves as the input for the server's analysis.

[0785] Step 4:

[0786] The server processes the received text data through a natural language processing engine to perform language analysis. It processes the text data as input and outputs the analysis results. The analysis includes examining grammar, context, and word meanings.

[0787] Step 5:

[0788] The server uses the analysis results to compare text data with a collection of sexual harassment case studies. The analysis results are fed into a matching mechanism as input, and a list of problematic phrases is obtained as output. This process uses database queries to identify expressions that match past cases.
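
The database query in Step 5 might look like the following sketch, which uses an in-memory SQLite database and a substring test. The table name `cases`, the column name `phrase`, and the helper functions are assumptions made for illustration. The actual schema of the case collection is not described.

```python
import sqlite3
from typing import Iterable, List


def build_case_db(phrases: Iterable[str]) -> sqlite3.Connection:
    """Create an in-memory table of known problematic phrases (illustrative)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cases (phrase TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO cases (phrase) VALUES (?)",
        [(p.lower(),) for p in phrases],
    )
    conn.commit()
    return conn


def find_problem_phrases(conn: sqlite3.Connection, text: str) -> List[str]:
    """Return stored phrases that occur as substrings of the analyzed text."""
    rows = conn.execute(
        # instr(X, Y) is non-zero when Y occurs inside X.
        "SELECT phrase FROM cases WHERE instr(?, phrase) > 0 ORDER BY phrase",
        (text.lower(),),
    )
    return [row[0] for row in rows]
```

The returned list corresponds to the "list of problematic phrases" that Step 5 produces as output. An empty list means no match was found.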

[0789] Step 6:

[0790] Based on the matching results, the server immediately generates an alert if a problematic phrase is found. This step takes the matching results as input and produces an alert message as output. The content of the alert is determined using an alert generation algorithm.

[0791] Step 7:

[0792] The server sends the generated warning to the terminal. This process uses the warning message as input to notify the user in real time. As a result, the warning is displayed on the terminal, prompting the user to pay attention.

[0793] Step 8:

[0794] Users review the warnings they receive and modify their statements and actions as needed. This allows users to consciously take socially appropriate actions based on the warnings.

[0795] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0796] This invention incorporates an emotion engine that analyzes the user's emotions into a system that monitors a user's voice or text data in real time, detects phrases that may constitute sexual harassment, and issues warnings. This system can analyze the user's emotional state from their communication in real time and adjust the warning content according to that emotion.

[0797] Terminal processing

[0798] The terminal captures the user's voice and text data through data input means. The voice data is converted to text using speech recognition means, and the series of text data is analyzed by an emotion engine before being sent to the server for processing.

[0799] Analysis of the Emotion Engine

[0800] The emotion engine analyzes the user's voice tone and linguistic features in their text to estimate their current emotional state. For example, it categorizes the user into an emotion category such as tension, anger, or joy, and this information is sent to the server.

[0801] Server Processing

[0802] The server analyzes the received text data using a natural language processing engine and uses a matching mechanism to compare it against a case database for phrases related to sexual harassment. Taking into account the emotional information obtained from the emotion engine, the warning generation mechanism generates the most appropriate warning based on the results.

[0803] Warning adjustment

[0804] The generated warnings are sent to the user's terminal via a notification system. The warnings are adjusted according to the user's emotional state; for example, if anger is detected, the warning will be in a calmer tone, and if tension is recognized, it will be in an encouraging tone.
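
The tone adjustment described above can be illustrated with a simple lookup from emotion category to message template. The mapping of anger to a calm tone and tension to an encouraging tone follows the text. The template wording, the "neutral" fallback, and the names used below are assumptions added for this sketch.

```python
# Base message taken from the specific example; the templates are illustrative.
BASE_WARNING = "This expression may be considered sexual harassment."

# Emotion category to tone, as described in the text.
TONE_BY_EMOTION = {
    "anger": "calm",
    "tension": "encouraging",
}

# Hypothetical wording for each tone.
TEMPLATES = {
    "calm": "Let's take a moment. {base}",
    "encouraging": "You are doing fine. {base} A small rewording will help.",
    "neutral": "{base} Please be careful.",
}


def adjust_warning(emotion: str, base: str = BASE_WARNING) -> str:
    """Return the warning worded in the tone that suits the given emotion."""
    tone = TONE_BY_EMOTION.get(emotion, "neutral")  # unknown emotions fall back
    return TEMPLATES[tone].format(base=base)
```

A generative AI model could produce the wording instead of fixed templates. The lookup only shows how the emotion category selects the tone.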

[0805] User response and system learning

[0806] Users view warnings on their devices and provide feedback. This feedback is recorded on the server side and used to update the emotion engine and warning generation algorithms, thereby improving the system's accuracy.

[0807] Specific example

[0808] Suppose a user makes the comment, "You seem grumpy today," during a video conference. The device captures this audio and converts it into text. The emotion engine detects that the user is somewhat upset and sends the data to the server. The server determines, through a database comparison, that this comment constitutes potential sexual harassment and generates a warning. This warning is delivered to the user in an emotionally sensitive manner, such as, "Such comments can be misinterpreted. Let's try to be a little more considerate." Upon receiving this warning, the user can reflect on their words and actions carefully and strive to improve their communication.

[0809] The following describes the processing flow.

[0810] Step 1:

[0811] The device captures audio data from the user's microphone. Once voice input is complete, it uses speech recognition technology to convert this data into text format. This converted text data is stored in a buffer for later analysis.

[0812] Step 2:

[0813] The device estimates the user's emotions by analyzing the sound quality and characteristics of their voice. An emotion engine analyzes tone, pitch, speed, etc., and prepares the results along with text data.

[0814] Step 3:

[0815] The device sends the converted and analyzed data to the server. This data includes the user's text communications and their emotional state at the time.

[0816] Step 4:

[0817] The server analyzes the received text data using a natural language processing engine. Here, it determines whether the data contains phrases related to sexual harassment based on specific phrases or keywords.

[0818] Step 5:

[0819] The server adjusts the analysis results based on the user's emotional state information. Information from the emotion engine is included as a factor that influences the tone and specific content of the warning generation.

[0820] Step 6:

[0821] The server uses a matching mechanism to compare text data with a case database. If a pattern similar to sexual harassment is detected, a warning generation mechanism is activated.

[0822] Step 7:

[0823] The server generates customized alerts based on detected problems and emotional states. For example, if anger is detected, it will issue a warning in a calm tone.

[0824] Step 8:

[0825] The terminal notifies the user of the warning sent from the server. The terminal presents this notification to the user through a pop-up message or other interface.

[0826] Step 9:

[0827] Users review the warnings presented and provide feedback. Users reflect on their own words and actions as needed, which improves communication.

[0828] Step 10:

[0829] User feedback is sent from the device to the server and analyzed using a learning algorithm. This feedback is used to improve and adjust the system.
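The data handed from the device to the server in Step 3 (the text and the emotional state at the time) can be illustrated with a small serializable record. This is a sketch under stated assumptions: the class name `UtterancePayload`, the field names, and the choice of JSON as the wire format are hypothetical and are not specified in this description.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class UtterancePayload:
    """Hypothetical record sent from the terminal to the server (Step 3)."""
    text: str     # transcript produced by speech recognition (Step 1)
    emotion: str  # category estimated by the emotion engine (Step 2)


def encode_payload(payload: UtterancePayload) -> str:
    """Serialize the payload on the terminal side."""
    return json.dumps(asdict(payload), ensure_ascii=False)


def decode_payload(raw: str) -> UtterancePayload:
    """Rebuild the payload on the server side before analysis (Step 4)."""
    data = json.loads(raw)
    return UtterancePayload(text=data["text"], emotion=data["emotion"])


if __name__ == "__main__":
    sent = UtterancePayload(text="You seem grumpy today", emotion="anger")
    received = decode_payload(encode_payload(sent))
    print(received)
```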

[0830] (Example 2)

[0831] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0832] Conventional systems have difficulty analyzing voice or text data in a timely manner, making it particularly challenging to detect and respond appropriately to sexual harassment-related remarks in real time. Furthermore, they lack the functionality to adjust warnings based on the user's emotional state, which hinders their ability to effectively facilitate communication.

[0833] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0834] In this invention, the server includes data acquisition means, analysis means for processing acquired voice or text information to analyze emotions, and comparison means for comparing with an information database related to sexual harassment. This enables the generation of warnings that accurately reflect the user's emotions in real time, and allows for a rapid and appropriate response to sexual harassment.

[0835] "Data acquisition means" refers to a device or software function for collecting voice or text information from a user.

[0836] "Analysis means" refers to a device or software function for processing acquired audio or text information and analyzing the user's emotional state.

[0837] "Comparison means" refers to a device or software function that compares analyzed information with a database containing information related to sexual harassment.

[0838] "Warning generation means" refers to a device or software function that generates a warning to notify the user based on the results of the comparison means.

[0839] "Transmission means" refers to a device or function that transmits the generated warning to the user's information terminal.

[0840] "Improvement means" refers to a device or software function that receives feedback and opinions from users to improve the system's performance and responsiveness.

[0841] This invention is a system that analyzes voice and text data in real time on an information terminal used by a user and detects phrases that may constitute sexual harassment by comparing them with a specific database. This system can improve communication by generating warnings that take into account the user's emotional state and adjusting the content of the warnings according to the user's emotions when notifying them.

[0842] The terminal captures user voice and text information using data acquisition means equipped with a microphone and text input device. The voice is converted into text data using speech recognition software (e.g., a speech recognition engine). This is the process of transcribing the user's utterances into text. The converted text data is then analyzed to evaluate the user's emotional state. Specifically, it is categorized into emotional categories such as tension or anger based on voice tone and language choices.

[0843] After receiving this analysis data, the server uses a natural language processing engine (e.g., a natural language processing AI model) to analyze the text data and compare it against a database to determine if it contains phrases related to sexual harassment. Here, the case database is a collection of similar phrases from the past and is used to quickly evaluate the user's statements.
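One way to picture the comparison against the case database is an approximate string match between the user's statement and stored phrases. The sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in; the description does not state which matching technique is used, and the database contents, the function names, and the 0.8 threshold are assumptions chosen only for illustration.

```python
from difflib import SequenceMatcher

# Placeholder case database; a real one would hold past cases.
CASE_DATABASE = [
    "you seem grumpy today",
]


def normalize(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def best_match(utterance: str, cases=CASE_DATABASE, threshold: float = 0.8):
    """Return (case, score) for the most similar stored phrase, or None.

    The score is SequenceMatcher's similarity ratio in [0, 1]; phrases
    scoring below the (assumed) threshold are ignored.
    """
    target = normalize(utterance)
    best = None
    for case in cases:
        score = SequenceMatcher(None, target, normalize(case)).ratio()
        if score >= threshold and (best is None or score > best[1]):
            best = (case, score)
    return best


if __name__ == "__main__":
    print(best_match("You seem grumpy today"))
    print(best_match("The quarterly report is due on Friday"))
```

A character-level ratio only captures surface similarity; the context analysis attributed to the natural language processing engine would need a richer model than this sketch.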

[0844] The warning generation mechanism generates an appropriate warning based on the matching results and the user's emotional data. The generated warning is then sent to the user's device via the transmission mechanism. The warning content is adjusted according to the user's emotions; for example, if the user is showing signs of tension, the warning will be changed to something that will alleviate that tension.

[0845] Users can view this warning on their device and consciously adjust their actions. They can also provide feedback on the warning, which is recorded on the server and used by the improvement means to improve the system's performance.

[0846] For example, if a user says "You seem grumpy today" during a video chat, the device converts this audio into text, and the emotion engine determines the emotion to be "anger." This data is then analyzed on the server, and if the phrase is determined to have the potential to be sexual harassment, a warning is generated and sent, such as "Such remarks can be misleading. Try to be a little more considerate."

[0847] An example of a prompt for a generative AI model is: "Generate a scenario that monitors phrases used by users in workplace conversations, detects potentially sexually harassing expressions, and issues appropriate warnings. Consider that the warnings should be adjusted according to the user's real-time emotional state."

[0848] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0849] Step 1:

[0850] The device captures user voice and text information in real time through data acquisition means. Voice data is acquired via a microphone, and text information is entered via keyboard or touch input. The acquired voice data is converted to text by a speech recognition engine. Input is raw voice data, and output is text data.

[0851] Step 2:

[0852] The device sends the converted text data and audio analysis to an emotion analysis engine. Here, the tone of the audio and linguistic features in the text are analyzed to determine the user's emotional state. Specifically, the audio data is analyzed for tone, speed, and emphasis, while the frequency of positive / negative expressions in the text data is calculated. The input consists of audio tone and text data, and the output is metadata indicating the emotional state.

[0853] Step 3:

[0854] The device sends the analysis results (emotional state metadata) along with text data to the server.

[0855] Step 4:

[0856] The server feeds the received text data into a natural language processing engine, which compares it with a database to determine if it contains phrases related to sexual harassment. During this process, the natural language processing engine analyzes the context and structure of the text. The input is text data, and the output indicates whether or not suspicious phrases are present.

[0857] Step 5:

[0858] The server uses a warning generation mechanism to generate a warning message based on emotional state metadata and detected phrase information. The generated warning is adjusted according to the emotional state. For example, if the user is stressed, the message will be encouraging, and if they are angry, it will be calm. The input is the emotional state and phrase information, and the output is the adjusted warning message.

[0859] Step 6:

[0860] The server sends the generated warning message to the terminal, and the terminal notifies the user. The terminal uses notification methods to convey the warning to the user as a pop-up display or audio notification. The input is the warning message, and the output is the notification to the user.

[0861] Step 7:

[0862] Users review warnings and provide feedback on their devices. This feedback is sent digitally to a server, recorded and analyzed by system improvement tools, and used for future speech detection and warning generation. The input is user feedback, and the output is information for system learning and improvement.
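The text side of Step 2, in which the frequency of positive / negative expressions is calculated, can be sketched as a word count against two small lexicons. The lexicon entries, the function name `text_emotion_metadata`, and the output field names are hypothetical; the audio analysis of tone, speed, and emphasis is outside the scope of this sketch.

```python
import re
from collections import Counter

# Hypothetical lexicons; real ones would be far larger and language-specific.
POSITIVE = {"thanks", "great", "glad"}
NEGATIVE = {"grumpy", "annoying", "stupid"}


def text_emotion_metadata(text: str) -> dict:
    """Return the share of positive and negative words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(
        "positive" if word in POSITIVE else "negative"
        for word in words
        if word in POSITIVE or word in NEGATIVE
    )
    total = len(words)
    return {
        "positive_ratio": counts["positive"] / total if total else 0.0,
        "negative_ratio": counts["negative"] / total if total else 0.0,
    }


if __name__ == "__main__":
    print(text_emotion_metadata("You seem grumpy today"))
```

The resulting dictionary plays the role of the emotional state metadata that accompanies the text data in Step 3.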

[0863] (Application Example 2)

[0864] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0865] In workplace communication, detecting sexual harassment and inappropriate remarks in real time is difficult, and there is the challenge of needing to respond appropriately while considering the emotional state of the speaker. Furthermore, it is important to improve the accuracy of the system by taking user feedback into account.

[0866] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0867] In this invention, the server includes a data acquisition function, a processing function for processing and analyzing acquired voice or text information, and an emotion analysis function for analyzing emotions and adjusting warnings according to the emotional state. This makes it possible to detect inappropriate remarks in workplace communication in real time and respond appropriately according to the emotional state of the speaker.

[0868] The "data acquisition function" is a function that captures audio or text information and converts it into a format necessary for analysis within the system.

[0869] The "processing function" is a function that analyzes acquired audio or text information and uses that analysis to determine inappropriate expressions, etc.

[0870] The "comparison function" is a feature that compares the analyzed information with a database of examples of inappropriate expressions to check for any matches.

[0871] The "warning generation function" is a feature that creates appropriate warnings for users based on detected problematic expressions.

[0872] The "notification function" is a function that sends generated warnings to the user's device.

[0873] The "emotion analysis function" is a feature that analyzes the user's emotional state and selects and adjusts warnings according to the situation.

[0874] The "voice analysis function" is a function that acquires voice information and converts it into text.

[0875] The "learning function" is a feature that receives feedback from users and updates and improves the system's algorithms and database based on that feedback.

[0876] This system primarily consists of user terminals, such as smartphones and smart glasses, and servers that communicate with those terminals.

[0877] The user's device is equipped with a data acquisition function that captures the user's voice in real time during conversations. The voice is converted into text through a voice analysis function. This converted text information is immediately analyzed by the device's processing functions.

[0878] The server includes a comparison function that matches processed text information against a database of cases related to inappropriate expressions. This database contains known inappropriate expressions and phrases related to sexual harassment.

[0879] Furthermore, the server-side has an emotion analysis function to analyze the user's emotional state. This function estimates the emotional state based on the user's voice tone and linguistic features contained in the text. By classifying the emotional state into categories such as tension, anger, and joy, the emotions behind the utterances are understood.

[0880] Based on this, the warning generation function is activated to create the most appropriate warning. The generated warning is sent to the user's device via the notification function. The content of the warning is adjusted according to the user's emotional state, and if inappropriate language is detected, the user is prompted to reconsider their statement.

[0881] Furthermore, it incorporates a learning function, allowing for improvements to the system's database and algorithms by incorporating user feedback. This process continuously improves the overall accuracy of the system.

[0882] For example, if someone says "You seem a little grumpy today" during a meeting, the system will detect this and, especially if the emotion analysis function identifies a tense situation, it will generate a warning prompting the use of more appropriate language.

[0883] An example of a prompt is: "Explain how to analyze user sentiment from conversations, detect inappropriate phrases, and generate warnings."

[0884] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0885] Step 1:

[0886] The user's device captures the audio. The audio is acquired via the data acquisition function and converted into text data by the voice analysis function. This converted text becomes the input for the next process.

[0887] Step 2:

[0888] The terminal's processing function analyzes the converted text data. During this process, it extracts linguistic features from the text data and generates information to estimate the user's emotional state. The output consists of the analyzed text data and metadata for emotion analysis.

[0889] Step 3:

[0890] The server activates a comparison function that compares the analyzed text data with the case database. It matches the text data against known inappropriate expressions in the database and detects any matches. This result becomes the input for the next process.

[0891] Step 4:

[0892] The server's emotion analysis function estimates emotions from the user's voice tone and linguistic features in the text. The input is metadata and voice features from step 2, and the output is data indicating the estimated emotional state.

[0893] Step 5:

[0894] The server's warning generation function generates the most appropriate warning for the user based on the detection results of inappropriate language and the estimated emotional state. The input is the detection results of inappropriate language and the emotional analysis results, and the output is a tailored warning message.

[0895] Step 6:

[0896] The generated warning message is sent to the user's device via the notification function. The user receives the warning message and has an opportunity to reconsider their actions. The output is the warning message displayed to the user.

[0897] Step 7:

[0898] Users provide feedback, which is recorded by the learning function. The server uses this feedback to update its database and algorithms, improving the system's accuracy. The input is user feedback, and the output is the updated system settings and database.
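The feedback loop of Step 7 can be illustrated with a simple per-phrase tally. This description does not specify the learning algorithm, so the following is only an assumed scheme: each phrase carries an integer score that rises when users mark a warning as appropriate and falls otherwise, and warnings for a phrase are suppressed once its score drops below a threshold. The class name `FeedbackStore` and the threshold value are hypothetical.

```python
from collections import defaultdict


class FeedbackStore:
    """Hypothetical tally of user feedback per detected phrase."""

    def __init__(self, suppress_below: int = -2):
        # Phrases whose score falls below this value stop triggering warnings.
        self.suppress_below = suppress_below
        self.scores = defaultdict(int)

    def record(self, phrase: str, warning_was_appropriate: bool) -> int:
        """Store one piece of feedback and return the updated score."""
        self.scores[phrase] += 1 if warning_was_appropriate else -1
        return self.scores[phrase]

    def should_warn(self, phrase: str) -> bool:
        """Decide whether future matches of this phrase still warrant a warning."""
        return self.scores.get(phrase, 0) >= self.suppress_below


if __name__ == "__main__":
    store = FeedbackStore()
    for _ in range(3):
        store.record("you seem grumpy today", warning_was_appropriate=False)
    print(store.should_warn("you seem grumpy today"))
```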

[0899] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0900] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0901] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0902] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0903] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0904] These emotions are distributed at the 3 o'clock position on the emotion map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0905] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the emotion map 400 an emotion lies, the more visible it becomes (that is, the more it is expressed in actions).

[0906] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.

[0907] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0908] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
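The property that emotions located close together have similar values can be illustrated by constructing training targets that spread a single labeled emotion over its neighbors on the map. The sketch below is one possible construction and is not stated in this description: the two-dimensional coordinates are invented placeholders (the actual layout is that of the emotion map in Figure 9), and the Gaussian kernel, its width, and the function name `soft_targets` are assumptions.

```python
import math

# Hypothetical 2-D positions; the real layout is the emotion map in Figure 9.
POSITIONS = {
    "reassured": (1.0, 0.2),
    "calm": (1.0, 0.0),
    "confident": (1.1, 0.3),
    "anger": (-1.0, -0.8),
}


def soft_targets(label: str, width: float = 0.5) -> dict:
    """Spread a single labeled emotion over its neighbors on the map.

    Each emotion receives a weight that decays with its squared distance
    from the labeled emotion (Gaussian kernel); weights are normalized to
    sum to 1, so emotions located close together receive similar values.
    """
    cx, cy = POSITIONS[label]
    raw = {
        name: math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * width ** 2))
        for name, (x, y) in POSITIONS.items()
    }
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


if __name__ == "__main__":
    for name, value in soft_targets("calm").items():
        print(f"{name}: {value:.3f}")
```

With these placeholder coordinates, "reassured", "calm", and "confident" receive comparable values for a "calm" label while "anger" receives a value close to zero, mirroring the example given for Figure 10.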

[0909] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0910] In the above embodiment, an example was given in which the specific processing is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and the specific processing may be distributed across multiple computers, including computer 22. For example, a data generation model 58 may be provided in a device external to the data processing device 12, and the external device may generate data according to the input data.

[0911] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0912] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0913] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0914] The following types of processors can be used as hardware resources to perform specific processing. One example is a CPU, which is a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0915] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0916] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0917] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0918] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0919] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0920] The following is further disclosed regarding the embodiments described above.

[0921] (Claim 1)

[0922] Data input means,

[0923] A processing means for processing and analyzing input audio or text data,

[0924] A matching mechanism for cross-referencing with a database of cases related to sexual harassment,

[0925] A warning generation means that detects problematic phrases and generates a warning,

[0926] A notification method for notifying the user's terminal of a warning,

[0927] A system that includes this.

[0928] (Claim 2)

[0929] The system according to claim 1, wherein the data input means includes speech recognition means for converting speech data into text format.

[0930] (Claim 3)

[0931] The system according to claim 1, wherein the warning generation means includes a learning means that receives feedback from a user and updates the system.

[0932] "Example 1"

[0933] (Claim 1)

[0934] A means of inputting voice or text information,

[0935] A means for processing and analyzing the input data,

[0936] A means of cross-referencing with a record-keeping device related to sexual harassment,

[0937] A means for detecting identified problem expressions and generating notifications,

[0938] Means for transmitting notifications to user devices,

[0939] A means of improving the system by receiving feedback from users,

[0940] A system that includes this.

[0941] (Claim 2)

[0942] The system according to claim 1, wherein the means for inputting the voice or text information includes a speech recognition device that converts the acoustic data into text format.

[0943] (Claim 3)

[0944] The system according to claim 1, wherein the means for generating the notification includes means for analyzing the information using a machine learning algorithm when the generated notification is presented.

[0945] "Application Example 1"

[0946] (Claim 1)

[0947] Data collection means,

[0948] An analysis means for analyzing and evaluating collected audio signals or text information,

[0949] A comparative tool for comparing case studies and information related to sexual harassment,

[0950] An alert generation means that identifies phrases with potential problems and generates alerts,

[0951] A means for transmitting the generated alert to the user device,

[0952] A method using a portable information device capable of real-time processing,

[0953] A system that includes this.

[0954] (Claim 2)

[0955] The system according to claim 1, wherein the data collection means includes speech recognition technology that converts speech signals into text information.

[0956] (Claim 3)

[0957] The system according to claim 1, wherein the alert generation means includes a learning technique that receives a response from a user and optimizes the system.

[0958] "Example 2 of combining an emotion engine"

[0959] (Claim 1)

[0960] Data acquisition method,

[0961] An analysis means for processing acquired audio or text information to analyze emotions,

[0962] A means of comparison for comparing information databases related to sexual harassment,

[0963] A warning generation means that detects the relevant phrase and generates a warning,

[0964] A means for sending a warning to the user's information terminal,

[0965] A system that includes this.

[0966] (Claim 2)

[0967] The system according to claim 1, wherein the data acquisition means includes speech recognition means for converting speech information into text format.

[0968] (Claim 3)

[0969] The system according to claim 1, wherein the warning generation means includes an improvement means for receiving user feedback and updating the system.

[0970] "Application example 2 when combining with an emotional engine"

[0971] (Claim 1)

[0972] Function to acquire data,

[0973] A processing function that processes and analyzes acquired audio or text information,

[0974] A comparison function that compares with a database of examples related to inappropriate expressions,

[0975] A warning generation function that detects problematic expressions and generates warnings,

[0976] A notification function that alerts the user's device,

[0977] An emotion analysis function that analyzes emotions and adjusts warnings according to the emotional state,

[0978] A system that includes this.

[0979] (Claim 2)

[0980] The system according to claim 1, wherein the data acquisition function includes a speech analysis function that converts speech information into text.

[0981] (Claim 3)

[0982] The system according to claim 1, wherein the warning generation function includes a learning function that receives a response from a user and updates the system.

[Explanation of Symbols]

[0983]
10, 210, 310, 410 Data Processing Systems
12 Data Processing Devices
14 Smart Devices
214 Smart Glasses
314 Headset-type terminal
414 Robots

Claims

1. Data collection means, An analysis means for analyzing and evaluating collected audio signals or text information, A comparative tool for comparing case studies and information related to sexual harassment, An alert generation means that identifies phrases with potential problems and generates alerts, A means for transmitting the generated alert to the user device, A method using a portable information device capable of real-time processing, A system that includes this.

2. The system according to claim 1, wherein the data collection means includes speech recognition technology that converts speech signals into text information.

3. The system according to claim 1, wherein the alert generation means includes a learning technique that receives a response from a user and optimizes the system.