system

The system addresses subjective interview evaluations by using audio/text data and emotion recognition to generate objective scores and reports, enhancing the fairness and efficiency of hiring processes.

JP2026105339A · Pending · Publication Date: 2026-06-26 · SOFTBANK GROUP CORP


Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: SOFTBANK GROUP CORP
Filing Date: 2024-12-16
Publication Date: 2026-06-26

Application Information

Patent Timeline

16 Dec 2024: Application

26 Jun 2026: Publication (JP2026105339A)

IPC: G16H10/20; G16H10/00; G06Q50/10; G16H15/00; G16H50/30; G16H50/00; G06F3/01; G16H20/70; G16H50/20

AI Tagging

Technology Topics

Response generation Engineering


AI Technical Summary

Technical Problem

Modern employment interviews rely heavily on subjective evaluation by interviewers, leading to biased and inefficient selection of human resources due to the lack of objectivity and consistency in assessment.

Method used

A system that utilizes audio or text data collection, natural language processing, and emotion recognition to generate objective evaluation scores and reports, incorporating visual elements for intuitive understanding, thereby enhancing fairness and efficiency.

Benefits of technology

The system provides objective and efficient evaluation by minimizing bias, improving the quality and fairness of hiring decisions through detailed scoring and emotional analysis.

✦ Generated by Eureka AI based on patent content.


Abstract

A system is provided. [Solution] The system includes: means for managing interview records by receiving and storing audio or text data; means for analyzing the received data and generating a score based on a specific evaluation index; means for creating an interview evaluation report based on the generated score and its basis; means for providing the evaluation report to the interviewer to support the final decision; and means for collecting and analyzing user responses in real time through interaction with home automated devices, generating scores, and providing feedback.


Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, the method including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a character of the chatbot, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] In modern employment interviews, the evaluation by interviewers tends to depend on subjectivity, and unconscious biases intervene, making it difficult to have a fair and uniform evaluation of human resources. In addition, the large amount of data accumulated by interviewers complicates the evaluation work and hinders efficient selection of human resources. To solve these problems, it is required to improve objectivity in the interview process and enhance the efficiency of evaluation.

Means for Solving the Problems

[0005] This invention utilizes means for receiving and storing audio or text data to centrally manage interview records. By analyzing the received data, it generates scores based on specific evaluation metrics, enabling objective evaluation free from bias. Furthermore, it creates an evaluation report based on the generated scores and their rationale, and provides this to the interviewer, supporting final decision-making and streamlining the process. In addition, by adding visual information such as graphs to the evaluation report, the aim is to enable interviewers to intuitively understand the evaluation results.

[0006] "Audio or text data" refers to digital information that records the content of the interview conversation and is subject to analysis.

[0007] "Means for managing interview records" refers to a system or method for storing, organizing, and easily searching for and referencing data collected during an interview.

[0008] "Means of analysis and generating scores based on specific evaluation metrics" refers to a system or method that analyzes collected data and calculates a numerical evaluation based on pre-defined evaluation criteria.

[0009] "Means of generating an evaluation report" refers to a system or method for documenting interview results, based on the generated scores and their rationale, and providing them to the interviewer in visual or text format.

[0010] "Means to support the final decision" refers to a system or method for presenting evaluation reports to interviewers and supporting the decision-making process.

[0011] The "function to generate bias-free scores by detecting keywords and phrases" refers to a function that uses natural language processing technology to identify specific words and expressions and calculate an objective score with minimal bias in the evaluation.

[0012] "Means of enabling interviewers to easily understand evaluation results, including visual information" refers to a system or method that visualizes evaluation results using graphs, charts, etc., to facilitate intuitive understanding for interviewers.

Brief Description of the Drawings

[0013]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map on which multiple emotions are mapped.
[Figure 10] An emotion map on which multiple emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2, in which an emotion engine is combined.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2, in which an emotion engine is combined.

Mode for Carrying Out the Invention

[0014] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0015] First, the terms used in the following description will be explained.

[0016] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.

[0017] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which the processor uses as work memory.

[0018] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tape.

[0019] In the following embodiments, a communication interface (I/F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0020] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when three or more items are linked by "and/or."

[0021] [First Embodiment]

[0022] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0023] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0024] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0025] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0026] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0027] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0028] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0029] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0030] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0031] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0032] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0033] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0034] This invention relates to a system that streamlines the evaluation process in job interviews and promotes fair judgment. Specific embodiments thereof are described below.

[0035] Data reception and record management

[0036] The device collects audio data in real time during the interview. This data is converted to text using speech recognition technology, sent to a server, and stored. The server organizes this data as an interview record and manages it for later analysis.

[0037] Data analysis and scoring

[0038] The server analyzes the received text data using natural language processing (NLP) technology. This detects keywords and phrases used during the interview and generates scores for each item based on pre-defined evaluation metrics. Scoring is performed objectively using an AI model, eliminating biases that arise from traditional subjective evaluations.
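The keyword-detection step described above can be sketched as follows. This is a minimal illustration only: the metric names, keyword lists, and the coverage-based scoring rule are assumptions made for the example and do not come from the disclosure, which leaves the actual scoring to an AI model and also considers context.

```python
import re
from collections import Counter

# Hypothetical evaluation metrics and keyword lists (illustrative assumptions).
METRIC_KEYWORDS = {
    "problem_solving": ["problem solving", "analysis", "implementation plan"],
    "leadership": ["team", "led", "mentored"],
}


def count_keyword_hits(text: str, keywords: list[str]) -> Counter:
    """Count case-insensitive whole-phrase occurrences of each keyword."""
    lowered = text.lower()
    hits = Counter()
    for kw in keywords:
        pattern = r"\b" + re.escape(kw.lower()) + r"\b"
        hits[kw] = len(re.findall(pattern, lowered))
    return hits


def score_metric(text: str, keywords: list[str], max_score: int = 10) -> int:
    """Map keyword coverage (share of keywords mentioned) to 0..max_score."""
    if not keywords:
        return 0
    hits = count_keyword_hits(text, keywords)
    covered = sum(1 for kw in keywords if hits[kw] > 0)
    return round(max_score * covered / len(keywords))


def score_answer(text: str) -> dict[str, int]:
    """Score one transcribed answer against every configured metric."""
    return {m: score_metric(text, kws) for m, kws in METRIC_KEYWORDS.items()}
```

Coverage is used here instead of raw frequency so that repeating one keyword does not inflate the score; a context-aware model would replace `score_metric` in a fuller implementation.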

[0039] Generation and provision of evaluation reports

[0040] Based on the scoring results, the server generates an evaluation report. The report includes scores for each evaluation metric and their rationale, explained with visual elements (e.g., graphs, charts). The generated report is provided to the interviewer via a terminal, providing specific and objective information to assist in the final pass/fail decision.
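As a rough illustration of a report that pairs each score with its rationale and a visual element, the sketch below renders a plain-text bar per metric. The function names and the text-bar format are assumptions for the example; the disclosure only states that graphs and charts are included.

```python
def render_bar(score: int, max_score: int = 10, width: int = 10) -> str:
    """Render a score as a fixed-width text bar, e.g. 7/10 -> '#######...'."""
    filled = round(width * score / max_score)
    filled = max(0, min(width, filled))  # clamp out-of-range scores
    return "#" * filled + "." * (width - filled)


def render_report(candidate: str, results: dict[str, tuple[int, str]],
                  max_score: int = 10) -> str:
    """Build a report with one bar line and one rationale line per metric.

    `results` maps a metric name to (score, rationale).
    """
    lines = [f"Interview evaluation report: {candidate}"]
    for metric, (score, rationale) in results.items():
        bar = render_bar(score, max_score)
        lines.append(f"{metric.ljust(18)} {bar} {score}/{max_score}")
        lines.append(f"  rationale: {rationale}")
    return "\n".join(lines)
```

A graphical front end could swap `render_bar` for a real chart while keeping the same score-plus-rationale structure.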

[0041] Specific example

[0042] For example, if a candidate is asked a question about their "problem-solving ability" during an interview, the server identifies relevant keywords in the answer (e.g., "problem solving," "analysis," "implementation plan") and calculates a score based on the keywords' frequency and context. The resulting report then presents to the user, for instance, "Problem-solving ability: 7/10. Analytical skills are highly rated, but concrete examples are lacking." This allows the user to make a fair evaluation based on detailed analysis.

[0043] The system of this invention reduces the burden of many evaluation tasks faced by interviewers and contributes to improving the quality and fairness of hiring decisions.

[0044] The following describes the processing flow.

[0045] Step 1:

[0046] The terminal prepares for the interview and activates the device for recording audio data in real time. During this process, the recording environment is checked to ensure that the audio is captured accurately.

[0047] Step 2:

[0048] The device collects audio data and converts it into text data using speech recognition technology. During this conversion, noise reduction and speaker-turn detection are performed automatically so that the text is transcribed in the appropriate context.

[0049] Step 3:

[0050] The server receives the converted text data and securely stores it in a database. This stored data will be used as a resource for later analysis.

[0051] Step 4:

[0052] The server analyzes stored text data using a natural language processing (NLP) engine. It extracts keywords and phrases related to specific evaluation metrics and performs contextual analysis based on the content of the utterances.

[0053] Step 5:

[0054] The server calculates a score corresponding to each evaluation metric based on the analysis results. The scores are generated according to evaluation criteria on which the AI model has been trained in advance and are expressed as objective numerical values.

[0055] Step 6:

[0056] The server generates a detailed evaluation report based on the scoring results. This report includes the score for each metric, the reasoning behind it, and graphs and charts to make it easier to understand visually.

[0057] Step 7:

[0058] The device displays an evaluation report to the user acting as the interviewer. The user uses this report to gather information for making a final hiring decision.

[0059] Step 8:

[0060] The user analyzes the report content and makes a final evaluation of the candidate. If necessary, they examine specific items in detail and make a final hiring decision.
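Steps 2 through 7 above can be wired together as in the following sketch. All names here (`InterviewRecord`, `run_pipeline`, and the injected callables) are hypothetical; speech recognition, scoring, and report rendering are passed in as functions because the disclosure does not fix a particular implementation for any of them.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class InterviewRecord:
    """One stored interview: transcript plus derived scores and report."""
    candidate_id: str
    transcript: str
    scores: dict[str, int] = field(default_factory=dict)
    report: str = ""


def run_pipeline(
    candidate_id: str,
    audio: bytes,
    transcribe: Callable[[bytes], str],
    score: Callable[[str], dict[str, int]],
    render: Callable[[InterviewRecord], str],
    store: dict[str, InterviewRecord],
) -> InterviewRecord:
    """Run transcription, storage, scoring, and report generation in order."""
    transcript = transcribe(audio)               # Step 2: speech to text
    record = InterviewRecord(candidate_id, transcript)
    store[candidate_id] = record                 # Step 3: keep the record
    record.scores = score(record.transcript)     # Steps 4-5: analyze and score
    record.report = render(record)               # Step 6: build the report
    return record                                # Step 7: caller displays it
```

Injecting the stages keeps the flow testable with stand-in functions and lets a real speech recognizer or language model be substituted later without changing the orchestration.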

[0061] (Example 1)

[0062] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0063] Traditional interview evaluation processes rely on the subjective opinions of interviewers, potentially leading to bias and a lack of fairness. Furthermore, the time and effort required for evaluation are considerable, making them inefficient. Additionally, evaluation criteria often lack consistency, resulting in differing judgments from interviewer to interviewer. Addressing these issues and ensuring fairness and consistency in evaluations is essential.

[0064] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0065] In this invention, the server includes means for collecting audio data using an acoustic input device and converting it into text data using speech recognition technology; means for storing the converted text data in a secure storage device; and means for extracting keywords and phrases from the text data using natural language processing technology and scoring them based on evaluation criteria. This makes it possible to conduct objective interview evaluations with minimal bias and to improve the efficiency of the evaluation process.

[0066] An "acoustic input device" is a hardware device for collecting audio signals and is used to effectively capture what is said during an interview.

[0067] "Speech recognition technology" is a technology for converting speech data into text data, and is a method for processing human speech into a format that machines can understand.

[0068] "Text data" refers to string information that electronically records human speech, converted using speech recognition technology.

[0069] "Natural language processing technology" refers to a series of techniques that enable computers to understand and analyze human language and extract its meaning.

[0070] "Keywords and phrases" are important words or phrases used in the interview that are extracted as information related to the evaluation criteria.

[0071] "Evaluation criteria" refers to the indicators or conditions set in advance to be used as criteria for making evaluations during an interview.

[0072] "Scoring" is the process of analyzing text data to quantitatively evaluate each item based on evaluation criteria and providing a numerical evaluation.

[0073] A "generative AI model" is an algorithm that uses artificial intelligence technology to analyze and predict data, generating evaluation results based on that data.

[0074] "Visual elements" refer to graphical representations such as graphs and charts used within evaluation reports, and are elements used to make information easier to understand intuitively.

[0075] An "information terminal" is a digital device used by users to view evaluation reports, and includes electronic devices such as personal computers and tablets.

[0076] This invention relates to a system that automates the evaluation process of voice interviews, enabling fair and efficient evaluation. Specifically, it begins with collecting interview audio in real time using an acoustic input device. The terminal converts the collected audio into text data using speech recognition technology (e.g., a general speech recognition API) and sends the conversion result to a server.

[0077] The server stores text data in a secure storage device and simultaneously analyzes the data using natural language processing (NLP) techniques. A common NLP library is used as a specific example of the natural language processing techniques employed here. Important keywords and phrases are extracted from the text data, and scoring is performed based on these keywords according to pre-defined evaluation criteria. A generative AI model (e.g., a general language model) is used in the scoring process.

[0078] The scoring results are compiled into an evaluation report by the server. This report includes various visual elements (e.g., graphs, charts) and is provided to the interviewer via their information terminal. This report allows the user to make a final decision more easily and objectively.

[0079] As a concrete example, let's assume a candidate is asked questions related to "problem-solving ability" during an interview and responds to those questions. In this case, the server analyzes text data containing keywords such as "problem solving" and "implementation plan," and uses a generative AI model to derive an objective score. This result is then presented as an evaluation report.

[0080] An example of a prompt for a generative AI model is: "Based on the following interview responses, please evaluate the candidate's problem-solving ability and the reasoning behind it. 'What candidate said...'"
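A prompt of this shape can be assembled programmatically, as in the sketch below. The template wording follows the example above; the function name, the bullet formatting of the responses, and the handling of empty responses are assumptions made for illustration. The call to an actual language model is deliberately left out.

```python
PROMPT_TEMPLATE = (
    "Based on the following interview responses, please evaluate the "
    "candidate's {metric} and the reasoning behind it.\n\n"
    "Responses:\n{responses}"
)


def build_prompt(metric: str, responses: list[str]) -> str:
    """Fill the template with a metric name and the candidate's answers.

    Blank responses are skipped and surrounding whitespace is trimmed.
    """
    quoted = "\n".join(f'- "{r.strip()}"' for r in responses if r.strip())
    return PROMPT_TEMPLATE.format(metric=metric, responses=quoted)
```

For instance, `build_prompt("problem-solving ability", transcript_segments)` yields one prompt per evaluation metric, so each metric can be scored with its own request.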

[0081] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0082] Step 1:

[0083] The device collects audio data in real time during the interview. It uses a microphone or other audio input device to record the conversation between the interviewer and the candidate. The input is audio data, which is then converted into text data using speech recognition technology. The converted text data is obtained as output and sent to the next step.

[0084] Step 2:

[0085] The terminal sends text data to the server. The server stores the received text data in a database. Metadata such as the interview date and time and candidate information are also recorded. The input is text data converted by speech recognition, and the output is the data stored and organized on the server.

[0086] Step 3:

[0087] The server uses natural language processing (NLP) techniques to analyze stored text data. It extracts keywords and phrases from the text using an NLP library. The input is stored text data, and the output contains keywords and related information necessary for evaluation.

[0088] Step 4:

[0089] The server uses a generative AI model to score extracted keywords based on evaluation criteria. The AI model understands the context of the text data and generates a score for each item. The input consists of analyzed keywords and evaluation criteria, and an objective score is output.

[0090] Step 5:

[0091] The server generates an evaluation report based on the scoring results. The scores are presented with visual elements such as graphs and charts. The input is the scored data, and the output is a visually represented evaluation report.

[0092] Step 6:

[0093] The server sends the generated evaluation report to the terminal, providing it to the interviewer (the user). The user can then intuitively review the evaluation results using the information terminal. The input is the generated evaluation report, and the output is a report in a viewable format that can be used by the user.
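The storage part of Step 2, in which the transcript is saved together with metadata such as the interview date and candidate information, might look like the following sketch using Python's built-in `sqlite3` module. The table name, column names, and helper functions are assumptions for the example, and it omits the encryption and access control that a "secure storage device" would require.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the record store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS interview_records (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               candidate_id TEXT NOT NULL,
               interviewed_at TEXT NOT NULL,
               transcript TEXT NOT NULL
           )"""
    )
    return conn


def save_record(conn: sqlite3.Connection, candidate_id: str, transcript: str,
                interviewed_at: Optional[datetime] = None) -> int:
    """Insert one transcript with its metadata and return the new row id."""
    when = (interviewed_at or datetime.now(timezone.utc)).isoformat()
    cur = conn.execute(
        "INSERT INTO interview_records (candidate_id, interviewed_at, transcript) "
        "VALUES (?, ?, ?)",
        (candidate_id, when, transcript),
    )
    conn.commit()
    return cur.lastrowid


def load_transcripts(conn: sqlite3.Connection, candidate_id: str) -> list[str]:
    """Return all transcripts for a candidate in insertion order."""
    rows = conn.execute(
        "SELECT transcript FROM interview_records "
        "WHERE candidate_id = ? ORDER BY id",
        (candidate_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Parameterized queries are used throughout so that transcript text is never interpolated into SQL.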

[0094] (Application Example 1)

[0095] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0096] Traditional interview practice and evaluation methods tend to rely on subjective judgment, making it difficult to fairly and objectively assess a candidate's true abilities. This can lead to inadequate talent evaluation, potentially resulting in the selection of unsuitable candidates or the loss of talented individuals. Another challenge is the lack of support tools to help individuals effectively practice interview skills at home.

[0097] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0098] In this invention, the server includes means for managing interview records by receiving and storing voice or text data; means for analyzing the received data and generating a score based on specific evaluation metrics; and means for collecting and analyzing user responses in real time through interaction with home automated devices, generating a score, and providing feedback. This enables objective and fair interview practice in a home environment, allowing users to effectively improve their skills.

[0099] "Speech or text data" refers to a unit of information used to record human conversation or written information in a digital format.

[0100] "Means for managing records" refers to methods and devices for systematically organizing received data and keeping it accessible as needed.

[0101] "Means for analyzing data and generating scores based on specific evaluation metrics" refers to methods or devices for processing data and calculating evaluation values based on pre-set criteria.

[0102] "Means for creating an evaluation report" refers to methods or devices that document evaluation results based on the generated scores and their rationale, and present them in a visually communicable format.

[0103] "Household automated devices" are computer-controlled devices used in the home that have the ability to interact with people through voice and actions.

[0104] "Means of providing feedback" refers to methods or devices for communicating reactions and suggestions to users based on analysis results regarding their actions and responses.

[0105] To realize this system, a combination of hardware used as a home automation device and software for speech recognition and natural language processing is required. The server processes the received audio data in real time and saves it as text using speech recognition technology. Specifically, it uses Google® Speech-to-Text API or a similar speech recognition system to accurately convert speech into text.

[0106] Next, the server analyzes the stored text data using natural language processing techniques. Using libraries such as Python's NLTK and spaCy, it detects keywords and phrases related to evaluation metrics from the text and generates a score based on them. This scoring utilizes a pre-trained AI model (e.g., using TensorFlow® or PyTorch). The evaluation is specific and objective, and feedback is provided to the user.

[0107] Users can interact with home automated devices to create a simulated interview environment. For example, the device asks a question such as "Tell me about your leadership experience," and the user answers by voice. The system analyzes the answer in real time and provides voice feedback, such as whether the description of the leadership experience is sufficiently specific. This allows users to identify areas for improvement in their answers and efficiently practice interviews at home.

[0108] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0109] Step 1:

[0110] The user sets interview questions for a home automated device. The user inputs the questions using an interface. The entered questions are sent to the server in text format. The server prepares the received text data so that it can be passed directly to the speech synthesis module.

[0111] Step 2:

[0112] A home-use automated device uses speech synthesis technology to output question text received from a server. This allows the user to hear the questions presented by the device in real time. This audio output is generated based on the questions prepared by the user.

[0113] Step 3:

[0114] The user provides their response via voice. The device collects this voice data in real time and sends it to the server. The server uses speech recognition technology to convert it into text. As a result, the voice data is saved in text format, and this text becomes the input for subsequent data analysis.

[0115] Step 4:

[0116] The server analyzes the converted text data using natural language processing techniques. It detects specific keywords and phrases using Python's NLTK and spaCy. This structures the content of the responses, and the analysis results are prepared as input for the next stage of scoring.

[0117] Step 5:

[0118] The server uses a pre-trained generative AI model to generate a score based on the analysis results. The AI model scores based on the frequency and context of the input keywords. The generated score serves as the basis for feedback provided to the user.

[0119] Step 6:

[0120] The server generates an evaluation report based on the generated score and analysis results. In addition to the score, the report includes specific areas for improvement and strengths. Furthermore, feedback for voice output is generated and sent to the home automated device.

[0121] Step 7:

[0122] The home automated device uses speech synthesis technology to output the feedback received from the server. This allows users to receive real-time evaluations of their responses, which can then be used to improve their performance. One example of such feedback is, "Please tell me specifically about the challenges you faced."
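The feedback in Steps 6 and 7, which separates strengths from areas for improvement and turns them into text for speech output, can be sketched as below. The threshold value, function names, and phrasing of the spoken text are assumptions for illustration; the speech synthesis call itself is outside the sketch.

```python
def build_feedback(scores: dict[str, int], threshold: int = 6) -> dict[str, list[str]]:
    """Split metrics into strengths (score >= threshold) and improvements."""
    strengths = sorted(m for m, s in scores.items() if s >= threshold)
    improvements = sorted(m for m, s in scores.items() if threshold > s)
    return {"strengths": strengths, "improvements": improvements}


def feedback_to_speech_text(feedback: dict[str, list[str]]) -> str:
    """Turn structured feedback into one sentence group for speech synthesis."""
    parts = []
    if feedback["strengths"]:
        parts.append("Strong points: " + ", ".join(feedback["strengths"]) + ".")
    if feedback["improvements"]:
        parts.append(
            "Please give more specific detail on: "
            + ", ".join(feedback["improvements"]) + "."
        )
    return " ".join(parts) or "No scores were available for this answer."
```

Keeping the structured feedback separate from the spoken wording lets the same result feed both the written evaluation report and the device's voice output.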

[0123] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0124] This invention relates to an evaluation system that combines an emotion engine, which recognizes the user's emotions, with the interview process. This system analyzes the user's emotional state and reflects that information in the interview evaluation, enabling a more comprehensive and fair evaluation.

[0125] Data collection and emotion recognition

[0126] The device collects audio and video data during the interview in real time and sends it to an emotion recognition engine. This engine analyzes the user's emotions in real time based on information such as voice tone, facial expression changes, and word choice.

[0127] Analysis of emotional data

[0128] The server receives analysis results sent from the emotion recognition engine. This includes fluctuations in the user's emotions during the interview and their emotional responses to specific utterances. This data is integrated with other evaluation metrics within the interview evaluation system.

[0129] Scoring and report generation

[0130] The server adds the sentiment analysis results to existing evaluation metrics to generate an overall score. This score is based on objective information and complements the evaluation of emotional aspects that interviewers may find difficult to perceive. The server then creates an evaluation report based on this score and the analysis results.

[0131] Reports and decision support

[0132] The device provides the interviewer with an evaluation report. The report visually displays the sentiment analysis results, highlighting changes in emotions and key reactions during the interview. This additional information allows the user to conduct an evaluation that takes into account the candidate's emotional perception and personality.

[0133] Specific example

[0134] Suppose that, in an interview, the candidate is asked how they handled a challenging situation, and the sentiment analysis results received by the server indicate that the candidate responded with positive facial expressions and positive language. This allows the user to infer that the candidate can handle stressful situations calmly and to reflect this in their evaluation.

[0135] Thus, the present invention enhances the quality of evaluation and promotes a more multifaceted understanding of candidates by incorporating sentiment data into conventional scoring methods.

[0136] The following describes the processing flow.

[0137] Step 1:

[0138] The device begins recording video and audio data simultaneously with the start of the interview. This collects digital data from the entire interview, ensuring that the information necessary for emotion recognition is secured.

[0139] Step 2:

[0140] The device sends collected audio data to an emotion recognition engine in real time, analyzing changes in voice tone and responses to specific emotions. During the speech recognition process, parameters such as volume, pitch, and speaking speed are analyzed.
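The three parameters named in this step can be sketched with the standard library alone. The example assumes mono samples normalized to the range -1 to 1, and it uses a zero-crossing count as a rough pitch estimate, which is reliable only for clean, single-tone signals; real speech would normally call for a more robust method such as autocorrelation. All function names are hypothetical.

```python
import math


def rms_volume(samples: list[float]) -> float:
    """Root-mean-square amplitude, a simple proxy for volume."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def zero_crossing_pitch(samples: list[float], sample_rate: int) -> float:
    """Rough pitch in Hz: a periodic tone crosses zero twice per cycle."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    duration = len(samples) / sample_rate
    return crossings / (2 * duration)


def speaking_rate(word_count: int, duration_seconds: float) -> float:
    """Speaking speed in words per minute."""
    return 60.0 * word_count / duration_seconds


# Synthetic one-second 200 Hz tone, used only for illustration.
sample_rate = 8000
tone = [
    0.5 * math.sin(2 * math.pi * 200 * n / sample_rate + 0.1)
    for n in range(sample_rate)
]
volume = rms_volume(tone)
pitch = zero_crossing_pitch(tone, sample_rate)
```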

[0141] Step 3:

[0142] The device uses video data to perform facial analysis and transmits changes in facial expressions and body movements to the emotion recognition engine. In this process, facial elements such as smiles, eyebrow movements, and gaze are analyzed.

[0143] Step 4:

[0144] The server processes the audio and facial expression analysis results received from the emotion recognition engine to organize the user's emotional fluctuations. Using this data, an emotion score is calculated to identify emotional trends throughout the interview.
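One way to picture the emotion score and trend described here is shown below. The sketch assumes that the emotion recognition engine yields a chronological list of valence values between -1 (negative) and 1 (positive); that representation, the function name, and the half-versus-half trend measure are assumptions rather than details given in the text.

```python
from statistics import mean


def emotion_summary(valence: list[float]) -> dict[str, float]:
    """Summarize chronological valence values as an overall score and a trend."""
    if not valence:
        raise ValueError("at least one sample is required")
    half = len(valence) // 2
    # Trend: how much the later half of the interview differs from the earlier half.
    trend = mean(valence[half:]) - mean(valence[:half]) if half else 0.0
    return {"score": mean(valence), "trend": trend}


# Hypothetical valence readings, used only for illustration.
summary = emotion_summary([-0.2, 0.0, 0.4, 0.6])
```

A positive trend indicates that the candidate's expressed emotion became more positive as the interview progressed.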

[0145] Step 5:

[0146] The server combines the received sentiment data with scores from conventional evaluation metrics to generate a comprehensive evaluation report. This report details how emotions changed and what emotional responses were given to specific questions.

[0147] Step 6:

[0148] The terminal displays an evaluation report to the interviewer user. This report helps the user understand the candidate's emotional and technical evaluations, providing information to make a final decision.

[0149] Step 7:

[0150] Users can leverage the sentiment analysis portion of evaluation reports to gain a deeper understanding of candidates' suitability and potential challenges. Based on this information, they can make final hiring decisions.

[0151] This series of steps allows for a more comprehensive assessment, including the emotional aspects of the interview process.

[0152] (Example 2)

[0153] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0154] Modern negotiation processes often involve subjective judgments, and fairness can be compromised due to evaluators' biases or lack of experience. Furthermore, accurately grasping and reflecting the emotional state of speakers during negotiations is difficult. As a result, the overall evaluation may be unbalanced, leading to a decrease in its credibility.

[0155] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0156] In this invention, the server includes means for managing negotiation records by acquiring and storing audio or video information; means for analyzing the acquired information and identifying emotional states using an emotion analysis engine; and means for integrating the identified emotional states with existing evaluation indicators to generate a comprehensive score. This enables objective and comprehensive evaluation at all times.

[0157] "Audio or video information" refers to audio and video data collected during negotiations or interviews, including the speaker's utterances and facial expressions.

[0158] "Managing records" means saving and organizing collected audio or video information so that it can be used for later analysis and evaluation.

[0159] An "emotion analysis engine" is a technology or system that analyzes and identifies a speaker's emotional state based on information such as voice tone and changes in facial expression.

[0160] "Identifying emotional states" means using an emotion analysis engine to categorize the emotions a speaker is expressing and extracting them as data.

[0161] "Existing evaluation metrics" refer to evaluation criteria and scales that have been used for some time, including quantitative and qualitative indicators such as technical skills and communication abilities.

[0162] "Generating an overall score" means combining identified emotional states with existing evaluation metrics to calculate an evaluation score that takes into account the impact of individual elements on the whole.

[0163] An "evaluation report" is a document or report that summarizes the overall evaluation results during the negotiation or interview process, and includes visual information to facilitate understanding.

[0164] This invention is a system for supporting the evaluation of negotiations and interviews, which generates an overall score and creates an evaluation report by collecting audio and video information, identifying emotional states using an emotion analysis engine, and integrating it with existing evaluation indicators.

[0165] The device is used in interviews and negotiations. It is equipped with a high-precision camera and microphone to collect audio and video information in real time. For the emotion analysis engine, which analyzes changes in voice tone and facial expressions, commercially available emotion analysis software can be used. This includes, for example, general-purpose emotion analysis engines.

[0166] The server receives audio and video information transmitted from the terminal and analyzes it using an emotion analysis engine. The server integrates the identified emotional states with existing evaluation metrics to generate an overall score. This overall score is calculated using an algorithm that determines how different metrics contribute to the evaluation. The software installed on the server manages data storage, analysis, and scoring, and automatically generates evaluation reports.

[0167] Users can view evaluation reports displayed on their devices. These reports include visual graphs showing emotional states over time, making it easy to understand how emotions changed during the negotiation process. This allows users to make comprehensive decisions that include the candidate's emotional state.

[0168] As a concrete example, consider a question posed to a candidate: "How did you overcome a challenging situation?" If the candidate responds calmly and positively, the server identifies that emotion as positive and reflects it in the evaluation metrics. This makes it easier for users to determine that the candidate has stress tolerance.

[0169] An example of a prompt is, "Create an evaluation report based on the results of an emotional analysis of the candidate's stress coping mechanisms, visually displaying their emotional responses to individual questions." In this way, the evaluation process can be comprehensively supported.

[0170] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0171] Step 1:

[0172] The device acquires audio and video information in real time using a high-precision camera and microphone at the start of an interview or negotiation. The input consists of the candidate's speech and video data, which are temporarily stored within the device. Specifically, the camera captures facial expressions and the microphone captures voice tone, and these are converted into digital format. The output is a temporary dataset of that moment.

[0173] Step 2:

[0174] The device transmits collected audio and video information to the sentiment analysis engine. The input consists of stored audio and video data, which is then formatted for sentiment analysis. Specifically, feature vectors are extracted from the audio, and facial features are analyzed from the video. The output is a label of the emotional state at each moment.

[0175] Step 3:

[0176] The server receives the analyzed emotional states and stores them in a database along with existing evaluation metrics. The input is the identified emotional states and their timestamp information. The server then organizes the emotional labels for management as time-series data. The output is cumulative emotional time variation data.
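The organization of emotion labels into time-series data can be sketched as follows. The record type `EmotionSample`, its fields, and the helper names are hypothetical; the text states only that labels and timestamps are stored and managed as a time series.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotionSample:
    timestamp: float  # seconds since the start of the session
    label: str        # e.g. "positive", "neutral", "negative"


def to_time_series(samples: list[EmotionSample]) -> list[EmotionSample]:
    """Order samples chronologically, regardless of arrival order."""
    return sorted(samples, key=lambda s: s.timestamp)


def label_changes(series: list[EmotionSample]) -> list[tuple[float, str]]:
    """Return (timestamp, label) at each point where the label changes."""
    changes: list[tuple[float, str]] = []
    previous = None
    for sample in series:
        if sample.label != previous:
            changes.append((sample.timestamp, sample.label))
            previous = sample.label
    return changes


# Hypothetical samples arriving out of order, used only for illustration.
received = [
    EmotionSample(12.0, "positive"),
    EmotionSample(3.0, "neutral"),
    EmotionSample(7.5, "neutral"),
    EmotionSample(20.0, "negative"),
]
series = to_time_series(received)
changes = label_changes(series)
```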

[0177] Step 4:

[0178] The server integrates emotional states with other evaluation metrics to calculate an overall score. The input consists of time-series data on emotional states and existing evaluation metrics. In specific steps, a weighting algorithm is used to calculate the importance of each evaluation item, generating an overall score. The output is a numerical score representing the overall evaluation of the interview or negotiation.
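The weighting step can be illustrated with a weighted average. The text does not disclose the actual weighting algorithm, so this sketch assumes fixed, manually chosen weights and item scores on a common 0 to 10 scale; the item names and values are hypothetical.

```python
def overall_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-item scores; weights need not sum to 1."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same items")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[item] * weights[item] for item in scores) / total_weight


# Hypothetical item scores and weights, used only for illustration.
result = overall_score(
    {"technical_skill": 8.0, "communication": 6.0, "emotion": 7.0},
    {"technical_skill": 0.5, "communication": 0.3, "emotion": 0.2},
)
```

Dividing by the total weight means the weights express relative importance only, so an operator can adjust one weight without renormalizing the others.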

[0179] Step 5:

[0180] The server uses the generated scores and sentiment data to create a visual evaluation report. The inputs are the overall score and time-series sentiment state data. This report includes graphs showing changes in sentiment and analysis results. The output is a detailed and easy-to-interpret evaluation report.
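As a rough illustration of a report that combines the overall score with a view of emotional change over time, the following renders a plain-text bar chart. An actual system would presumably draw graphical charts; the layout, the valence scale of -1 to 1, and the function name are assumptions.

```python
def render_report(
    overall: float, timeline: list[tuple[float, float]], width: int = 20
) -> str:
    """Render the overall score and one text bar per (seconds, valence) point."""
    lines = [
        f"Overall score: {overall:.1f} / 10",
        "Emotion over time (valence from -1 to 1):",
    ]
    for seconds, valence in timeline:
        clamped = max(-1.0, min(1.0, valence))
        # Map -1..1 onto 0..width filled cells.
        filled = round((clamped + 1) / 2 * width)
        bar = "#" * filled + "." * (width - filled)
        lines.append(f"{seconds:>5.0f}s |{bar}| {clamped:+.1f}")
    return "\n".join(lines)


# Hypothetical score and timeline, used only for illustration.
report = render_report(7.2, [(0, -1.0), (30, 0.0), (60, 1.0)])
```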

[0181] Step 6:

[0182] The terminal presents the generated evaluation report to the user. The input is the evaluation report created on the server and displayed on the user's screen. The final output is the evaluation result provided to the user as visually appealing material to aid in decision-making.

[0183] (Application Example 2)

[0184] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0185] In modern homes, there is a lack of effective means to support the mental state and emotional fluctuations of residents. In particular, despite the increasing importance of proper communication and stress management within the family, conventional technologies are structurally insufficient to address these issues. This invention aims to solve this problem by providing real-time responses and support tailored to the emotions of the residents.

[0186] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0187] In this invention, the server includes means for receiving and storing audio or visual data and managing the history of interactions; means for analyzing the received data, performing analysis based on specific evaluation criteria, and generating responses corresponding to the user's emotional state; and means for creating dialogue reports and supporting decisions based on the generated responses and their rationale. This makes it possible to understand the mental state of the occupants and provide appropriate feedback and support in real time.

[0188] "Audio or visual data" refers to basic digital information for communication, including the user's voice information and visual information such as facial expressions.

[0189] "Means for managing the history of interactions" refers to a function that records past interactions and conversations between the user and the system, making them available for reference as needed.

[0190] "A means of performing analysis based on specific evaluation criteria and generating responses that correspond to the user's emotional state" refers to a function in which the system automatically generates a real-time response appropriate to the user's emotional state based on collected data.

[0191] "A means of creating dialogue reports to support decision-making" refers to a function that summarizes the content of interactions with users, provides users with information including the results of sentiment analysis, and creates reports to aid understanding.

[0192] In this invention, by introducing a system that provides emotionally responsive feedback through dialogue with the user, it is possible to support the mental and emotional state of residents in real time.

[0193] The server receives user audio and visual data through input devices such as cameras and microphones, stores this data, and simultaneously performs analysis. This analysis includes converting audio data to text using an audio processing system (e.g., Google Speech-to-Text) and processing visual data using a facial recognition system (e.g., OpenCV and Dlib). This allows the server to analyze the user's emotions in real time based on specific evaluation criteria and generate responses corresponding to their emotional state.

[0194] As a concrete example, when a user experiences emotional fluctuations during a casual conversation, the system can detect their stress level from that data and provide appropriate feedback. For instance, if a user appears depressed, the system can suggest playing relaxation music. In this case, the prompt might be something like, "Suggest appropriate conversational topics when the user is feeling stressed."

[0195] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0196] Step 1:

[0197] The server receives audio and visual data from the terminal, including the user's speech and facial expressions. This data is received in stream format and temporarily stored for real-time processing.

[0198] Step 2:

[0199] The server converts the received audio data into text data using Google Speech-to-Text. This conversion process transforms the audio data into text format, which is then input into the next analysis step. The output is the text data of the words spoken by the user.

[0200] Step 3:

[0201] The server analyzes facial features from received visual data using OpenCV and Dlib. This process extracts user facial expression information and generates data to identify emotional states. The input is a user's facial image, and the output is emotion data based on facial expressions.

[0202] Step 4:

[0203] The server integrates both voice and facial expression data and uses a generative AI model to evaluate the user's overall emotional state. At this stage, weighting and contextual considerations are taken into account to calculate an emotional evaluation score. The output is the user's emotional evaluation score.

[0204] Step 5:

[0205] The server generates a predefined response based on the obtained sentiment evaluation score. The generated response is appropriate to the user's current emotional state. The prompt "Suggest an appropriate conversation when the user is feeling stressed" is used as a guide.
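Selecting a predefined response from the emotional evaluation score can be sketched as a threshold lookup. The score range of 0 (highly stressed) to 1 (relaxed), the thresholds, and the response wording are all assumptions; only the idea of suggesting relaxing music to a user who seems low comes from the description above.

```python
# (upper bound of score band, predefined response); the bands are hypothetical.
RESPONSES = [
    (0.3, "You seem tense. Would you like me to play some relaxing music?"),
    (0.7, "How is your day going so far?"),
    (1.0, "You sound well today. Is there anything you would like to talk about?"),
]


def select_response(score: float) -> str:
    """Return the predefined response for the band that contains the score."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    for upper_bound, text in RESPONSES:
        if score <= upper_bound:
            return text
    return RESPONSES[-1][1]


low_response = select_response(0.2)
```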

[0206] Step 6:

[0207] The device provides the user with the generated response, conveying its content via voice or display. This interaction prompts a further reaction from the user, leading to the next cycle of the feedback loop.

[0208] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0209] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0210] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0211] [Second Embodiment]

[0212] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0213] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0214] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0215] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0216] The microphone 238 accepts instructions and the like from the user 20 by receiving the voice uttered by the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0217] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0218] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0219] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0220] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0221] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0222] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0223] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0224] This invention relates to a system that streamlines the evaluation process in job interviews and promotes fair judgment. Specific embodiments thereof are described below.

[0225] Data reception and record management

[0226] The device collects audio data in real time during the interview. This data is converted to text using speech recognition technology, sent to a server, and stored. The server organizes this data as an interview record and manages it for later analysis.

[0227] Data analysis and scoring

[0228] The server analyzes the received text data using natural language processing (NLP) technology. This detects keywords and phrases used during the interview and generates scores for each item based on pre-defined evaluation metrics. Scoring is performed objectively using an AI model, eliminating biases that arise from traditional subjective evaluations.

[0229] Generation and provision of evaluation reports

[0230] Based on the scoring results, the server generates an evaluation report. The report includes scores for each evaluation metric and their rationale, explained with visual elements (e.g., graphs, charts). The generated report is provided to the interviewer via a terminal, providing specific and objective information to assist in the final pass / fail decision.

[0231] Specific example

[0232] For example, if a candidate is asked a question about their "problem-solving ability" during an interview, the server identifies relevant keywords in their answer (e.g., "problem solving," "analysis," "implementation plan," etc.) and calculates a score based on their frequency and context. The resulting report would then present to the user, for instance, "Problem-solving ability: 7 / 10, analytical skills are highly rated, but there is a lack of concrete examples." This allows the user to make a fair evaluation based on detailed analysis.

[0233] The system of this invention reduces the burden of many evaluation tasks faced by interviewers and contributes to improving the quality and fairness of hiring decisions.

[0234] The following describes the processing flow.

[0235] Step 1:

[0236] The terminal prepares for the interview and activates the device for recording audio data in real time. During this process, the recording environment is checked to ensure that the audio is captured accurately.

[0237] Step 2:

[0238] The device collects audio data and converts it into text data using speech recognition technology. During this conversion process, noise reduction and turn-taking are automatically performed to ensure that the text is transcribed in the appropriate context.
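The turn-taking part of this step can be sketched as merging consecutive transcript segments from the same speaker into a single turn. The example assumes the recognizer already labels each segment with a speaker, which the text does not state; the segment format and function name are hypothetical, and noise reduction is not covered.

```python
def merge_turns(segments: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Merge consecutive (speaker, text) segments by the same speaker into one turn."""
    turns: list[tuple[str, str]] = []
    for speaker, text in segments:
        if turns and turns[-1][0] == speaker:
            # Same speaker is still talking: extend the current turn.
            turns[-1] = (speaker, turns[-1][1] + " " + text)
        else:
            turns.append((speaker, text))
    return turns


# Hypothetical recognizer output, used only for illustration.
turns = merge_turns([
    ("interviewer", "Tell me about a challenge."),
    ("candidate", "Last year our release slipped."),
    ("candidate", "I reorganized the test plan."),
    ("interviewer", "What was the result?"),
])
```

Grouping the transcript by turn keeps each answer attached to the question that prompted it, which is what allows later analysis to read a response in context.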

[0239] Step 3:

[0240] The server receives the converted text data and securely stores it in a database. This stored data will be used as a resource for later analysis.

[0241] Step 4:

[0242] The server analyzes stored text data using a natural language processing (NLP) engine. It extracts keywords and phrases related to specific evaluation metrics and performs contextual analysis based on the content of the utterances.

[0243] Step 5:

[0244] The server calculates a score for each evaluation metric based on the analysis results. The scores are generated according to evaluation criteria that the AI model has learned in advance and are expressed as objective numerical values.

[0245] Step 6:

[0246] The server generates a detailed evaluation report based on the scoring results. This report includes the score for each metric, the reasoning behind it, and graphs and charts to make it easier to understand visually.

[0247] Step 7:

[0248] The device displays an evaluation report to the user acting as the interviewer. The user uses this report to gather information for making a final hiring decision.

[0249] Step 8:

[0250] The user analyzes the report content and makes a final evaluation of the candidate. If necessary, they examine specific items in detail and make a final hiring decision.

[0251] (Example 1)

[0252] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0253] Traditional interview evaluation processes rely on the subjective opinions of interviewers, potentially leading to bias and a lack of fairness. Furthermore, the time and effort required for evaluation are considerable, making them inefficient. Additionally, evaluation criteria often lack consistency, resulting in differing judgments from interviewer to interviewer. Addressing these issues and ensuring fairness and consistency in evaluations is essential.

[0254] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0255] In this invention, the server includes means for collecting audio data using an acoustic input device and converting it into text data using speech recognition technology; means for storing the converted text data in a secure storage device; and means for extracting keywords and phrases from the text data using natural language processing technology and scoring them based on evaluation criteria. This makes it possible to conduct objective interview evaluations with minimal bias and to improve the efficiency of the evaluation process.

[0256] An "acoustic input device" is a hardware device for collecting audio signals and is used to effectively capture what is said during an interview.

[0257] "Speech recognition technology" is a technology for converting speech data into text data, and is a method for processing human speech into a format that machines can understand.

[0258] "Text data" refers to string information that electronically records human speech, converted using speech recognition technology.

[0259] "Natural language processing technology" refers to a series of techniques that enable computers to understand and analyze human language and extract its meaning.

[0260] "Keywords and phrases" are important words or phrases used in the interview that are extracted as information related to the evaluation criteria.

[0261] "Evaluation criteria" refers to the indicators or conditions set in advance to be used as criteria for making evaluations during an interview.

[0262] "Scoring" is the process of analyzing text data to quantitatively evaluate each item based on evaluation criteria and providing a numerical evaluation.

[0263] A "generative AI model" is an algorithm that uses artificial intelligence technology to analyze and predict data, generating evaluation results based on that data.

[0264] "Visual elements" refer to graphical representations such as graphs and charts used within evaluation reports, and are elements used to make information easier to understand intuitively.

[0265] An "information terminal" is a digital device used by users to view evaluation reports, and includes electronic devices such as personal computers and tablets.

[0266] This invention relates to a system that automates the evaluation process of voice interviews, enabling fair and efficient evaluation. Specifically, it begins with collecting interview audio in real time using an acoustic input device. The terminal converts the collected audio into text data using speech recognition technology (e.g., a general speech recognition API) and sends the conversion result to a server.

[0267] The server stores text data in a secure storage device and simultaneously analyzes the data using natural language processing (NLP) techniques. A common NLP library is used as a specific example of the natural language processing techniques employed here. Important keywords and phrases are extracted from the text data, and scoring is performed based on these keywords according to pre-defined evaluation criteria. A generative AI model (e.g., a general language model) is used in the scoring process.

[0268] The scoring results are compiled into an evaluation report by the server. This report includes various visual elements (e.g., graphs, charts) and is provided to the interviewer via their information terminal. This report allows the user to make a final decision more easily and objectively.

[0269] As a concrete example, let's assume a candidate is asked questions related to "problem-solving ability" during an interview and responds to those questions. In this case, the server analyzes text data containing keywords such as "problem solving" and "implementation plan," and uses a generative AI model to derive an objective score. This result is then presented as an evaluation report.

[0270] An example of a prompt for a generative AI model is: "Based on the following interview responses, please evaluate the candidate's problem-solving ability and the reasoning behind it. 'What candidate said...'"
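A prompt of this form could be assembled from a template, as in the following sketch. The template wording follows the example above, while the placeholder names, the function `build_prompt`, and the sample response are hypothetical. Sending the prompt to a particular generative AI model is outside the scope of the sketch.

```python
PROMPT_TEMPLATE = (
    "Based on the following interview responses, please evaluate the "
    "candidate's {criterion} and the reasoning behind it.\n\n'{response}'"
)


def build_prompt(criterion: str, response: str) -> str:
    """Fill the evaluation criterion and the transcribed response into the template."""
    return PROMPT_TEMPLATE.format(criterion=criterion, response=response.strip())


# Hypothetical transcribed response, used only for illustration.
prompt = build_prompt(
    "problem-solving ability",
    "  I analyzed the root cause and drew up an implementation plan.  ",
)
```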

[0271] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0272] Step 1:

[0273] The device collects audio data in real time during the interview. It uses a microphone or other audio input device to record the conversation between the interviewer and the candidate. The input is audio data, which is then converted into text data using speech recognition technology. The converted text data is obtained as output and sent to the next step.

[0274] Step 2:

[0275] The terminal sends text data to the server. The server stores the received text data in a database. Metadata such as the interview date and time and candidate information are also recorded. The input is text data converted by speech recognition, and the output is the data stored and organized on the server.
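Storing the transcript together with its metadata can be sketched with an in-memory SQLite database. The table name, column names, and sample values are hypothetical, and the access control implied by secure storage is not shown.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the interview table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS interview_records (
               id INTEGER PRIMARY KEY,
               candidate TEXT NOT NULL,
               interviewed_at TEXT NOT NULL,
               transcript TEXT NOT NULL
           )"""
    )
    return conn


def save_record(
    conn: sqlite3.Connection, candidate: str, interviewed_at: str, transcript: str
) -> int:
    """Insert one interview record and return its row id."""
    cursor = conn.execute(
        "INSERT INTO interview_records (candidate, interviewed_at, transcript) "
        "VALUES (?, ?, ?)",
        (candidate, interviewed_at, transcript),
    )
    conn.commit()
    return cursor.lastrowid


# Hypothetical record, used only for illustration.
conn = open_store()
record_id = save_record(
    conn, "candidate-001", "2025-01-01T10:00:00", "I analyzed the problem first."
)
row = conn.execute(
    "SELECT candidate, transcript FROM interview_records WHERE id = ?",
    (record_id,),
).fetchone()
```

Parameterized queries (the `?` placeholders) keep transcript text from being interpreted as SQL, which matters because the stored content is free-form speech.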

[0276] Step 3:

[0277] The server uses natural language processing (NLP) techniques to analyze stored text data. It extracts keywords and phrases from the text using an NLP library. The input is stored text data, and the output contains keywords and related information necessary for evaluation.

[0278] Step 4:

[0279] The server uses a generative AI model to score extracted keywords based on evaluation criteria. The AI model understands the context of the text data and generates a score for each item. The input consists of analyzed keywords and evaluation criteria, and an objective score is output.

[0280] Step 5:

[0281] The server generates an evaluation report based on the scoring results. The score is represented using graphs and charts, including visual elements. The input is scored data, and a visually represented evaluation report is output.

[0282] Step 6:

[0283] The server sends the generated evaluation report to the terminal and provides it to the interviewer who is the user. The user can intuitively check the evaluation results using the information terminal. The input is the generated evaluation report, and a report in a displayable format used by the user is output.
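
Steps 2 and 3 above (storing the text with metadata, then extracting keywords) might look roughly like the following Python sketch. It uses only the standard library in place of an NLP library; the keyword list, the InterviewRecord structure, and the sample text are assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical keyword lists per evaluation criterion (not given in the source).
CRITERIA_KEYWORDS = {
    "problem-solving ability": ["problem solving", "implementation plan", "analysis"],
}


@dataclass
class InterviewRecord:
    """Text data plus the metadata recorded in Step 2."""
    candidate: str
    recorded_at: datetime
    text: str
    keywords: dict = field(default_factory=dict)


def extract_keywords(text: str, criteria: dict) -> dict:
    """Count case-insensitive whole-phrase matches of each criterion keyword."""
    lowered = text.lower()
    found = {}
    for criterion, phrases in criteria.items():
        counts = {}
        for phrase in phrases:
            n = len(re.findall(r"\b" + re.escape(phrase) + r"\b", lowered))
            if n:
                counts[phrase] = n
        found[criterion] = counts
    return found


record = InterviewRecord(
    candidate="Candidate A",
    recorded_at=datetime(2024, 12, 16, 10, 0),
    text="My approach to problem solving starts with analysis, then an implementation plan.",
)
record.keywords = extract_keywords(record.text, CRITERIA_KEYWORDS)
print(record.keywords)
```

The extracted counts correspond to the "keywords and related information" passed to the scoring step; a real NLP library would add tokenization and lemmatization on top of this simple phrase matching.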

[0284] (Application Example 1)

[0285] Next, Application Example 1 will be described. In the following description, the data processing device 12 is referred to as a "server", and the smart glasses 214 are referred to as a "terminal".

[0286] Conventional interview practice and evaluation methods tend to rely on subjective judgments, which makes it difficult to evaluate a candidate's true abilities fairly and objectively. As a result, personnel evaluation may not be conducted appropriately, which may lead to unsuitable candidates being selected and excellent candidates being passed over. The lack of means to support an individual in practicing interviews effectively at home is also an issue.

[0287] The specific processing by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0288] In this invention, the server includes means for managing the record of an interview by receiving and storing voice or text data, means for analyzing the received data and generating a score based on specific evaluation criteria, and means for collecting and analyzing the user's answers in real time through interaction with home appliances, generating a score, and providing feedback. Thereby, an objective and fair interview practice in a home environment becomes possible, and it becomes possible for the user to effectively improve their skills.

[0289] "Voice or text data" is an information unit for recording human conversations and character information in digital form.

[0290] "Means for managing records" refers to methods and devices for systematically organizing received data and keeping it accessible as needed.

[0291] "Means for analyzing data and generating scores based on specific evaluation metrics" refers to methods or devices for processing data and calculating evaluation values based on pre-set criteria.

[0292] "Means for creating an evaluation report" refers to methods or devices that document evaluation results based on the generated scores and their rationale, and present them in a visually communicable format.

[0293] "Household automated devices" are computer-controlled devices used in the home that have the ability to interact with people through voice and actions.

[0294] "Means of providing feedback" refers to methods or devices for communicating reactions and suggestions to users based on analysis results regarding their actions and responses.

[0295] To realize this system, a combination of hardware used as a home automation device and software for speech recognition and natural language processing is required. The server processes the received audio data in real time and saves it as text using speech recognition technology. Specifically, it accurately converts speech into text by utilizing the Google Speech-to-Text API or a similar speech recognition system.

[0296] Next, the server analyzes the stored text data using natural language processing techniques. Using libraries such as Python's NLTK and spaCy, it detects keywords and phrases related to evaluation metrics from the text and generates a score based on them. This scoring utilizes a pre-trained AI model (e.g., using TensorFlow or PyTorch). The evaluation is specific and objective, and feedback is provided to the user.

[0297] Users can interact with a home automated device to create a simulated interview environment. For example, the device asks a question such as "Tell me about your leadership experience," and the user answers by voice. The system analyzes the answer in real time and provides voice feedback, for example on whether the description of the leadership experience is specific enough. This allows users to identify areas for improvement in their answers and to practice interviews efficiently at home.

[0298] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0299] Step 1:

[0300] The user sets interview questions on the home automated device, entering them through an interface. The entered questions are sent to the server in text format. The server prepares the received text data so that it can be passed directly to the speech synthesis module.

[0301] Step 2:

[0302] A home-use automated device uses speech synthesis technology to output question text received from a server. This allows the user to hear the questions presented by the device in real time. This audio output is generated based on the questions prepared by the user.

[0303] Step 3:

[0304] The user provides their response via voice. The device collects this voice data in real time and sends it to the server. The server uses speech recognition technology to convert it into text. As a result, the voice data is saved in text format, and this text becomes the input for subsequent data analysis.

[0305] Step 4:

[0306] The server analyzes the converted text data using natural language processing techniques. Using Python's NLTK and spaCy, it detects specific keywords and phrases. This structures the content of the answer, and the analysis results are prepared as input for the next-stage scoring process.

[0307] Step 5:

[0308] The server uses a pre-trained generative AI model to generate a score based on the analysis results. The AI model performs scoring using the frequency of occurrence and context of the input keywords as evaluation criteria. The generated score serves as the basic data for feedback to the user.

[0309] Step 6:

[0310] The server creates an evaluation report based on the generated score and the analysis results. The report includes, in addition to the score, specific areas for improvement and strengths. Furthermore, feedback for voice output is generated and sent to the home automation device.

[0311] Step 7:

[0312] The home automation device outputs the feedback received from the server as voice using voice synthesis technology. This enables the user to obtain a real-time evaluation of their answer and use it for improvement. An example of this prompt sentence is "Please specifically tell me about the challenges you faced."
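
Steps 5 to 7 above can be illustrated with a small Python sketch. The source uses a pre-trained generative AI model for scoring; the rule below (the share of expected keywords that appear in the answer) is only a simplified stand-in, and the function names and keyword list are assumptions.

```python
def score_answer(keyword_counts: dict, expected: list, max_score: int = 10) -> int:
    """Stand-in scoring rule: share of expected keywords present, scaled to max_score."""
    if not expected:
        return 0
    hits = sum(1 for k in expected if keyword_counts.get(k, 0) > 0)
    return round(max_score * hits / len(expected))


def build_feedback(score: int, keyword_counts: dict, expected: list,
                   max_score: int = 10) -> str:
    """Turn the score and the missing keywords into a sentence for voice output."""
    missing = [k for k in expected if keyword_counts.get(k, 0) == 0]
    if not missing:
        return f"Score {score}/{max_score}. Your answer covered all expected points."
    return (
        f"Score {score}/{max_score}. "
        f"Please speak more specifically about: {', '.join(missing)}."
    )


# Hypothetical expected keywords for a leadership question.
expected = ["team", "goal", "result", "challenge"]
counts = {"team": 2, "goal": 1}
score = score_answer(counts, expected)
feedback = build_feedback(score, counts, expected)
print(feedback)
```

The feedback string would then be sent to the home automated device and converted to speech there.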

[0313] Furthermore, an emotion engine for estimating the user's emotion may be combined. That is, the specific processing unit 290 may estimate the user's emotion using the emotion identification model 59 and perform specific processing using the user's emotion.

[0314] This invention relates to an evaluation system that combines an emotion engine, which recognizes the user's emotions, with the interview process. This system analyzes the user's emotional state and reflects that information in the interview evaluation, enabling a more comprehensive and fair evaluation.

[0315] Data collection and emotion recognition

[0316] The device collects audio and video data during the interview in real time and sends it to an emotion recognition engine. This engine analyzes the user's emotions in real time based on information such as voice tone, facial expression changes, and word choice.

[0317] Analysis of emotional data

[0318] The server receives analysis results sent from the emotion recognition engine. This includes fluctuations in the user's emotions during the interview and their emotional responses to specific utterances. This data is integrated with other evaluation metrics within the interview evaluation system.

[0319] Scoring and report generation

[0320] The server adds the sentiment analysis results to existing evaluation metrics to generate an overall score. This score is based on objective information and complements the evaluation of emotional aspects that interviewers may find difficult to perceive. The server then creates an evaluation report based on this score and the analysis results.

[0321] Reports and decision support

[0322] The device provides the interviewer with an evaluation report. The report visually displays the sentiment analysis results, highlighting changes in emotions and key reactions during the interview. This additional information allows the user to conduct an evaluation that takes into account the candidate's emotional perception and personality.

[0323] Specific example

[0324] In an interview, if a question is asked about how the candidate handled a challenging situation, the sentiment analysis results received by the server indicate that the candidate used positive facial expressions and positive language in response. This allows the user to infer that the candidate can handle stressful situations calmly and reflect this in their evaluation.

[0325] Thus, the present invention enhances the quality of evaluation and promotes a more multifaceted understanding of candidates by incorporating sentiment data into conventional scoring methods.

[0326] The following describes the processing flow.

[0327] Step 1:

[0328] The device begins recording video and audio data simultaneously with the start of the interview. This collects digital data from the entire interview, ensuring that the information necessary for emotion recognition is secured.

[0329] Step 2:

[0330] The device sends collected audio data to an emotion recognition engine in real time, analyzing changes in voice tone and responses to specific emotions. During the speech recognition process, parameters such as volume, pitch, and speaking speed are analyzed.

[0331] Step 3:

[0332] The device uses video data to perform facial analysis and transmits changes in facial expressions and body movements to the emotion recognition engine. In this process, facial elements such as smiles, eyebrow movements, and gaze are analyzed.

[0333] Step 4:

[0334] The server processes the audio and facial expression analysis results received from the emotion recognition engine to organize the user's emotional fluctuations. Using this data, an emotion score is calculated to identify emotional trends throughout the interview.

[0335] Step 5:

[0336] The server combines the received sentiment data with scores from conventional evaluation metrics to generate a comprehensive evaluation report. This report details how emotions changed and what emotional responses were given to specific questions.

[0337] Step 6:

[0338] The terminal displays the evaluation report to the user acting as the interviewer. This report helps the user understand both the emotional and the technical evaluation of the candidate and provides information for making a final decision.

[0339] Step 7:

[0340] Users can leverage the sentiment analysis portion of evaluation reports to gain a deeper understanding of candidates' suitability and potential challenges. Based on this information, they can make final hiring decisions.

[0341] This series of steps allows for a more comprehensive assessment, including the emotional aspects of the interview process.
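
Step 4 above, in which the server organizes emotional fluctuations and calculates an emotion score, might be sketched as follows in Python. The label set, the valence mapping, and the EmotionObservation structure are assumptions; the source does not specify the output format of the emotion recognition engine.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class EmotionObservation:
    question_id: int
    source: str  # "voice" or "face"
    label: str   # "positive", "neutral" or "negative" (assumed label set)


# Assumed mapping from label to a numeric valence.
VALENCE = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


def dominant_emotion_per_question(observations):
    """Most frequent label for each question (emotional response per question)."""
    grouped = defaultdict(Counter)
    for obs in observations:
        grouped[obs.question_id][obs.label] += 1
    return {qid: counts.most_common(1)[0][0] for qid, counts in grouped.items()}


def emotion_score(observations):
    """Mean valence over the interview, in [-1, 1]; 0.0 if there is no data."""
    if not observations:
        return 0.0
    return sum(VALENCE[o.label] for o in observations) / len(observations)


observations = [
    EmotionObservation(1, "voice", "positive"),
    EmotionObservation(1, "face", "positive"),
    EmotionObservation(2, "voice", "negative"),
    EmotionObservation(2, "face", "neutral"),
    EmotionObservation(2, "voice", "neutral"),
]
per_question = dominant_emotion_per_question(observations)
trend = emotion_score(observations)
print(per_question, trend)
```

The per-question result supports the report's description of responses to specific questions, while the mean valence summarizes the trend over the whole interview.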

[0342] (Example 2)

[0343] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0344] Modern negotiation processes often involve subjective judgments, and fairness can be compromised due to evaluators' biases or lack of experience. Furthermore, accurately grasping and reflecting the emotional state of speakers during negotiations is difficult. As a result, the overall evaluation may be unbalanced, leading to a decrease in its credibility.

[0345] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0346] In this invention, the server includes means for managing negotiation records by acquiring and storing audio or video information; means for analyzing the acquired information and identifying emotional states using an emotion analysis engine; and means for integrating the identified emotional states with existing evaluation indicators to generate a comprehensive score. This enables objective and comprehensive evaluation at all times.

[0347] "Audio or video information" refers to audio and video data collected during negotiations or interviews, including the speaker's utterances and facial expressions.

[0348] "Managing records" means saving and organizing collected audio or video information so that it can be used for later analysis and evaluation.

[0349] An "emotion analysis engine" is a technology or system that analyzes and identifies a speaker's emotional state based on information such as voice tone and changes in facial expression.

[0350] "Identifying emotional states" means using an emotion analysis engine to categorize the emotions a speaker is expressing and extracting them as data.

[0351] "Existing evaluation metrics" refer to evaluation criteria and scales that have been used for some time, including quantitative and qualitative indicators such as technical skills and communication abilities.

[0352] "Generating an overall score" means combining identified emotional states with existing evaluation metrics to calculate an evaluation score that takes into account the impact of individual elements on the whole.

[0353] An "evaluation report" is a document or report that summarizes the overall evaluation results during the negotiation or interview process, and includes visual information to facilitate understanding.

[0354] This invention is a system for supporting the evaluation of negotiations and interviews, which generates an overall score and creates an evaluation report by collecting audio and video information, identifying emotional states using an emotion analysis engine, and integrating it with existing evaluation indicators.

[0355] The device is used in interviews and negotiations. It is equipped with a high-precision camera and microphone to collect audio and video information in real time. For the emotion analysis engine, which analyzes changes in voice tone and facial expressions, commercially available emotion analysis software can be used. This includes, for example, general-purpose emotion analysis engines.

[0356] The server receives audio and video information transmitted from the terminal and analyzes it using an emotion analysis engine. The server integrates the identified emotional states with existing evaluation metrics to generate an overall score. This overall score is calculated using an algorithm that determines how different metrics contribute to the evaluation. The software installed on the server manages data storage, analysis, and scoring, and automatically generates evaluation reports.

[0357] Users can view evaluation reports displayed on their devices. These reports include visual graphs showing emotional states over time, making it easy to understand how emotions changed during the negotiation process. This allows users to make comprehensive decisions that include the candidate's emotional state.

[0358] As a concrete example, consider a question posed to a candidate: "How did you overcome a challenging situation?" If the candidate responds calmly and positively, the server identifies that emotion as positive and reflects it in the evaluation metrics. This makes it easier for users to determine that the candidate has stress tolerance.

[0359] An example of a prompt is, "Create an evaluation report based on the results of an emotional analysis of the candidate's stress coping mechanisms, visually displaying their emotional responses to individual questions." In this way, the evaluation process can be comprehensively supported.

[0360] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0361] Step 1:

[0362] The device acquires audio and video information in real time using a high-precision camera and microphone at the start of an interview or negotiation. The input consists of the candidate's speech and video data, which are temporarily stored within the device. Specifically, the camera captures facial expressions and the microphone captures voice tone, and these are converted into digital format. The output is a temporary dataset of that moment.

[0363] Step 2:

[0364] The device transmits collected audio and video information to the sentiment analysis engine. The input consists of stored audio and video data, which is then formatted for sentiment analysis. Specifically, feature vectors are extracted from the audio, and facial features are analyzed from the video. The output is a label of the emotional state at each moment.

[0365] Step 3:

[0366] The server receives the analyzed emotional states and stores them in a database along with existing evaluation metrics. The input is the identified emotional states and their timestamp information. The server then organizes the emotional labels for management as time-series data. The output is cumulative emotional time variation data.

[0367] Step 4:

[0368] The server integrates emotional states with other evaluation metrics to calculate an overall score. The input consists of time-series data on emotional states and existing evaluation metrics. In specific steps, a weighting algorithm is used to calculate the importance of each evaluation item, generating an overall score. The output is a numerical score representing the overall evaluation of the interview or negotiation.

[0369] Step 5:

[0370] The server uses the generated scores and sentiment data to create a visual evaluation report. The inputs are the overall score and time-series sentiment state data. This report includes graphs showing changes in sentiment and analysis results. The output is a detailed and easy-to-interpret evaluation report.

[0371] Step 6:

[0372] The terminal presents the generated evaluation report to the user. The input is the evaluation report created on the server and displayed on the user's screen. The final output is the evaluation result provided to the user as visually appealing material to aid in decision-making.
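
The weighting in Step 4 above is not specified further in the source. One possible reading is a normalized weighted average, sketched below in Python; the metric names, weights, labels, and the 0 to 10 scale are all assumptions for illustration.

```python
def emotion_series_to_score(series, positive_labels=("positive", "calm")):
    """Share of time steps with a positive label, scaled to 0-10 (assumed scale)."""
    if not series:
        return 0.0
    positive = sum(1 for _, label in series if label in positive_labels)
    return 10.0 * positive / len(series)


def overall_score(metric_scores: dict, weights: dict) -> float:
    """Weighted average; weights are normalized, so they need not sum to 1."""
    total_weight = sum(weights[name] for name in metric_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(metric_scores[name] * weights[name] for name in metric_scores)
    return weighted / total_weight


# Time-series emotion labels as (timestamp in seconds, label) pairs, as in Step 3.
series = [(0.0, "calm"), (5.0, "positive"), (10.0, "negative"), (15.0, "calm")]
scores = {
    "technical skills": 8.0,
    "communication": 6.0,
    "emotion": emotion_series_to_score(series),
}
weights = {"technical skills": 0.5, "communication": 0.3, "emotion": 0.2}
total = overall_score(scores, weights)
print(round(total, 2))
```

Normalizing by the sum of the weights keeps the overall score on the same scale as the individual metrics even when the weights are adjusted.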

[0373] (Application Example 2)

[0374] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0375] In modern homes, there is a lack of effective means to support the mental state and emotional fluctuations of residents. In particular, despite the increasing importance of proper communication and stress management within the family, conventional technologies are structurally insufficient to address these issues. This invention aims to solve this problem by providing real-time responses and support tailored to the emotions of the residents.

[0376] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0377] In this invention, the server includes means for receiving and storing audio or visual data and managing the history of interactions; means for analyzing the received data, performing analysis based on specific evaluation criteria, and generating responses corresponding to the user's emotional state; and means for creating dialogue reports and supporting decisions based on the generated responses and their rationale. This makes it possible to understand the mental state of the occupants and provide appropriate feedback and support in real time.

[0378] "Audio or visual data" refers to basic digital information for communication, including the user's voice information and visual information such as facial expressions.

[0379] "Means for managing the history of interactions" refers to a function that records past interactions and conversations between the user and the system, making them available for reference as needed.

[0380] "A means of performing analysis based on specific evaluation criteria and generating responses that correspond to the user's emotional state" refers to a function in which the system automatically generates a real-time response appropriate to the user's emotional state based on collected data.

[0381] "A means of creating dialogue reports to support decision-making" refers to a function that summarizes the content of interactions with users, provides users with information including the results of sentiment analysis, and creates reports to aid understanding.

[0382] In this invention, by introducing a system that provides emotionally responsive feedback through dialogue with the user, it is possible to support the mental and emotional state of residents in real time.

[0383] The server receives user audio and visual data through input devices such as cameras and microphones, stores this data, and simultaneously performs analysis. This analysis includes converting audio data to text using an audio processing system (e.g., Google Speech-to-Text) and processing visual data using a facial recognition system (e.g., OpenCV and Dlib). This allows the server to analyze the user's emotions in real time based on specific evaluation criteria and generate responses corresponding to their emotional state.

[0384] As a concrete example, when a user experiences emotional fluctuations during a casual conversation, the system can detect their stress level from that data and provide appropriate feedback. For instance, if a user appears depressed, the system can suggest playing relaxation music. In this case, the prompt might be something like, "Suggest appropriate conversational topics when the user is feeling stressed."

[0385] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0386] Step 1:

[0387] The server receives audio and visual data from the terminal, including the user's speech and facial expressions. This data is received in stream format and temporarily stored for real-time processing.

[0388] Step 2:

[0389] The server converts the received audio data into text data using Google Speech-to-Text. This conversion process transforms the audio data into text format, which is then input into the next analysis step. The output is the text data of the words spoken by the user.

[0390] Step 3:

[0391] The server analyzes facial features from received visual data using OpenCV and Dlib. This process extracts user facial expression information and generates data to identify emotional states. The input is a user's facial image, and the output is emotion data based on facial expressions.

[0392] Step 4:

[0393] The server integrates both voice and facial expression data and uses a generative AI model to evaluate the user's overall emotional state. At this stage, weighting and contextual considerations are taken into account to calculate an emotional evaluation score. The output is the user's emotional evaluation score.

[0394] Step 5:

[0395] The server generates a predefined response based on the obtained sentiment evaluation score. The generated response is appropriate to the user's current emotional state. The prompt "Suggest an appropriate conversation when the user is feeling stressed" is used as a guide.

[0396] Step 6:

[0397] The device provides the user with the generated response, conveying its content via voice or display. This interaction prompts a further reaction from the user, which starts the next cycle of the feedback loop.
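
Steps 4 and 5 above can be approximated by a rule-based sketch in Python. The source uses a generative AI model to evaluate the emotional state; here a fixed weighted sum and threshold lookup stand in for it, and the weights, thresholds, and response texts are assumptions.

```python
from bisect import bisect_right

# Assumed weights for fusing voice-based and face-based stress estimates (0 to 1).
VOICE_WEIGHT, FACE_WEIGHT = 0.6, 0.4

# Assumed thresholds that split the fused score into three response levels.
THRESHOLDS = [0.4, 0.7]
RESPONSES = [
    "Continue the conversation as usual.",
    "Suggest a lighter conversation topic.",
    "Suggest playing relaxation music.",
]


def fuse_stress(voice_stress: float, face_stress: float) -> float:
    """Weighted combination of the two modality scores."""
    return VOICE_WEIGHT * voice_stress + FACE_WEIGHT * face_stress


def select_response(stress: float) -> str:
    """Pick the predefined response whose threshold band contains the score."""
    return RESPONSES[bisect_right(THRESHOLDS, stress)]


high = select_response(fuse_stress(0.9, 0.7))
low = select_response(fuse_stress(0.1, 0.2))
print(high)
print(low)
```

In the configuration described in the source, the selected response would then be conveyed to the user by voice or on the display.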

[0398] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0399] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, and inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0400] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0401] [Third Embodiment]

[0402] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0403] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0404] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0405] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0406] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0407] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0408] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0409] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0410] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0411] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0412] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0413] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0414] This invention relates to a system that streamlines the evaluation process in job interviews and promotes fair judgment. Specific embodiments thereof are described below.

[0415] Data reception and record management

[0416] The device collects audio data in real time during the interview. This data is converted to text using speech recognition technology, sent to a server, and stored. The server organizes this data as an interview record and manages it for later analysis.

[0417] Data analysis and scoring

[0418] The server analyzes the received text data using natural language processing (NLP) technology. This detects keywords and phrases used during the interview and generates scores for each item based on pre-defined evaluation metrics. Scoring is performed objectively using an AI model, eliminating biases that arise from traditional subjective evaluations.

[0419] Generation and provision of evaluation reports

[0420] Based on the scoring results, the server generates an evaluation report. The report includes scores for each evaluation metric and their rationale, explained with visual elements (e.g., graphs, charts). The generated report is provided to the interviewer via a terminal, providing specific and objective information to assist in the final pass / fail decision.

[0421] Specific example

[0422] For example, if a candidate is asked a question about their "problem-solving ability" during an interview, the server identifies relevant keywords in their answer (e.g., "problem solving," "analysis," "implementation plan," etc.) and calculates a score based on the frequency and context of those keywords. The resulting report would then present to the user, for instance, "Problem-solving ability: 7 / 10, analytical skills are highly rated, but there is a lack of concrete examples." This allows the user to make a fair evaluation based on detailed analysis.
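
A report line in the form shown above can be produced from a score and its rationale as in the following Python sketch. The MetricResult structure and the format_result function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:
    """One scored evaluation metric with its rationale (hypothetical structure)."""
    name: str
    score: int
    rationale: str
    max_score: int = 10


def format_result(result: MetricResult) -> str:
    """Render one metric as 'name: score / max, rationale'."""
    return f"{result.name}: {result.score} / {result.max_score}, {result.rationale}"


line = format_result(
    MetricResult(
        name="Problem-solving ability",
        score=7,
        rationale=(
            "analytical skills are highly rated, "
            "but there is a lack of concrete examples."
        ),
    )
)
print(line)
```

Keeping the rationale alongside the score in one record makes it straightforward to show both in the evaluation report.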

[0423] The system of this invention reduces the burden of many evaluation tasks faced by interviewers and contributes to improving the quality and fairness of hiring decisions.

[0424] The following describes the processing flow.

[0425] Step 1:

[0426] The terminal prepares for the interview and activates the device for recording audio data in real time. During this process, the recording environment is checked to ensure that the audio is captured accurately.

[0427] Step 2:

[0428] The device collects audio data and converts it into text data using speech recognition technology. During this conversion process, noise reduction and turn-taking are automatically performed to ensure that the text is transcribed in the appropriate context.

[0429] Step 3:

[0430] The server receives the converted text data and securely stores it in a database. This stored data will be used as a resource for later analysis.

[0431] Step 4:

[0432] The server analyzes stored text data using a natural language processing (NLP) engine. It extracts keywords and phrases related to specific evaluation metrics and performs contextual analysis based on the content of the utterances.

[0433] Step 5:

[0434] The server calculates a score corresponding to each evaluation metric based on the analysis results. The scores are generated according to evaluation criteria that the AI model has learned in advance and are expressed as objective numerical values.

[0435] Step 6:

[0436] The server generates a detailed evaluation report based on the scoring results. This report includes the score for each metric, the reasoning behind it, and graphs and charts to make it easier to understand visually.

[0437] Step 7:

[0438] The device displays an evaluation report to the user acting as the interviewer. The user uses this report to gather information for making a final hiring decision.

[0439] Step 8:

[0440] The user analyzes the report content and makes a final evaluation of the candidate. If necessary, they examine specific items in detail and make a final hiring decision.
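The report described in Step 6 (a score per metric, its reasoning, and a visual element) might be sketched as follows. This is a minimal Python example that uses a plain-text bar in place of real graphs; `MetricResult`, `render_report`, and the 10-point scale are illustrative assumptions, not part of the described system.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:
    name: str       # evaluation metric, e.g. "Problem-solving ability"
    score: int      # assumed 0-10 scale
    rationale: str  # short explanation of the score


def render_report(results: list[MetricResult], max_score: int = 10) -> str:
    """Render each metric as 'name  [####------] s/max' plus its rationale."""
    width = max((len(r.name) for r in results), default=0)
    lines = []
    for r in results:
        bar = "#" * r.score + "-" * (max_score - r.score)
        lines.append(f"{r.name:<{width}}  [{bar}] {r.score}/{max_score}")
        lines.append(f"  Rationale: {r.rationale}")
    return "\n".join(lines)


report = render_report([
    MetricResult("Problem-solving ability", 7,
                 "Analytical skills rated highly; few concrete examples."),
    MetricResult("Communication", 8, "Clear, well-structured answers."),
])
print(report)
```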

[0441] (Example 1)

[0442] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0443] Traditional interview evaluation processes rely on the subjective opinions of interviewers, potentially leading to bias and a lack of fairness. Furthermore, the time and effort required for evaluation are considerable, making them inefficient. Additionally, evaluation criteria often lack consistency, resulting in differing judgments from interviewer to interviewer. Addressing these issues and ensuring fairness and consistency in evaluations is essential.

[0444] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0445] In this invention, the server includes means for collecting audio data using an acoustic input device and converting it into text data using speech recognition technology; means for storing the converted text data in a secure storage device; and means for extracting keywords and phrases from the text data using natural language processing technology and scoring them based on evaluation criteria. This makes it possible to conduct objective interview evaluations with minimal bias and to improve the efficiency of the evaluation process.

[0446] An "acoustic input device" is a hardware device for collecting audio signals and is used to effectively capture what is said during an interview.

[0447] "Speech recognition technology" is a technology for converting speech data into text data, and is a method for processing human speech into a format that machines can understand.

[0448] "Text data" refers to string information that electronically records human speech, converted using speech recognition technology.

[0449] "Natural language processing technology" refers to a series of techniques that enable computers to understand and analyze human language and extract its meaning.

[0450] "Keywords and phrases" are important words or phrases used in the interview that are extracted as information related to the evaluation criteria.

[0451] "Evaluation criteria" refers to the indicators or conditions set in advance to be used as criteria for making evaluations during an interview.

[0452] "Scoring" is the process of analyzing text data to quantitatively evaluate each item based on evaluation criteria and providing a numerical evaluation.

[0453] A "generative AI model" is an algorithm that uses artificial intelligence technology to analyze and predict data, generating evaluation results based on that data.

[0454] "Visual elements" refer to graphical representations such as graphs and charts used within evaluation reports, and are elements used to make information easier to understand intuitively.

[0455] An "information terminal" is a digital device used by users to view evaluation reports, and includes electronic devices such as personal computers and tablets.

[0456] This invention relates to a system that automates the evaluation process of voice interviews, enabling fair and efficient evaluation. Specifically, it begins with collecting interview audio in real time using an acoustic input device. The terminal converts the collected audio into text data using speech recognition technology (e.g., a general speech recognition API) and sends the conversion result to a server.

[0457] The server stores text data in a secure storage device and simultaneously analyzes the data using natural language processing (NLP) techniques. A common NLP library is used as a specific example of the natural language processing techniques employed here. Important keywords and phrases are extracted from the text data, and scoring is performed based on these keywords according to pre-defined evaluation criteria. A generative AI model (e.g., a general language model) is used in the scoring process.

[0458] The scoring results are compiled into an evaluation report by the server. This report includes various visual elements (e.g., graphs, charts) and is provided to the interviewer via their information terminal. This report allows the user to make a final decision more easily and objectively.

[0459] As a concrete example, let's assume a candidate is asked questions related to "problem-solving ability" during an interview and responds to those questions. In this case, the server analyzes text data containing keywords such as "problem solving" and "implementation plan," and uses a generative AI model to derive an objective score. This result is then presented as an evaluation report.

[0460] An example of a prompt for a generative AI model is: "Based on the following interview responses, please evaluate the candidate's problem-solving ability and the reasoning behind it. 'What candidate said...'"
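A prompt of this form could be assembled from the metric name and the transcribed answer, for instance as in the Python sketch below. The template wording follows the example above; `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names, and the call that sends the prompt to a generative AI model is left out because the description does not specify one.

```python
PROMPT_TEMPLATE = (
    "Based on the following interview responses, please evaluate the "
    "candidate's {metric} and the reasoning behind it.\n\n"
    "'{response}'"
)


def build_prompt(metric: str, response: str) -> str:
    """Fill the template with the metric name and the transcribed answer."""
    return PROMPT_TEMPLATE.format(metric=metric, response=response.strip())


prompt = build_prompt(
    "problem-solving ability",
    "  I analyzed the failure logs and drew up an implementation plan.  ",
)
print(prompt)
```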

[0461] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0462] Step 1:

[0463] The device collects audio data in real time during the interview. It uses a microphone or other audio input device to record the conversation between the interviewer and the candidate. The input is audio data, which is then converted into text data using speech recognition technology. The converted text data is obtained as output and sent to the next step.

[0464] Step 2:

[0465] The terminal sends text data to the server. The server stores the received text data in a database. Metadata such as the interview date and time and candidate information are also recorded. The input is text data converted by speech recognition, and the output is the data stored and organized on the server.

[0466] Step 3:

[0467] The server uses natural language processing (NLP) techniques to analyze stored text data. It extracts keywords and phrases from the text using an NLP library. The input is stored text data, and the output contains keywords and related information necessary for evaluation.

[0468] Step 4:

[0469] The server uses a generative AI model to score extracted keywords based on evaluation criteria. The AI model understands the context of the text data and generates a score for each item. The input consists of analyzed keywords and evaluation criteria, and an objective score is output.

[0470] Step 5:

[0471] The server generates an evaluation report based on the scoring results. The scores are presented using visual elements such as graphs and charts. The input is the scored data, and a visually represented evaluation report is output.

[0472] Step 6:

[0473] The server sends the generated evaluation report to the terminal, providing it to the interviewer (the user). The user can then intuitively review the evaluation results using the information terminal. The input is the generated evaluation report, and the output is a report in a viewable format that can be used by the user.
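The storage in Step 2 above, where the transcript is saved together with metadata such as the interview date and time and candidate information, could look roughly like the following Python sketch using the standard `sqlite3` module. The table layout and the function names are assumptions, and the access control or encryption implied by "secure storage" is not covered.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (assumed) record table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS interview_records (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               candidate_id TEXT NOT NULL,
               recorded_at TEXT NOT NULL,
               transcript TEXT NOT NULL
           )"""
    )
    return conn


def save_record(conn, candidate_id, transcript, recorded_at=None) -> int:
    """Store one transcript with its metadata and return the new row id."""
    recorded_at = recorded_at or datetime.now(timezone.utc)
    cur = conn.execute(
        "INSERT INTO interview_records (candidate_id, recorded_at, transcript) "
        "VALUES (?, ?, ?)",
        (candidate_id, recorded_at.isoformat(), transcript),
    )
    conn.commit()
    return cur.lastrowid


def load_transcripts(conn, candidate_id) -> list[str]:
    """Return a candidate's transcripts in insertion order for later analysis."""
    rows = conn.execute(
        "SELECT transcript FROM interview_records "
        "WHERE candidate_id = ? ORDER BY id",
        (candidate_id,),
    )
    return [row[0] for row in rows]


conn = open_store()
record_id = save_record(conn, "candidate-001", "I led a team of five engineers.")
print(record_id, load_transcripts(conn, "candidate-001"))
```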

[0474] (Application Example 1)

[0475] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0476] Traditional interview practice and evaluation methods tend to rely on subjective judgment, making it difficult to fairly and objectively assess a candidate's true abilities. This can lead to inadequate talent evaluation, potentially resulting in the selection of unsuitable candidates or the loss of talented individuals. Another challenge is the lack of support tools to help individuals effectively practice interview skills at home.

[0477] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0478] In this invention, the server includes means for managing interview records by receiving and storing voice or text data; means for analyzing the received data and generating a score based on specific evaluation metrics; and means for collecting and analyzing user responses in real time through interaction with home automated devices, generating a score, and providing feedback. This enables objective and fair interview practice in a home environment, allowing users to effectively improve their skills.

[0479] "Voice or text data" refers to a unit of information used to record human conversation or written information in a digital format.

[0480] "Means for managing records" refers to methods and devices for systematically organizing received data and keeping it accessible as needed.

[0481] "Means for analyzing data and generating scores based on specific evaluation metrics" refers to methods or devices for processing data and calculating evaluation values based on pre-set criteria.

[0482] "Means for creating an evaluation report" refers to methods or devices that document evaluation results based on the generated scores and their rationale, and present them in a visually communicable format.

[0483] "Home automated devices" are computer-controlled devices used in the home that have the ability to interact with people through voice and actions.

[0484] "Means of providing feedback" refers to methods or devices for communicating reactions and suggestions to users based on analysis results regarding their actions and responses.

[0485] To realize this system, a combination of hardware used as a home automated device and software for speech recognition and natural language processing is required. The server processes the received audio data in real time and saves it as text using speech recognition technology. Specifically, it converts speech into text by utilizing the Google Speech-to-Text API or a similar speech recognition system.

[0486] Next, the server analyzes the stored text data using natural language processing techniques. Using libraries such as Python's NLTK and spaCy, it detects keywords and phrases related to evaluation metrics from the text and generates a score based on them. This scoring utilizes a pre-trained AI model (e.g., using TensorFlow or PyTorch). The evaluation is specific and objective, and feedback is provided to the user.

[0487] Users can interact with the home automated device to create a simulated interview environment. For example, the device asks a question such as "Tell me about your leadership experience," and the user answers by voice. The system analyzes the answer in real time and provides voice feedback, such as whether the description of the leadership experience is sufficiently specific. This allows users to identify areas for improvement in their answers and efficiently practice interviews at home.

[0488] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0489] Step 1:

[0490] The user sets interview questions for the home automated device. The user inputs the questions using an interface. The entered questions are sent to the server in text format. The server prepares the received text data to be passed directly to the speech synthesis module.

[0491] Step 2:

[0492] The home automated device uses speech synthesis technology to output the question text received from the server. This allows the user to hear the questions presented by the device in real time. This audio output is generated based on the questions prepared by the user.

[0493] Step 3:

[0494] The user provides their response via voice. The device collects this voice data in real time and sends it to the server. The server uses speech recognition technology to convert it into text. As a result, the voice data is saved in text format, and this text becomes the input for subsequent data analysis.

[0495] Step 4:

[0496] The server analyzes the converted text data using natural language processing techniques. It detects specific keywords and phrases using Python's NLTK and spaCy. This structures the content of the responses, and the analysis results are prepared as input for the next stage of scoring.

[0497] Step 5:

[0498] The server uses a pre-trained generative AI model to generate a score based on the analysis results. The AI model scores based on the frequency and context of the input keywords. The generated score serves as the basis for feedback provided to the user.

[0499] Step 6:

[0500] The server generates an evaluation report based on the generated score and analysis results. In addition to the score, the report includes specific areas for improvement and strengths. Furthermore, feedback for voice output is generated and sent to the home automated device.

[0501] Step 7:

[0502] The home automated device uses speech synthesis technology to output feedback received from the server. This allows users to receive real-time evaluations of their responses, which can then be used to improve their performance. One example of such a prompt is, "Please tell me specifically about the challenges you faced."
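One way the feedback in Steps 6 and 7 could be chosen is sketched below in Python. It applies a simple heuristic for whether an answer contains concrete details and returns either praise or the follow-up prompt quoted above. The marker list, the threshold, and the function names are assumptions made for illustration; the description itself relies on an AI model rather than this rule.

```python
import re

# Hypothetical markers that suggest an answer contains a concrete example.
EXAMPLE_MARKERS = ("for example", "for instance", "when i", "as a result")


def specificity_signals(answer: str) -> int:
    """Count simple signs of concreteness: example markers and numbers."""
    text = answer.lower()
    markers = sum(1 for m in EXAMPLE_MARKERS if m in text)
    numbers = len(re.findall(r"\b\d+(?:\.\d+)?\b", text))
    return markers + numbers


def make_feedback(answer: str, threshold: int = 2) -> str:
    """Return a spoken-feedback sentence based on an assumed threshold."""
    if specificity_signals(answer) >= threshold:
        return "Your answer included specific details. Well done."
    return "Please tell me specifically about the challenges you faced."


vague = "I am a good leader and people like working with me."
concrete = ("When I led a team of 5, we missed a deadline; as a result "
            "I introduced weekly reviews and cut delays by 30 percent.")
print(make_feedback(vague))
print(make_feedback(concrete))
```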

[0503] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0504] This invention relates to an evaluation system that combines an emotion engine, which recognizes the user's emotions, with the interview process. This system analyzes the user's emotional state and reflects that information in the interview evaluation, enabling a more comprehensive and fair evaluation.

[0505] Data collection and emotion recognition

[0506] The device collects audio and video data during the interview in real time and sends it to an emotion recognition engine. This engine analyzes the user's emotions in real time based on information such as voice tone, facial expression changes, and word choice.

[0507] Analysis of emotional data

[0508] The server receives analysis results sent from the emotion recognition engine. This includes fluctuations in the user's emotions during the interview and their emotional responses to specific utterances. This data is integrated with other evaluation metrics within the interview evaluation system.

[0509] Scoring and report generation

[0510] The server adds the sentiment analysis results to existing evaluation metrics to generate an overall score. This score is based on objective information and complements the evaluation of emotional aspects that interviewers may find difficult to perceive. The server then creates an evaluation report based on this score and the analysis results.

[0511] Reports and decision support

[0512] The device provides the interviewer with an evaluation report. The report visually displays the sentiment analysis results, highlighting changes in emotions and key reactions during the interview. This additional information allows the user to conduct an evaluation that takes into account the candidate's emotional perception and personality.

[0513] Specific example

[0514] For example, when a candidate is asked in an interview how they handled a challenging situation, the emotion analysis results received by the server may indicate that the candidate responded with positive facial expressions and positive language. This allows the user to infer that the candidate can handle stressful situations calmly and to reflect this in the evaluation.

[0515] Thus, the present invention enhances the quality of evaluation and promotes a more multifaceted understanding of candidates by incorporating sentiment data into conventional scoring methods.

[0516] The following describes the processing flow.

[0517] Step 1:

[0518] The device begins recording video and audio data simultaneously with the start of the interview. This collects digital data from the entire interview, ensuring that the information necessary for emotion recognition is secured.

[0519] Step 2:

[0520] The device sends the collected audio data to the emotion recognition engine in real time, where changes in voice tone and emotional responses are analyzed. In this analysis, parameters such as volume, pitch, and speaking speed are examined.

[0521] Step 3:

[0522] The device uses video data to perform facial analysis and transmits changes in facial expressions and body movements to the emotion recognition engine. In this process, facial elements such as smiles, eyebrow movements, and gaze are analyzed.

[0523] Step 4:

[0524] The server processes the audio and facial expression analysis results received from the emotion recognition engine to organize the user's emotional fluctuations. Using this data, an emotion score is calculated to identify emotional trends throughout the interview.

[0525] Step 5:

[0526] The server combines the received sentiment data with scores from conventional evaluation metrics to generate a comprehensive evaluation report. This report details how emotions changed and what emotional responses were given to specific questions.

[0527] Step 6:

[0528] The terminal displays the evaluation report to the user acting as the interviewer. This report helps the user understand both the emotional and the technical evaluation of the candidate, providing information for making a final decision.

[0529] Step 7:

[0530] Users can leverage the sentiment analysis portion of evaluation reports to gain a deeper understanding of candidates' suitability and potential challenges. Based on this information, they can make final hiring decisions.

[0531] This series of steps allows for a more comprehensive assessment, including the emotional aspects of the interview process.
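The emotion score in Step 4 above could, under simple assumptions, be derived from time-stamped analysis results as in the following Python sketch. It assumes that the emotion recognition engine returns a valence value between -1.0 and 1.0 for voice and for face at each sample, weights the two equally, and reports the mean together with a crude trend. None of these choices come from the description; they only illustrate the idea of organizing emotional fluctuations into a score.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EmotionSample:
    t_sec: float   # seconds since the interview started
    voice: float   # assumed valence from voice analysis, -1.0 to 1.0
    face: float    # assumed valence from facial analysis, -1.0 to 1.0


def combined_valence(s: EmotionSample) -> float:
    """Average the two modalities with equal weight (an assumption)."""
    return (s.voice + s.face) / 2


def emotion_summary(samples: list[EmotionSample]) -> dict[str, float]:
    """Return the mean valence and a simple trend (late mean - early mean).

    Expects at least one sample.
    """
    ordered = sorted(samples, key=lambda s: s.t_sec)
    values = [combined_valence(s) for s in ordered]
    half = len(values) // 2
    trend = mean(values[half:]) - mean(values[:half]) if half else 0.0
    return {"score": mean(values), "trend": trend}


samples = [
    EmotionSample(0, -0.5, -0.5),
    EmotionSample(60, 0.0, 0.0),
    EmotionSample(120, 0.5, 0.0),
    EmotionSample(180, 0.5, 0.5),
]
summary = emotion_summary(samples)
print(summary)
```

A positive trend here would mean the candidate's estimated valence rose over the course of the interview.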

[0532] (Example 2)

[0533] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0534] Modern negotiation processes often involve subjective judgments, and fairness can be compromised due to evaluators' biases or lack of experience. Furthermore, accurately grasping and reflecting the emotional state of speakers during negotiations is difficult. As a result, the overall evaluation may be unbalanced, leading to a decrease in its credibility.

[0535] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0536] In this invention, the server includes means for managing negotiation records by acquiring and storing audio or video information; means for analyzing the acquired information and identifying emotional states using an emotion analysis engine; and means for integrating the identified emotional states with existing evaluation indicators to generate a comprehensive score. This enables objective and comprehensive evaluation at all times.

[0537] "Audio or video information" refers to audio and video data collected during negotiations or interviews, including the speaker's utterances and facial expressions.

[0538] "Managing records" means saving and organizing collected audio or video information so that it can be used for later analysis and evaluation.

[0539] An "emotion analysis engine" is a technology or system that analyzes and identifies a speaker's emotional state based on information such as voice tone and changes in facial expression.

[0540] "Identifying emotional states" means using an emotion analysis engine to categorize the emotions a speaker is expressing and extracting them as data.

[0541] "Existing evaluation metrics" refer to evaluation criteria and scales that have been used for some time, including quantitative and qualitative indicators such as technical skills and communication abilities.

[0542] "Generating an overall score" means combining identified emotional states with existing evaluation metrics to calculate an evaluation score that takes into account the impact of individual elements on the whole.

[0543] An "evaluation report" is a document or report that summarizes the overall evaluation results during the negotiation or interview process, and includes visual information to facilitate understanding.

[0544] This invention is a system for supporting the evaluation of negotiations and interviews, which generates an overall score and creates an evaluation report by collecting audio and video information, identifying emotional states using an emotion analysis engine, and integrating it with existing evaluation indicators.

[0545] The device is used in interviews and negotiations. It is equipped with a high-precision camera and microphone to collect audio and video information in real time. For the emotion analysis engine, which analyzes changes in voice tone and facial expressions, commercially available emotion analysis software can be used. This includes, for example, general-purpose emotion analysis engines.

[0546] The server receives audio and video information transmitted from the terminal and analyzes it using an emotion analysis engine. The server integrates the identified emotional states with existing evaluation metrics to generate an overall score. This overall score is calculated using an algorithm that determines how different metrics contribute to the evaluation. The software installed on the server manages data storage, analysis, and scoring, and automatically generates evaluation reports.
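The description does not give the algorithm that determines how each metric contributes to the overall score. One common choice is a weighted average, sketched below in Python. The metric names, the weights, and the 0 to 10 scale are hypothetical.

```python
def overall_score(metric_scores: dict[str, float],
                  weights: dict[str, float]) -> float:
    """Weighted average of metric scores, normalized by the total weight."""
    total_weight = sum(weights[name] for name in metric_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(score * weights[name]
               for name, score in metric_scores.items()) / total_weight


# Hypothetical metrics on a 0-10 scale; the emotion score is one of them.
scores = {"technical_skill": 8.0, "communication": 6.0, "emotional_state": 7.0}
weights = {"technical_skill": 0.5, "communication": 0.3, "emotional_state": 0.2}
total = overall_score(scores, weights)
print(round(total, 2))
```

Dividing by the total weight means the weights do not need to sum to 1, so a metric can be added or removed without rescaling the others.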

[0547] Users can view evaluation reports displayed on their devices. These reports include visual graphs showing emotional states over time, making it easy to understand how emotions changed during the negotiation process. This allows users to make comprehensive decisions that include the candidate's emotional state.

[0548] As a concrete example, consider a question posed to a candidate: "How did you overcome a challenging situation?" If the candidate responds calmly and positively, the server identifies that emotion as positive and reflects it in the evaluation metrics. This makes it easier for users to determine that the candidate has stress tolerance.

[0549] An example of a prompt is, "Create an evaluation report based on the results of an emotional analysis of the candidate's stress coping mechanisms, visually displaying their emotional responses to individual questions." In this way, the evaluation process can be comprehensively supported.

[0550] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0551] Step 1:

[0552] The device acquires audio and video information in real time using a high-precision camera and microphone at the start of an interview or negotiation. The input consists of the candidate's speech and video data, which are temporarily stored within the device. Specifically, the camera captures facial expressions and the microphone captures voice tone, and these are converted into digital format. The output is a temporary dataset of that moment.

[0553] Step 2:

[0554] The device transmits collected audio and video information to the sentiment analysis engine. The input consists of stored audio and video data, which is then formatted for sentiment analysis. Specifically, feature vectors are extracted from the audio, and facial features are analyzed from the video. The output is a label of the emotional state at each moment.

[0555] Step 3:

[0556] The server receives the analyzed emotional states and stores them in a database along with existing evaluation metrics. The input is the identified emotional states and their timestamp information. The server then organizes the emotional labels for management as time-series data. The output is cumulative emotional time variation data.

[0557] Step 4:

[0558] The server integrates emotional states with other evaluation metrics to calculate an overall score. The input consists of time-series data on emotional states and existing evaluation metrics. In specific steps, a weighting algorithm is used to calculate the importance of each evaluation item, generating an overall score. The output is a numerical score representing the overall evaluation of the interview or negotiation.

[0559] Step 5:

[0560] The server uses the generated scores and sentiment data to create a visual evaluation report. The inputs are the overall score and time-series sentiment state data. This report includes graphs showing changes in sentiment and analysis results. The output is a detailed and easy-to-interpret evaluation report.

[0561] Step 6:

[0562] The terminal presents the generated evaluation report to the user. The input is the evaluation report created on the server, which is displayed on the user's screen. The final output is the evaluation result, provided to the user in a visual form that aids decision-making.
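Step 3 above organizes the emotion labels as time-series data. A minimal Python sketch of one such organization is shown below: consecutive identical labels are merged into segments, a form that is easy to plot in the report of Step 5. The `(timestamp, label)` input format and the label names are assumptions about the emotion analysis engine's output.

```python
from itertools import groupby

# (timestamp in seconds, label) pairs as assumed output of the emotion engine.
Label = tuple[float, str]


def to_segments(labels: list[Label]) -> list[tuple[float, float, str]]:
    """Merge consecutive identical labels into (start, end, label) segments."""
    ordered = sorted(labels, key=lambda item: item[0])
    segments = []
    for label, group in groupby(ordered, key=lambda item: item[1]):
        times = [t for t, _ in group]
        segments.append((times[0], times[-1], label))
    return segments


labels = [(0.0, "neutral"), (5.0, "neutral"), (10.0, "positive"),
          (15.0, "positive"), (20.0, "negative"), (25.0, "positive")]
segments = to_segments(labels)
for start, end, label in segments:
    print(f"{start:>5.1f}s - {end:>5.1f}s  {label}")
```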

[0563] (Application Example 2)

[0564] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0565] In modern homes, there is a lack of effective means to support the mental state and emotional fluctuations of residents. In particular, despite the increasing importance of proper communication and stress management within the family, conventional technologies are structurally insufficient to address these issues. This invention aims to solve this problem by providing real-time responses and support tailored to the emotions of the residents.

[0566] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0567] In this invention, the server includes means for receiving and storing audio or visual data and managing the history of interactions; means for analyzing the received data, performing analysis based on specific evaluation criteria, and generating responses corresponding to the user's emotional state; and means for creating dialogue reports and supporting decisions based on the generated responses and their rationale. This makes it possible to understand the mental state of the occupants and provide appropriate feedback and support in real time.

[0568] "Audio or visual data" refers to basic digital information for communication, including the user's voice information and visual information such as facial expressions.

[0569] "Means for managing the history of interactions" refers to a function that records past interactions and conversations between the user and the system, making them available for reference as needed.

[0570] "A means of performing analysis based on specific evaluation criteria and generating responses that correspond to the user's emotional state" refers to a function in which the system automatically generates a real-time response appropriate to the user's emotional state based on collected data.

[0571] "A means of creating dialogue reports to support decision-making" refers to a function that summarizes the content of interactions with users, provides users with information including the results of sentiment analysis, and creates reports to aid understanding.

[0572] In this invention, by introducing a system that provides emotionally responsive feedback through dialogue with the user, it is possible to support the mental and emotional state of residents in real time.

[0573] The server receives user audio and visual data through input devices such as cameras and microphones, stores this data, and simultaneously performs analysis. This analysis includes converting audio data to text using an audio processing system (e.g., Google Speech-to-Text) and processing visual data using a facial recognition system (e.g., OpenCV and Dlib). This allows the server to analyze the user's emotions in real time based on specific evaluation criteria and generate responses corresponding to their emotional state.

[0574] As a concrete example, when a user experiences emotional fluctuations during a casual conversation, the system can detect their stress level from that data and provide appropriate feedback. For instance, if a user appears depressed, the system can suggest playing relaxation music. In this case, the prompt might be something like, "Suggest appropriate conversational topics when the user is feeling stressed."

[0575] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0576] Step 1:

[0577] The server receives audio and visual data from the terminal, including the user's speech and facial expressions. This data is received in stream format and temporarily stored for real-time processing.

[0578] Step 2:

[0579] The server converts the received audio data into text data using Google Speech-to-Text. This conversion process transforms the audio data into text format, which is then input into the next analysis step. The output is the text data of the words spoken by the user.

[0580] Step 3:

[0581] The server analyzes facial features from received visual data using OpenCV and Dlib. This process extracts user facial expression information and generates data to identify emotional states. The input is a user's facial image, and the output is emotion data based on facial expressions.

[0582] Step 4:

[0583] The server integrates both voice and facial expression data and uses a generative AI model to evaluate the user's overall emotional state. At this stage, weighting and contextual considerations are taken into account to calculate an emotional evaluation score. The output is the user's emotional evaluation score.

[0584] Step 5:

[0585] The server generates a predefined response based on the obtained sentiment evaluation score. The generated response is appropriate to the user's current emotional state. The prompt "Suggest an appropriate conversation when the user is feeling stressed" is used as a guide.

[0586] Step 6:

[0587] The device provides the user with the generated response, conveying its content via voice or display. This interaction prompts further input from the user, leading to the next cycle of the feedback loop.
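Steps 4 and 5 above could be sketched in Python as follows: the voice-based and face-based estimates are blended with a weight, and a predefined response is selected by threshold. The stress scale of 0 to 1, the weight, the cut-off values, and the response texts are all assumptions for illustration; in the description, a generative AI model performs the integrated evaluation.

```python
# Hypothetical predefined responses keyed by a stress band.
RESPONSES = {
    "high": "You seem tense. Shall I play some relaxing music?",
    "medium": "How about taking a short break?",
    "low": "Glad to hear you are doing well.",
}


def stress_score(voice_stress: float, face_stress: float,
                 voice_weight: float = 0.5) -> float:
    """Blend two stress estimates in [0, 1] using an assumed weight."""
    if not 0.0 <= voice_weight <= 1.0:
        raise ValueError("voice_weight must be between 0 and 1")
    return voice_weight * voice_stress + (1.0 - voice_weight) * face_stress


def choose_response(score: float) -> str:
    """Pick a predefined response; the 0.7 and 0.4 cut-offs are assumptions."""
    if score >= 0.7:
        return RESPONSES["high"]
    if score >= 0.4:
        return RESPONSES["medium"]
    return RESPONSES["low"]


score = stress_score(voice_stress=0.9, face_stress=0.7)
print(round(score, 2), choose_response(score))
```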

[0588] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio representing the user's input in response to the result of the specific processing. The control unit 46A transmits the audio data representing the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0589] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data in accordance with the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0590] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0591] [Fourth Embodiment]

[0592] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0593] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0594] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0595] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0596] The microphone 238 receives instructions from the user 20 by picking up the voice of the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio in accordance with instructions from the processor 46.

[0597] The camera 42 is a small digital camera equipped with an optical system, including a lens, an aperture, and a shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor. The camera 42 captures images of the area around the user 20 (for example, an imaging range defined by an angle of view equivalent to the field of vision of a typical healthy person).

[0598] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0599] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0600] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0601] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0602] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0603] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0604] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0605] This invention relates to a system that streamlines the evaluation process in job interviews and promotes fair judgment. Specific embodiments thereof are described below.

[0606] Data reception and record management

[0607] The device collects audio data in real time during the interview. This data is converted to text using speech recognition technology, sent to a server, and stored. The server organizes this data as an interview record and manages it for later analysis.

[0608] Data analysis and scoring

[0609] The server analyzes the received text data using natural language processing (NLP) technology. This detects keywords and phrases used during the interview and generates scores for each item based on pre-defined evaluation metrics. Scoring is performed objectively using an AI model, eliminating biases that arise from traditional subjective evaluations.

[0610] Generation and provision of evaluation reports

[0611] Based on the scoring results, the server generates an evaluation report. The report includes the score for each evaluation metric and the rationale for that score, explained with visual elements (e.g., graphs, charts). The generated report is provided to the interviewer via a terminal and supplies specific, objective information to assist in the final pass/fail decision.

[0612] Specific example

[0613] For example, if a candidate is asked a question about their "problem-solving ability" during an interview, the server identifies relevant keywords in the answer (e.g., "problem solving," "analysis," "implementation plan," etc.) and calculates a score based on the frequency and context of those keywords. The resulting report would then present to the user, for instance, "Problem-solving ability: 7/10, analytical skills are highly rated, but there is a lack of concrete examples." This allows the user to make a fair evaluation based on detailed analysis.
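The frequency part of this example can be sketched as follows. This is only a minimal illustration under stated assumptions: the keyword list, the function names `count_keywords` and `keyword_score`, and the mapping of two points per keyword hit capped at 10 are hypothetical and are not taken from the disclosure. The contextual analysis mentioned above is not covered by this sketch.

```python
import re
from collections import Counter

# Hypothetical keyword list for the "problem-solving ability" metric.
PROBLEM_SOLVING_KEYWORDS = ["problem solving", "analysis", "implementation plan"]


def count_keywords(answer: str, keywords: list[str]) -> Counter:
    """Count case-insensitive, whole-word occurrences of each keyword or phrase."""
    text = answer.lower()
    counts = Counter()
    for kw in keywords:
        pattern = r"\b" + re.escape(kw.lower()) + r"\b"
        counts[kw] = len(re.findall(pattern, text))
    return counts


def keyword_score(counts: Counter, max_score: int = 10,
                  points_per_hit: int = 2) -> int:
    """Map the total number of hits to a capped score (assumed mapping)."""
    return min(max_score, points_per_hit * sum(counts.values()))
```

For instance, an answer containing "analysis" twice and "implementation plan" once produces three hits and, under the assumed mapping, a score of 6 out of 10.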

[0614] The system of this invention reduces the burden of many evaluation tasks faced by interviewers and contributes to improving the quality and fairness of hiring decisions.

[0615] The following describes the processing flow.

[0616] Step 1:

[0617] The terminal prepares for the interview and activates the device for recording audio data in real time. During this process, the recording environment is checked to ensure that the audio is captured accurately.

[0618] Step 2:

[0619] The device collects audio data and converts it into text data using speech recognition technology. During this conversion, noise reduction and speaker-turn segmentation are performed automatically so that the text is transcribed in the appropriate context.

[0620] Step 3:

[0621] The server receives the converted text data and securely stores it in a database. This stored data will be used as a resource for later analysis.

[0622] Step 4:

[0623] The server analyzes stored text data using a natural language processing (NLP) engine. It extracts keywords and phrases related to specific evaluation metrics and performs contextual analysis based on the content of the utterances.

[0624] Step 5:

[0625] The server calculates a score corresponding to each evaluation metric based on the analysis results. The scores are generated according to evaluation criteria that the AI model has learned in advance and are expressed as objective numerical values.

[0626] Step 6:

[0627] The server generates a detailed evaluation report based on the scoring results. This report includes the score for each metric, the reasoning behind it, and graphs and charts to make it easier to understand visually.

[0628] Step 7:

[0629] The device displays an evaluation report to the user acting as the interviewer. The user uses this report to gather information for making a final hiring decision.

[0630] Step 8:

[0631] The user analyzes the report content and makes a final evaluation of the candidate. If necessary, they examine specific items in detail and make a final hiring decision.
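The report generation in Steps 6 and 7 can be pictured with a small text-only sketch. It is an assumption-laden illustration rather than the disclosed implementation: the function name `render_report`, the 10-point scale, and the use of a character bar in place of graphs and charts are all hypothetical.

```python
def render_report(scores: dict[str, int], rationale: dict[str, str],
                  max_score: int = 10) -> str:
    """Render each metric as a text bar plus its rationale (illustrative only)."""
    lines = ["Evaluation report"]
    for metric, score in scores.items():
        # A character bar stands in for the graphs and charts of the report.
        bar = "#" * score + "." * (max_score - score)
        lines.append(f"{metric:<24} [{bar}] {score}/{max_score}")
        lines.append(f"  Rationale: {rationale.get(metric, 'n/a')}")
    return "\n".join(lines)
```

A real report would more likely use a plotting or document library. The point here is only that each score is paired with its rationale and a visual cue.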

[0632] (Example 1)

[0633] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0634] Traditional interview evaluation processes rely on the subjective opinions of interviewers, potentially leading to bias and a lack of fairness. Furthermore, the time and effort required for evaluation are considerable, making them inefficient. Additionally, evaluation criteria often lack consistency, resulting in differing judgments from interviewer to interviewer. Addressing these issues and ensuring fairness and consistency in evaluations is essential.

[0635] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0636] In this invention, the server includes means for collecting audio data using an acoustic input device and converting it into text data using speech recognition technology; means for storing the converted text data in a secure storage device; and means for extracting keywords and phrases from the text data using natural language processing technology and scoring them based on evaluation criteria. This makes it possible to conduct objective interview evaluations with minimal bias and to improve the efficiency of the evaluation process.

[0637] An "acoustic input device" is a hardware device for collecting audio signals and is used to effectively capture what is said during an interview.

[0638] "Speech recognition technology" is a technology for converting speech data into text data, and is a method for processing human speech into a format that machines can understand.

[0639] "Text data" refers to string information that electronically records human speech, converted using speech recognition technology.

[0640] "Natural language processing technology" refers to a series of techniques that enable computers to understand and analyze human language and extract its meaning.

[0641] "Keywords and phrases" are important words or phrases used in the interview that are extracted as information related to the evaluation criteria.

[0642] "Evaluation criteria" refers to the indicators or conditions set in advance to be used as criteria for making evaluations during an interview.

[0643] "Scoring" is the process of analyzing text data to quantitatively evaluate each item based on evaluation criteria and providing a numerical evaluation.

[0644] A "generative AI model" is an algorithm that uses artificial intelligence technology to analyze and predict data, generating evaluation results based on that data.

[0645] "Visual elements" refer to graphical representations such as graphs and charts used within evaluation reports, and are elements used to make information easier to understand intuitively.

[0646] An "information terminal" is a digital device used by users to view evaluation reports, and includes electronic devices such as personal computers and tablets.

[0647] This invention relates to a system that automates the evaluation process of voice interviews, enabling fair and efficient evaluation. Specifically, it begins with collecting interview audio in real time using an acoustic input device. The terminal converts the collected audio into text data using speech recognition technology (e.g., a general speech recognition API) and sends the conversion result to a server.

[0648] The server stores text data in a secure storage device and simultaneously analyzes the data using natural language processing (NLP) techniques. A common NLP library is used as a specific example of the natural language processing techniques employed here. Important keywords and phrases are extracted from the text data, and scoring is performed based on these keywords according to pre-defined evaluation criteria. A generative AI model (e.g., a general language model) is used in the scoring process.

[0649] The scoring results are compiled into an evaluation report by the server. This report includes various visual elements (e.g., graphs, charts) and is provided to the interviewer via their information terminal. This report allows the user to make a final decision more easily and objectively.

[0650] As a concrete example, let's assume a candidate is asked questions related to "problem-solving ability" during an interview and responds to those questions. In this case, the server analyzes text data containing keywords such as "problem solving" and "implementation plan," and uses a generative AI model to derive an objective score. This result is then presented as an evaluation report.

[0651] An example of a prompt for a generative AI model is: "Based on the following interview responses, please evaluate the candidate's problem-solving ability and the reasoning behind it. 'What candidate said...'"
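How such a prompt might be assembled from a template and a transcribed answer is sketched below. The template wording follows the example prompt above. The function names `build_prompt` and `evaluate_answer`, and the idea of passing the model in as a plain text-in, text-out callable, are assumptions for illustration. No specific model API is implied.

```python
from typing import Callable

# Template based on the example prompt; {metric} and {answer} are filled in.
PROMPT_TEMPLATE = (
    "Based on the following interview responses, please evaluate the "
    "candidate's {metric} and the reasoning behind it.\n"
    "'{answer}'"
)


def build_prompt(metric: str, answer: str) -> str:
    """Insert the evaluation metric and the transcribed answer into the template."""
    return PROMPT_TEMPLATE.format(metric=metric, answer=answer.strip())


def evaluate_answer(metric: str, answer: str,
                    model: Callable[[str], str]) -> str:
    """Send the prompt to any text-in, text-out model supplied by the caller."""
    return model(build_prompt(metric, answer))
```

Keeping the model behind a callable lets the same prompt-building code be exercised with a stub during testing and with an actual generative AI model in operation.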

[0652] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0653] Step 1:

[0654] The device collects audio data in real time during the interview. It uses a microphone or other audio input device to record the conversation between the interviewer and the candidate. The input is audio data, which is then converted into text data using speech recognition technology. The converted text data is obtained as output and sent to the next step.

[0655] Step 2:

[0656] The terminal sends text data to the server. The server stores the received text data in a database. Metadata such as the interview date and time and candidate information are also recorded. The input is text data converted by speech recognition, and the output is the data stored and organized on the server.

[0657] Step 3:

[0658] The server uses natural language processing (NLP) techniques to analyze stored text data. It extracts keywords and phrases from the text using an NLP library. The input is stored text data, and the output contains keywords and related information necessary for evaluation.

[0659] Step 4:

[0660] The server uses a generative AI model to score extracted keywords based on evaluation criteria. The AI model understands the context of the text data and generates a score for each item. The input consists of analyzed keywords and evaluation criteria, and an objective score is output.

[0661] Step 5:

[0662] The server generates an evaluation report based on the scoring results. The scores are represented using visual elements such as graphs and charts. The input is the scored data, and a visually represented evaluation report is output.

[0663] Step 6:

[0664] The server sends the generated evaluation report to the terminal, providing it to the interviewer (the user). The user can then intuitively review the evaluation results using the information terminal. The input is the generated evaluation report, and the output is a report in a viewable format that can be used by the user.
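The storage described in Step 2, in which text data is recorded together with metadata such as the interview date and time and candidate information, can be sketched with an in-memory SQLite database. The table name, column names, and function names are hypothetical. The sketch also does not address the secure-storage aspect (for example, encryption or access control), which would need to be handled separately.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) records table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS interview_records (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               candidate_id TEXT NOT NULL,
               interviewed_at TEXT NOT NULL,
               transcript TEXT NOT NULL
           )"""
    )
    return conn


def save_transcript(conn: sqlite3.Connection, candidate_id: str, transcript: str,
                    interviewed_at: Optional[datetime] = None) -> int:
    """Store one transcript with its metadata and return the new record id."""
    when = interviewed_at or datetime.now(timezone.utc)
    cur = conn.execute(
        "INSERT INTO interview_records (candidate_id, interviewed_at, transcript) "
        "VALUES (?, ?, ?)",
        (candidate_id, when.isoformat(), transcript),
    )
    conn.commit()
    return cur.lastrowid


def load_transcripts(conn: sqlite3.Connection, candidate_id: str) -> list[str]:
    """Return all stored transcripts for one candidate, oldest first."""
    rows = conn.execute(
        "SELECT transcript FROM interview_records "
        "WHERE candidate_id = ? ORDER BY id",
        (candidate_id,),
    )
    return [row[0] for row in rows]
```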

[0665] (Application Example 1)

[0666] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0667] Traditional interview practice and evaluation methods tend to rely on subjective judgment, making it difficult to fairly and objectively assess a candidate's true abilities. This can lead to inadequate talent evaluation, potentially resulting in the selection of unsuitable candidates or the loss of talented individuals. Another challenge is the lack of support tools to help individuals effectively practice interview skills at home.

[0668] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0669] In this invention, the server includes means for managing interview records by receiving and storing voice or text data; means for analyzing the received data and generating a score based on specific evaluation metrics; and means for collecting and analyzing user responses in real time through interaction with home automated devices, generating a score, and providing feedback. This enables objective and fair interview practice in a home environment, allowing users to effectively improve their skills.

[0670] "Voice or text data" refers to information that records human conversation or written content in a digital format.

[0671] "Means for managing records" refers to methods and devices for systematically organizing received data and keeping it accessible as needed.

[0672] "Means for analyzing data and generating scores based on specific evaluation metrics" refers to methods or devices for processing data and calculating evaluation values based on pre-set criteria.

[0673] "Means for creating an evaluation report" refers to methods or devices that document evaluation results based on the generated scores and their rationale, and present them in a visually communicable format.

[0674] "Home automated devices" are computer-controlled devices used in the home that have the ability to interact with people through voice and actions.

[0675] "Means of providing feedback" refers to methods or devices for communicating reactions and suggestions to users based on analysis results regarding their actions and responses.

[0676] To realize this system, a combination of hardware serving as the home automated device and software for speech recognition and natural language processing is required. The server processes the received audio data in real time and saves it as text using speech recognition technology. Specifically, it accurately converts speech into text by utilizing the Google Speech-to-Text API or a similar speech recognition system.

[0677] Next, the server analyzes the stored text data using natural language processing techniques. Using libraries such as Python's NLTK and spaCy, it detects keywords and phrases related to evaluation metrics from the text and generates a score based on them. This scoring utilizes a pre-trained AI model (e.g., using TensorFlow or PyTorch). The evaluation is specific and objective, and feedback is provided to the user.

[0678] Users can interact with the home automated device to create a simulated interview environment. For example, the device asks a question such as "Tell me about your leadership experience," and the user answers by voice. The system analyzes the answer in real time and provides voice feedback, for example on whether the description of the leadership experience is sufficiently specific. This allows users to identify areas for improvement in their answers and practice interviews efficiently at home.

[0679] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0680] Step 1:

[0681] The user sets interview questions on the home automated device. The user inputs the questions using an interface. The entered questions are sent to the server in text format. The server prepares the received text data so that it can be passed directly to the speech synthesis module.

[0682] Step 2:

[0683] The home automated device uses speech synthesis technology to output the question text received from the server. This allows the user to hear the questions presented by the device in real time. This audio output is generated based on the questions prepared by the user.

[0684] Step 3:

[0685] The user provides their response via voice. The device collects this voice data in real time and sends it to the server. The server uses speech recognition technology to convert it into text. As a result, the voice data is saved in text format, and this text becomes the input for subsequent data analysis.

[0686] Step 4:

[0687] The server analyzes the converted text data using natural language processing techniques. It detects specific keywords and phrases using Python's NLTK and spaCy. This structures the content of the responses, and the analysis results are prepared as input for the next stage of scoring.

[0688] Step 5:

[0689] The server uses a pre-trained generative AI model to generate a score based on the analysis results. The AI model scores based on the frequency and context of the input keywords. The generated score serves as the basis for feedback provided to the user.

[0690] Step 6:

[0691] The server generates an evaluation report based on the generated score and analysis results. In addition to the score, the report includes specific areas for improvement and strengths. Furthermore, feedback for voice output is generated and sent to the home automated device.

[0692] Step 7:

[0693] The home automated device uses speech synthesis technology to output feedback received from the server. This allows users to receive real-time evaluations of their responses, which can then be used to improve their performance. One example of such a prompt is, "Please tell me specifically about the challenges you faced."
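The feedback text prepared in Step 6 and spoken in Step 7 could, as one simple possibility, be derived from per-metric scores by thresholding. The function name `build_feedback`, the thresholds, and the wording of the feedback are assumptions made for this sketch. The disclosure leaves the actual generation to the AI model.

```python
def build_feedback(scores: dict[str, int], strength_at: int = 7,
                   weak_below: int = 5) -> str:
    """Turn per-metric scores into a short sentence suitable for speech output."""
    strengths = [m for m, s in scores.items() if s >= strength_at]
    weak = [m for m, s in scores.items() if s < weak_below]
    parts = []
    if strengths:
        parts.append("Strengths: " + ", ".join(strengths) + ".")
    if weak:
        parts.append("Areas for improvement: " + ", ".join(weak) + ".")
    if not parts:
        # No metric stands out in either direction under the assumed thresholds.
        parts.append("Your answer was balanced overall.")
    return " ".join(parts)
```

The returned string would then be handed to the speech synthesis function of the home automated device.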

[0694] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0695] This invention relates to an evaluation system that combines an emotion engine, which recognizes the user's emotions, with the interview process. This system analyzes the user's emotional state and reflects that information in the interview evaluation, enabling a more comprehensive and fair evaluation.

[0696] Data collection and emotion recognition

[0697] The device collects audio and video data during the interview in real time and sends it to an emotion recognition engine. This engine analyzes the user's emotions in real time based on information such as voice tone, facial expression changes, and word choice.

[0698] Analysis of emotional data

[0699] The server receives analysis results sent from the emotion recognition engine. This includes fluctuations in the user's emotions during the interview and their emotional responses to specific utterances. This data is integrated with other evaluation metrics within the interview evaluation system.

[0700] Scoring and report generation

[0701] The server adds the sentiment analysis results to existing evaluation metrics to generate an overall score. This score is based on objective information and complements the evaluation of emotional aspects that interviewers may find difficult to perceive. The server then creates an evaluation report based on this score and the analysis results.

[0702] Reports and decision support

[0703] The device provides the interviewer with an evaluation report. The report visually displays the emotion analysis results, highlighting changes in emotions and key reactions during the interview. This additional information allows the user to conduct an evaluation that takes into account the candidate's emotional tendencies and personality.

[0704] Specific example

[0705] For example, suppose that during an interview a candidate is asked how they handled a challenging situation, and the emotion analysis results received by the server indicate that the candidate responded with positive facial expressions and positive language. The user can then infer that the candidate handles stressful situations calmly and reflect this in the evaluation.

[0706] Thus, the present invention enhances the quality of evaluation and promotes a more multifaceted understanding of candidates by incorporating sentiment data into conventional scoring methods.

[0707] The following describes the processing flow.

[0708] Step 1:

[0709] The device begins recording video and audio data simultaneously with the start of the interview. This collects digital data from the entire interview, ensuring that the information necessary for emotion recognition is secured.

[0710] Step 2:

[0711] The device sends the collected audio data to the emotion recognition engine in real time, and the engine analyzes changes in voice tone and the emotional reactions they indicate. In this audio analysis, parameters such as volume, pitch, and speaking speed are analyzed.

[0712] Step 3:

[0713] The device uses video data to perform facial analysis and transmits changes in facial expressions and body movements to the emotion recognition engine. In this process, facial elements such as smiles, eyebrow movements, and gaze are analyzed.

[0714] Step 4:

[0715] The server processes the audio and facial expression analysis results received from the emotion recognition engine to organize the user's emotional fluctuations. Using this data, an emotion score is calculated to identify emotional trends throughout the interview.

[0716] Step 5:

[0717] The server combines the received sentiment data with scores from conventional evaluation metrics to generate a comprehensive evaluation report. This report details how emotions changed and what emotional responses were given to specific questions.

[0718] Step 6:

[0719] The terminal displays the evaluation report to the user acting as the interviewer. The report helps the user understand both the emotional and technical evaluations of the candidate and provides information for making a final decision.

[0720] Step 7:

[0721] Users can leverage the sentiment analysis portion of evaluation reports to gain a deeper understanding of candidates' suitability and potential challenges. Based on this information, they can make final hiring decisions.

[0722] This series of steps allows for a more comprehensive assessment, including the emotional aspects of the interview process.
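Two of the audio parameters named in Step 2, volume and speaking speed, can be computed with very little code, as the following sketch suggests. The function names, the representation of audio as a list of samples normalized to the range -1.0 to 1.0, and the use of words per minute as the measure of speaking speed are assumptions. Pitch estimation is omitted because it requires a more involved signal-processing method.

```python
import math


def rms_volume(samples: list[float]) -> float:
    """Root-mean-square amplitude of normalized samples, a simple volume proxy."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def speaking_rate_wpm(transcript: str, duration_seconds: float) -> float:
    """Words per minute, computed from a transcript and its audio duration."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return len(transcript.split()) * 60.0 / duration_seconds
```

Tracking how these values change from one answer to the next is one way the emotion recognition engine could obtain the voice-tone changes described above.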

[0723] (Example 2)

[0724] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0725] Modern negotiation processes often involve subjective judgments, and fairness can be compromised due to evaluators' biases or lack of experience. Furthermore, accurately grasping and reflecting the emotional state of speakers during negotiations is difficult. As a result, the overall evaluation may be unbalanced, leading to a decrease in its credibility.

[0726] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0727] In this invention, the server includes means for managing negotiation records by acquiring and storing audio or video information; means for analyzing the acquired information and identifying emotional states using an emotion analysis engine; and means for integrating the identified emotional states with existing evaluation indicators to generate a comprehensive score. This enables objective and comprehensive evaluation at all times.

[0728] "Audio or video information" refers to audio and video data collected during negotiations or interviews, including the speaker's utterances and facial expressions.

[0729] "Managing records" means saving and organizing collected audio or video information so that it can be used for later analysis and evaluation.

[0730] An "emotion analysis engine" is a technology or system that analyzes and identifies a speaker's emotional state based on information such as voice tone and changes in facial expression.

[0731] "Identifying emotional states" means using an emotion analysis engine to categorize the emotions a speaker is expressing and extracting them as data.

[0732] "Existing evaluation metrics" refer to evaluation criteria and scales that have been used for some time, including quantitative and qualitative indicators such as technical skills and communication abilities.

[0733] "Generating an overall score" means combining identified emotional states with existing evaluation metrics to calculate an evaluation score that takes into account the impact of individual elements on the whole.

[0734] An "evaluation report" is a document or report that summarizes the overall evaluation results during the negotiation or interview process, and includes visual information to facilitate understanding.

[0735] This invention is a system for supporting the evaluation of negotiations and interviews, which generates an overall score and creates an evaluation report by collecting audio and video information, identifying emotional states using an emotion analysis engine, and integrating it with existing evaluation indicators.

[0736] The device is used in interviews and negotiations. It is equipped with a high-precision camera and microphone to collect audio and video information in real time. For the emotion analysis engine, which analyzes changes in voice tone and facial expressions, commercially available emotion analysis software can be used. This includes, for example, general-purpose emotion analysis engines.

[0737] The server receives audio and video information transmitted from the terminal and analyzes it using an emotion analysis engine. The server integrates the identified emotional states with existing evaluation metrics to generate an overall score. This overall score is calculated using an algorithm that determines how different metrics contribute to the evaluation. The software installed on the server manages data storage, analysis, and scoring, and automatically generates evaluation reports.
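One simple way to picture such an algorithm is a weighted average in which the emotion score is treated as one more metric alongside the existing ones. The sketch below is illustrative only: the label-to-value mapping, the 0 to 10 scale, the metric names, the weights, and the function names `emotion_score` and `overall_score` are all assumptions and are not specified in the disclosure.

```python
# Hypothetical mapping from emotion labels to a 0-10 scale.
EMOTION_VALUE = {"positive": 10.0, "neutral": 5.0, "negative": 0.0}


def emotion_score(timeline: list[tuple[float, str]]) -> float:
    """Average the per-moment labels; timeline holds (seconds, label) pairs."""
    if not timeline:
        return EMOTION_VALUE["neutral"]
    return sum(EMOTION_VALUE[label] for _, label in timeline) / len(timeline)


def overall_score(metric_scores: dict[str, float],
                  weights: dict[str, float]) -> float:
    """Weighted average of all metrics, including the emotion score."""
    total_weight = sum(weights[name] for name in metric_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(metric_scores[name] * weights[name] for name in metric_scores)
    return weighted / total_weight
```

For example, a timeline of two "neutral" and two "positive" labels gives an emotion score of 7.5. Combined with a technical skills score of 8.0 (weight 2.0) and a communication score of 6.0 (weight 1.0), with the emotion score weighted 1.0, the overall score is 7.375.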

[0738] Users can view evaluation reports displayed on their devices. These reports include visual graphs showing emotional states over time, making it easy to understand how emotions changed during the negotiation process. This allows users to make comprehensive decisions that include the candidate's emotional state.

[0739] As a concrete example, consider a question posed to a candidate: "How did you overcome a challenging situation?" If the candidate responds calmly and positively, the server identifies that emotion as positive and reflects it in the evaluation metrics. This makes it easier for users to determine that the candidate has stress tolerance.

[0740] An example of a prompt is, "Create an evaluation report based on the results of an emotional analysis of the candidate's stress coping mechanisms, visually displaying their emotional responses to individual questions." In this way, the evaluation process can be comprehensively supported.

[0741] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0742] Step 1:

[0743] The device acquires audio and video information in real time using a high-precision camera and microphone at the start of an interview or negotiation. The input consists of the candidate's speech and video data, which are temporarily stored within the device. Specifically, the camera captures facial expressions and the microphone captures voice tone, and these are converted into digital format. The output is a temporary dataset of that moment.

[0744] Step 2:

[0745] The device transmits the collected audio and video information to the emotion analysis engine. The input is the stored audio and video data, which is formatted for emotion analysis. Specifically, feature vectors are extracted from the audio, and facial features are analyzed from the video. The output is a label for the emotional state at each moment.

[0746] Step 3:

[0747] The server receives the analyzed emotional states and stores them in a database along with existing evaluation metrics. The input is the identified emotional states and their timestamp information. The server then organizes the emotional labels for management as time-series data. The output is cumulative emotional time variation data.
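
The organization of emotion labels into time-series data described in Step 3 can be illustrated with a minimal sketch. The `EmotionSample` type, the `build_time_series` function, the use of seconds as the timestamp unit, and the sample labels are all assumptions made for illustration; the text does not specify a storage format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotionSample:
    timestamp: float  # seconds from the start of the session (assumed unit)
    label: str        # emotion label produced by the emotion analysis engine


def build_time_series(samples):
    """Sort samples by time and attach a running count of each label."""
    ordered = sorted(samples, key=lambda s: s.timestamp)
    running = Counter()
    series = []
    for sample in ordered:
        running[sample.label] += 1
        # Copy the counter so each entry keeps its own cumulative snapshot.
        series.append((sample.timestamp, sample.label, dict(running)))
    return series


# Samples may arrive out of order; the series is always time-ordered.
samples = [
    EmotionSample(12.0, "calm"),
    EmotionSample(3.5, "anxious"),
    EmotionSample(20.0, "calm"),
]
series = build_time_series(samples)
```

Each entry carries the cumulative label counts up to that moment, which is one possible reading of the "cumulative emotional time variation data" named as the output of this step.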

[0748] Step 4:

[0749] The server integrates emotional states with other evaluation metrics to calculate an overall score. The input consists of time-series data on emotional states and existing evaluation metrics. In specific steps, a weighting algorithm is used to calculate the importance of each evaluation item, generating an overall score. The output is a numerical score representing the overall evaluation of the interview or negotiation.
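
The weighting in Step 4 can be sketched as a normalized weighted average. This is only one possible form of such an algorithm: the metric names, the weights, the 0 to 100 scale, and the `positive_ratio` helper that turns emotion labels into a single metric are hypothetical and do not come from the text.

```python
def positive_ratio(labels, positive=("calm", "positive", "confident")):
    """Fraction of time-series emotion labels that count as positive."""
    if not labels:
        return 0.0
    return sum(label in positive for label in labels) / len(labels)


def overall_score(metrics, weights):
    """Weighted average of metric values in [0, 1], scaled to 0-100."""
    if not metrics:
        raise ValueError("at least one metric is required")
    missing = set(metrics) - set(weights)
    if missing:
        raise KeyError(f"no weight for: {sorted(missing)}")
    total_weight = sum(weights[name] for name in metrics)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(metrics[name] * weights[name] for name in metrics)
    return 100.0 * weighted / total_weight


# Hypothetical inputs: two existing metrics plus one derived from emotions.
emotion_labels = ["calm", "anxious", "calm", "positive"]
metrics = {
    "communication": 0.8,
    "expertise": 0.6,
    "emotional_stability": positive_ratio(emotion_labels),  # 3 of 4 labels
}
weights = {"communication": 2.0, "expertise": 2.0, "emotional_stability": 1.0}
score = overall_score(metrics, weights)
```

Dividing by the total weight keeps the score on the same scale regardless of how many evaluation items are included.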

[0750] Step 5:

[0751] The server uses the generated score and the emotion data to create a visual evaluation report. The inputs are the overall score and the time-series emotional state data. The report includes graphs showing changes in emotion, together with the analysis results. The output is a detailed, easy-to-interpret evaluation report.
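
As a rough illustration of Step 5, the sketch below renders the overall score and an emotion timeline as a plain-text chart. It assumes that each emotional state has already been mapped to a numeric valence in [-1, 1]; that mapping, the `render_report` function, and the text layout are hypothetical, and a real report would more likely use a plotting library.

```python
def render_report(overall, timeline, width=20):
    """Return a plain-text report: overall score plus one bar per time point.

    timeline: list of (timestamp_seconds, valence), valence in [-1, 1].
    """
    lines = [f"Overall score: {overall:.1f} / 100", "Emotion over time:"]
    for timestamp, valence in timeline:
        clipped = max(-1.0, min(1.0, valence))
        # Map [-1, 1] onto a bar of 0..width characters.
        bar_len = round((clipped + 1.0) / 2.0 * width)
        bar = "#" * bar_len
        lines.append(f"{timestamp:6.1f}s |{bar:<{width}}| {clipped:+.2f}")
    return "\n".join(lines)


report = render_report(74.0, [(0.0, -0.5), (30.0, 0.2), (60.0, 0.8)])
print(report)
```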

[0752] Step 6:

[0753] The terminal presents the generated evaluation report to the user. The input is the evaluation report created on the server, which is displayed on the user's screen. The final output is the evaluation result, provided to the user as visual material that aids decision-making.

[0754] (Application Example 2)

[0755] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0756] In modern homes, there is a lack of effective means to support the mental state and emotional fluctuations of residents. In particular, despite the increasing importance of proper communication and stress management within the family, conventional technologies are structurally insufficient to address these issues. This invention aims to solve this problem by providing real-time responses and support tailored to the emotions of the residents.

[0757] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0758] In this invention, the server includes means for receiving and storing audio or visual data and managing the history of interactions; means for analyzing the received data, performing analysis based on specific evaluation criteria, and generating responses corresponding to the user's emotional state; and means for creating dialogue reports and supporting decisions based on the generated responses and their rationale. This makes it possible to understand the mental state of the occupants and provide appropriate feedback and support in real time.

[0759] "Audio or visual data" refers to basic digital information for communication, including the user's voice information and visual information such as facial expressions.

[0760] "Means for managing the history of interactions" refers to a function that records past interactions and conversations between the user and the system, making them available for reference as needed.

[0761] "A means of performing analysis based on specific evaluation criteria and generating responses that correspond to the user's emotional state" refers to a function in which the system automatically generates a real-time response appropriate to the user's emotional state based on collected data.

[0762] "A means of creating dialogue reports to support decision-making" refers to a function that summarizes the content of interactions with users, provides users with information including the results of sentiment analysis, and creates reports to aid understanding.

[0763] In this invention, by introducing a system that provides emotionally responsive feedback through dialogue with the user, it is possible to support the mental and emotional state of residents in real time.

[0764] The server receives user audio and visual data through input devices such as cameras and microphones, stores this data, and simultaneously performs analysis. This analysis includes converting audio data to text using an audio processing system (e.g., Google Speech-to-Text) and processing visual data using a facial recognition system (e.g., OpenCV and Dlib). This allows the server to analyze the user's emotions in real time based on specific evaluation criteria and generate responses corresponding to their emotional state.

[0765] As a concrete example, when a user experiences emotional fluctuations during a casual conversation, the system can detect their stress level from that data and provide appropriate feedback. For instance, if a user appears depressed, the system can suggest playing relaxation music. In this case, the prompt might be something like, "Suggest appropriate conversational topics when the user is feeling stressed."

[0766] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0767] Step 1:

[0768] The server receives audio and visual data from the terminal, including the user's speech and facial expressions. This data is received in stream format and temporarily stored for real-time processing.

[0769] Step 2:

[0770] The server converts the received audio data into text data using Google Speech-to-Text. This conversion process transforms the audio data into text format, which is then input into the next analysis step. The output is the text data of the words spoken by the user.

[0771] Step 3:

[0772] The server analyzes facial features from received visual data using OpenCV and Dlib. This process extracts user facial expression information and generates data to identify emotional states. The input is a user's facial image, and the output is emotion data based on facial expressions.

[0773] Step 4:

[0774] The server integrates both voice and facial expression data and uses a generative AI model to evaluate the user's overall emotional state. At this stage, weighting and contextual considerations are taken into account to calculate an emotional evaluation score. The output is the user's emotional evaluation score.
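
The integration in Step 4 is described as being performed by a generative AI model. The sketch below does not reproduce that model; it substitutes a simple weighted average of per-label scores from the two modalities, purely to show what "integrating with weighting" can mean. The function names, the label set, and the `voice_weight` parameter are assumptions.

```python
def fuse_emotions(voice, face, voice_weight=0.5):
    """Blend two per-label score dicts into one normalized distribution."""
    if not 0.0 <= voice_weight <= 1.0:
        raise ValueError("voice_weight must be in [0, 1]")
    labels = set(voice) | set(face)
    blended = {
        label: voice_weight * voice.get(label, 0.0)
        + (1.0 - voice_weight) * face.get(label, 0.0)
        for label in labels
    }
    total = sum(blended.values())
    if total <= 0:
        raise ValueError("no emotion evidence in either modality")
    return {label: value / total for label, value in blended.items()}


def stress_score(distribution, stress_labels=("anxious", "sad", "angry")):
    """Share of the fused distribution assigned to stress-related labels."""
    return sum(distribution.get(label, 0.0) for label in stress_labels)


# Hypothetical per-modality outputs for one moment in the conversation.
voice = {"calm": 0.6, "anxious": 0.4}
face = {"calm": 0.2, "anxious": 0.8}
fused = fuse_emotions(voice, face, voice_weight=0.5)
score = stress_score(fused)
```

Raising `voice_weight` makes the voice channel dominate, which is one way to account for context such as a poorly lit camera image.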

[0775] Step 5:

[0776] The server generates a predefined response based on the obtained emotional evaluation score. The generated response is appropriate to the user's current emotional state. The prompt "Suggest an appropriate conversation when the user is feeling stressed" is used as a guide.
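
One simple way to pick a predefined response from a score, as in Step 5, is a threshold table. The thresholds, the response texts, and the assumption that the score is a stress level in [0, 1] are hypothetical; only the idea of suggesting relaxing music to a user who appears low comes from the text.

```python
# (minimum stress level, response), ordered from highest threshold down.
RESPONSES = [
    (0.7, "You seem tense. Shall I play some relaxing music?"),
    (0.4, "How has your day been so far?"),
    (0.0, "Glad to hear you are doing well."),
]


def select_response(stress):
    """Return the first predefined response whose threshold is met."""
    for threshold, text in RESPONSES:
        if stress >= threshold:
            return text
    # Scores below every threshold fall back to the mildest response.
    return RESPONSES[-1][1]


reply = select_response(0.82)
```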

[0777] Step 6:

[0778] The terminal provides the generated response to the user, conveying its content by voice or on a display. This interaction prompts a further reaction from the user, which leads into the next cycle of the feedback loop.

[0779] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0780] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions in the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0781] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0782] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping. Specifically, the emotion identification model 59 may determine the user's emotion according to an emotion map (see Figure 9), which is the specific mapping. Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0783] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
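
The layout rules of the emotion map 400 can be expressed as a small data structure. In the sketch below, a point is stored in polar form and classified by the left/right and top/bottom rules stated above. The `MapPoint` type, the `describe` function, and the sample point are hypothetical, and no actual emotion positions from Figure 9 are reproduced.

```python
import math
from dataclasses import dataclass

EPS = 1e-9  # tolerance for points lying on an axis


@dataclass(frozen=True)
class MapPoint:
    name: str
    radius: float     # 0 = center (more primitive); larger = further outward
    angle_deg: float  # 0 = 3 o'clock, 90 = top, counter-clockwise

    @property
    def x(self):
        return self.radius * math.cos(math.radians(self.angle_deg))

    @property
    def y(self):
        return self.radius * math.sin(math.radians(self.angle_deg))


def describe(point):
    """Classify a point by the left/right and top/bottom rules of the map."""
    if point.x > EPS:
        origin = "situation"      # right side: situational judgment
    elif point.x < -EPS:
        origin = "reaction"       # left side: reactions in the brain
    else:
        origin = "both"           # vertical axis: both sources
    if point.y > EPS:
        valence = "pleasure"      # upper side
    elif point.y < -EPS:
        valence = "displeasure"   # lower side
    else:
        valence = "neutral"
    return {"origin": origin, "valence": valence, "depth": point.radius}


info = describe(MapPoint("example_point", 1.0, 225.0))
```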

[0784] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0785] The inside of the emotion map 400 represents what is in the mind, while the outside represents actions. Therefore, the further out an emotion lies on the emotion map 400, the more visible it becomes (the more it manifests in actions).

[0786] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.

[0787] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0788] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
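
The text does not say how the network is trained so that neighboring emotions receive similar values. One possible approach, shown here only as an assumption, is to build soft training targets that decay with distance on the map. The 2-D coordinates, the Gaussian kernel, and the `sigma` parameter are all hypothetical.

```python
import math

# Hypothetical 2-D positions; the real map coordinates are not given here.
POSITIONS = {
    "reassured": (1.0, 0.2),
    "calm": (1.1, 0.0),
    "confident": (1.0, -0.2),
    "displeasure": (0.0, -2.0),
}


def soft_targets(true_label, positions, sigma=0.5):
    """Target emotion values that fall off with map distance.

    The labeled emotion gets 1.0; nearby emotions get values close to 1.0;
    distant emotions get values close to 0.0.
    """
    cx, cy = positions[true_label]
    targets = {}
    for label, (x, y) in positions.items():
        dist_sq = (x - cx) ** 2 + (y - cy) ** 2
        targets[label] = math.exp(-dist_sq / (2.0 * sigma ** 2))
    return targets


targets = soft_targets("calm", POSITIONS)
```

With targets of this shape, a sample labeled "calm" also pulls the outputs for "reassured" and "confident" upward, which matches the behavior described for Figure 10.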

[0789] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0790] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0791] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-temporary storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-temporary storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0792] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0793] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0794] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0795] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0796] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0797] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0798] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0799] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0800] The following is further disclosed regarding the embodiments described above.

[0801] (Claim 1)

[0802] A means of managing interview records by receiving and storing audio or text data,

[0803] A means for analyzing the received data and generating a score based on a specific evaluation index,

[0804] A means for creating an interview evaluation report based on the generated score and its basis,

[0805] A system that includes means of providing the aforementioned evaluation report to the interviewer to support their final decision.

[0806] (Claim 2)

[0807] The system according to claim 1, further comprising a function to generate a bias-free score by detecting keywords and phrases related to interview evaluation metrics.

[0808] (Claim 3)

[0809] The system according to claim 1, further comprising means for including visual information in the evaluation report so that the interviewer can easily understand the evaluation results.

[0810] "Example 1"

[0811] (Claim 1)

[0812] A means for collecting audio data using an acoustic input device and converting it into text data using speech recognition technology,

[0813] Means for storing the converted text data in a secure storage device,

[0814] A means for extracting keywords and phrases from the text data using natural language processing technology and scoring them based on evaluation criteria,

[0815] A means for analyzing scoring results using a generative AI model and generating an evaluation report,

[0816] A means of providing evaluation support by delivering evaluation reports, including visual elements, to personnel via information terminals,

[0817] A system that includes the above means.

[0818] (Claim 2)

[0819] The system according to claim 1, further comprising means for minimizing bias in interview evaluations using keyword detection and a generative AI model.

[0820] (Claim 3)

[0821] The system according to claim 1, further comprising means for including graphic information in the evaluation report and enabling the person in charge to intuitively understand the evaluation results.

[0822] "Application Example 1"

[0823] (Claim 1)

[0824] A means of managing interview records by receiving and storing audio or text data,

[0825] A means for analyzing the received data and generating a score based on a specific evaluation index,

[0826] A means for creating an interview evaluation report based on the generated score and its basis,

[0827] A means for providing the aforementioned evaluation report to the interviewer to support their final decision,

[0828] A system that includes means for collecting and analyzing user responses in real time through interaction with home automated devices, generating scores, and providing feedback.

[0829] (Claim 2)

[0830] The system according to claim 1, further comprising a function to generate a bias-free score by detecting keywords and phrases related to interview evaluation metrics.

[0831] (Claim 3)

[0832] The system according to claim 1, wherein the evaluation report includes visual information to facilitate the interviewer's understanding of the evaluation results,

[0833] the system further comprising means for the household automated device to provide voice feedback.

[0834] "Example 2 of combining an emotion engine"

[0835] (Claim 1)

[0836] A means of managing negotiation records by acquiring and storing audio or video information,

[0837] A means for analyzing the acquired information and identifying the emotional state using an emotion analysis engine,

[0838] A means for integrating the identified emotional state with existing evaluation indicators to generate a comprehensive score,

[0839] A means for creating a negotiation evaluation report based on the generated overall score and analysis results,

[0840] A system that includes means of providing the aforementioned evaluation report to the negotiator to support the final decision.

[0841] (Claim 2)

[0842] The system according to claim 1, further comprising a function to generate a bias-free score by detecting symbols and expressions related to the evaluation metrics of an interview.

[0843] (Claim 3)

[0844] The system according to claim 1, further comprising means for including visual information in the evaluation report so that negotiators can easily understand the evaluation results.

[0845] "Application example 2 when combining with an emotional engine"

[0846] (Claim 1)

[0847] A means for receiving and storing audio or visual data and managing the history of interactions,

[0848] A means for analyzing the received data, performing an analysis based on specific evaluation criteria, and generating a response corresponding to the user's emotional state,

[0849] A means for creating a dialogue report and supporting decision-making based on the generated response and its basis,

[0850] A system that includes means for presenting the aforementioned dialogue report to the user and providing additional information.

[0851] (Claim 2)

[0852] The system according to claim 1, further comprising a function to generate a bias-free response by detecting keywords or phrases associated with specific response indicators.

[0853] (Claim 3)

[0854] The system according to claim 1, further comprising means for including visual information in the dialogue report to enable the user to easily understand the situation.

[Explanation of symbols]

[0855]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. A system comprising:
means for managing interview records by receiving and storing audio or text data;
means for analyzing the received data and generating a score based on a specific evaluation index;
means for creating an interview evaluation report based on the generated score and its basis;
means for providing the evaluation report to the interviewer to support their final decision; and
means for collecting and analyzing user responses in real time, generating scores, and providing feedback through interaction with home automated devices.

2. The system according to claim 1, further comprising a function to generate a bias-free score by detecting keywords and phrases related to the evaluation metrics of an interview.

3. The system according to claim 1, wherein the evaluation report includes visual information to facilitate the interviewer's understanding of the evaluation results, the system further comprising means for the household automated device to provide voice feedback.

Citation Information

Patent Citations

Persona chatbot control method and system
JP2022180282A
