JP2026096617A · Pending · Publication Date: 2026-06-15 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: SOFTBANK GROUP CORP
Filing Date: 2024-12-03
Publication Date: 2026-06-15

IPC: G06F40/56; G10L15/10; G10L15/00; G10L15/20

AI Tagging

Application Domain

Natural language translation; Speech recognition

AI Technical Summary

Technical Problem

Entering data into customer management systems during business negotiations is cumbersome, leading to low input rates, accuracy issues, and a risk that important information goes unrecorded, which reduces efficiency and productivity.

Method used

A system that uses speech recognition to convert audio data from negotiations into text, generates meeting minutes using a generative model, and automatically inputs these minutes into a customer management system, with subsequent analysis of user notes to refine the data.

Benefits of technology

This system significantly enhances the efficiency and accuracy of data entry, ensuring all important negotiation points are recorded without omission and facilitating better customer information management.

✦ Generated by Eureka AI based on patent content.

Abstract

A system is provided. [Solution] The system includes: a speech recognition means that acquires audio data during a business negotiation and converts the audio data into text data; a memo generation means that automatically generates meeting minutes from the text data using a generative model; a data entry means that automatically inputs the meeting minutes into a customer management system; an information analysis means that analyzes simple notes entered after the business negotiation and converts them into detailed data; and a data update means that inputs the detailed data into the customer management system.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, the method including the steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of the chatbot's character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance that responds to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] In business activities, entering data into the customer management system is cumbersome. As a result, the input rate is low and accuracy is difficult to maintain. In addition, because the negotiation content is recorded incompletely, there is a risk that important information is omitted, which harms the efficiency and productivity of the negotiation. There is a need for a means of solving these problems and recording and managing the negotiation content accurately and efficiently.

Means for Solving the Problems

[0005] This invention includes a speech recognition means that acquires audio data during a business negotiation and converts it into text data using speech recognition. Furthermore, it includes a memo generation means that automatically generates meeting minutes from the text data using a generative model. These meeting minutes are automatically entered into a customer management system. Subsequently, simple notes entered by the sales representative after the negotiation are analyzed into detailed data by an information analysis means and entered into the customer management system. This invention improves the efficiency and accuracy of data entry work, making it possible to record all important points of a business negotiation without omission.

[0006] "Audio data" refers to audio information acquired during business negotiations, and includes sound waveforms recorded in digital format.

[0007] "Speech recognition means" refers to a device or system that receives speech data as input and converts that speech into text data.

[0008] "Text data" refers to information in written form converted from audio data by speech recognition technology, and is represented as a string of characters.

[0009] A "generative model" refers to an artificial intelligence algorithm or machine learning model used to automatically generate text or notes based on input data.

[0010] "Memo generation means" refers to a device or system that generates meeting minutes from text data using a generative model.

[0011] "Meeting minutes" refers to text information that summarizes and records the content of a business negotiation, saved in a format that is easy to use later.

[0012] A "customer management system" is a digital system used to manage the content and relationships of business negotiations with customers, and it stores and manages customer information and details of business negotiations.

[0013] An "information analysis tool" is a device or system that analyzes input memos and converts them into more detailed data or information.

[0014] "Detailed data" refers to broader or deeper data obtained from notes analyzed using information analysis tools.

[0015] "Data input means" refers to a device or method for registering or recording generated data in a digital system or database.

Brief Description of the Drawings

[0016]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map on which multiple emotions are mapped.
[Figure 10] An emotion map on which multiple emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2 when combined with an emotion engine.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2 when combined with an emotion engine.

Mode for Carrying Out the Invention

[0017] Hereinafter, an example of an embodiment of the system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0018] First, the terminology used in the following description will be explained.

[0019] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.

[0020] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and is used as a work memory by the processor.

[0021] In the following embodiments, storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs and various parameters. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0022] In the following embodiments, a communication interface (I / F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0023] In the following embodiments, "A and / or B" is synonymous with "at least one of A and B." That is, "A and / or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and / or B" applies when expressing three or more things linked by "and / or."

[0024] [First Embodiment]

[0025] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0026] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0027] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0028] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0029] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0030] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0031] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0032] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0033] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0034] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0035] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0036] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0037] This invention relates to a system that converts audio data acquired during business negotiations into text in real time, automatically creates meeting minutes using a generative model, and records them in a customer management system. The system also accepts simple notes entered after the negotiation, analyzes them into detailed data, and reflects that data in the customer management system.

[0038] The terminal uses its microphone to acquire audio data during business negotiations. This audio data is processed as a digital audio file containing the statements and conversations made during the negotiation.

[0039] The server receives this audio data and activates the speech recognition engine to convert it into text data. This process is performed in real time, and pre-processing such as noise reduction and volume optimization is carried out.
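The pre-processing mentioned above can be illustrated with a minimal Python sketch. The patent does not say which noise reduction or volume optimization algorithms are used; the noise gate and peak normalization below, along with the function names, the threshold, and the target peak, are assumptions chosen only to make the idea concrete.

```python
def noise_gate(samples, threshold=0.02):
    """Zero out samples whose magnitude is below threshold (a crude noise reducer)."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def normalize_volume(samples, target_peak=0.9):
    """Scale samples so that the loudest one reaches target_peak (peak normalization)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def preprocess(samples):
    """Apply the noise gate first, then normalize the remaining signal."""
    return normalize_volume(noise_gate(samples))


# Samples are assumed to be floats in the range [-1.0, 1.0].
raw = [0.01, -0.3, 0.45, -0.005]
print(preprocess(raw))
```

In this sketch the gate runs before normalization so that low-level noise is not amplified along with the speech. A production system would more likely rely on spectral methods or the speech recognition engine's own front end.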

[0040] This text data is automatically organized and summarized as meeting minutes by a generative AI model on the server. The generated meeting minutes are formatted so that all parties involved can easily understand and review them later.

[0041] The server then automatically inputs these meeting minutes into the customer management system, accurately recording the details of the business negotiation within the system. This makes it easier to later understand the details of the negotiation and plan the next steps.

[0042] After a business meeting concludes, users enter short notes or keywords on the terminal. This information is used to record new discoveries, important points, and tasks for the next meeting.

[0043] The entered memos are then analyzed in detail by the server's information analysis system, refining the relevant information and registering it in the customer management system. This ensures that the accuracy and reliability of the information remain high even after the business negotiation, and that all information is managed without omission.

[0044] For example, if a user enters a memo such as "The next meeting is at the end of September. The customer has shown interest in product C," the server interprets this as detailed data such as "Next meeting scheduled for the end of September" and "Interested in product C," and registers it in the system.
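The memo example above can be sketched in Python as a small rule-based extractor. The patent only names keyword extraction and contextual understanding as techniques; the regular expressions, field names, and function name here are hypothetical and stand in for whatever analysis the server actually performs.

```python
import re

# Hypothetical field rules: each pairs a field name with a regular expression.
# These patterns are illustrative assumptions, not taken from the patent.
FIELD_RULES = [
    ("next_meeting", re.compile(r"next meeting is (?:at |on )?(.+)", re.IGNORECASE)),
    ("customer_interest", re.compile(r"shown interest in (.+)", re.IGNORECASE)),
]


def analyze_memo(memo):
    """Split a short memo into sentences and extract one field per matching rule."""
    details = {}
    # Split on periods that are followed by whitespace or the end of the text.
    sentences = [s.strip() for s in re.split(r"\.(?:\s+|$)", memo) if s.strip()]
    for sentence in sentences:
        for field_name, pattern in FIELD_RULES:
            match = pattern.search(sentence)
            if match:
                details[field_name] = match.group(1).strip()
    return details


memo = "The next meeting is at the end of September. The customer has shown interest in product C."
print(analyze_memo(memo))
# {'next_meeting': 'the end of September', 'customer_interest': 'product C'}
```

Fixed patterns like these only cover phrasings anticipated in advance, which is presumably why the patent also refers to contextual understanding.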

[0045] This invention dramatically improves the efficiency of data entry work in the sales department and enhances the quality of customer information management by highly automating the recording of business negotiations.

[0046] The following describes the processing flow.

[0047] Step 1:

[0048] The terminal captures audio data via the microphone during business negotiations. The audio data is temporarily stored in a buffer in digital format and sent to the server in real time.

[0049] Step 2:

[0050] The server receives audio data from the terminal and starts the speech recognition engine. It then performs preprocessing on the received audio data, such as noise reduction and volume normalization, to improve recognition accuracy.

[0051] Step 3:

[0052] The server uses a speech recognition engine to convert pre-processed audio data into text data. This text data contains the content of the business negotiation conversation as written information.

[0053] Step 4:

[0054] The server uses a generative AI model to automatically generate meeting minutes based on the acquired text data. The generated meeting minutes are formatted to summarize the business discussion and clearly express the key points.

[0055] Step 5:

[0056] The server immediately inputs the automatically generated meeting minutes into the customer management system. It connects to the customer management system and creates and records new data entries.

[0057] Step 6:

[0058] After a business meeting concludes, users enter brief notes or keywords related to the meeting via the terminal. These notes include important information such as the meeting's outcome and the next steps.

[0059] Step 7:

[0060] The server receives notes sent by users and analyzes them into detailed data using information analysis tools. This analysis is performed using techniques such as keyword extraction and contextual understanding.

[0061] Step 8:

[0062] The server inputs and updates the detailed data obtained through analysis into the customer management system. This ensures that information after a business negotiation is accurately managed and utilized in subsequent sales activities.
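The eight steps above can be summarized as a short Python sketch of the server-side flow. All class and function names are hypothetical, and the speech recognition engine, generative model, and information analysis are injected as plain callables because the patent does not fix their implementations. The in-memory CRM is a stand-in for a real customer management system.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class InMemoryCRM:
    """Hypothetical stand-in for the customer management system."""
    records: list = field(default_factory=list)

    def add(self, kind, payload):
        self.records.append({"kind": kind, "payload": payload})


@dataclass
class NegotiationPipeline:
    transcribe: Callable[[bytes], str]      # Steps 2-3: preprocessing and speech recognition
    generate_minutes: Callable[[str], str]  # Step 4: generative model
    analyze_notes: Callable[[str], dict]    # Step 7: information analysis
    crm: InMemoryCRM

    def during_negotiation(self, audio):
        """Steps 1-5: audio in, meeting minutes recorded in the CRM."""
        text = self.transcribe(audio)
        minutes = self.generate_minutes(text)
        self.crm.add("minutes", minutes)
        return minutes

    def after_negotiation(self, notes):
        """Steps 6-8: user notes in, detailed data recorded in the CRM."""
        details = self.analyze_notes(notes)
        self.crm.add("details", details)
        return details


# Stub components so that the sketch runs without any external service.
pipeline = NegotiationPipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    generate_minutes=lambda text: "Summary: " + text,
    analyze_notes=lambda notes: {"note": notes},
    crm=InMemoryCRM(),
)
pipeline.during_negotiation(b"Customer asked about pricing.")
pipeline.after_negotiation("Follow up next week")
print(pipeline.crm.records)
```

Passing the three processing stages in as callables keeps the flow testable and lets a real engine or model be swapped in later without changing the orchestration.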

[0063] (Example 1)

[0064] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0065] In today's business environment, accurately recording the details of business negotiations and using that information to improve subsequent customer interactions is crucial. However, manually taking notes on the vast amount of information gathered during negotiations and entering it into a customer information management system is extremely time-consuming and labor-intensive. Furthermore, information is easily omitted or inaccurate, making efficient and accurate negotiation follow-up difficult.

[0066] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0067] In this invention, the server includes an audio processing means for acquiring audio information during negotiations and converting the audio information into text information, an information generation means for automatically generating meeting minutes from the text information using a specified generation format, and a data registration means for automatically inputting the meeting minutes into a customer information management system. This makes it possible to record information during business negotiations in real time.

[0068] "Audio information during negotiations" refers to audio data that records the content of statements and conversations made during business negotiations or discussions.

[0069] "Audio processing means" refers to a device or program that has a series of functions for acquiring, analyzing, and converting audio information in digital format.

[0070] "Text information" refers to digital data expressed as a string of characters after audio information has been converted.

[0071] A "generation format" refers to a model or algorithm that uses artificial intelligence to automatically generate information in a specific format.

[0072] "Information generation means" refers to devices or programs that use a generation format to summarize or generate necessary information from text information.

[0073] "Meeting minutes" refers to a document that summarizes the content of a meeting or business negotiation, and is a record used to review the content later.

[0074] A "customer information management system" is an information system that manages customer data and transaction history, and uses this information to build relationships with customers and improve business operations.

[0075] A "data registration means" refers to a device or program that has the function of automatically inputting and saving generated information into a specific database or system.

[0076] "Information analysis means" refers to devices or programs that analyze input information and extract or refine necessary information.

[0077] This invention aims to efficiently manage and utilize audio information during business negotiations. The system automatically converts audio information acquired during negotiations into text information and then generates meeting minutes from it.

[0078] The terminal collects audio information using its microphone during business negotiations. This terminal is used to transmit the content of the negotiations and conversations, sending the acquired audio data to a server. Ideally, the terminal should be equipped with a high-quality microphone and an audio collection application.

[0079] The server converts the received audio information into text using existing technologies such as Google® Speech-to-Text API and IBM Watson® Speech to Text as speech recognition engines. This process includes noise reduction and volume adjustment. Furthermore, the server inputs this text information into a generative AI model (e.g., OpenAI® GPT-3®) and automatically generates meeting minutes using a prompt. This generation process captures the progress of the negotiation accurately and outputs the result in a format that stakeholders can easily use. An example of a prompt is, "Generate a summary of the negotiation and list the key points."
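As a rough illustration of how the prompt and the transcript might be combined before being sent to a generative model, consider the Python sketch below. The instruction string is the one quoted above; the function name, the "Transcript:" label, and the overall layout are assumptions, and the call to the model itself is deliberately left out because the patent does not specify an API.

```python
PROMPT_INSTRUCTION = "Generate a summary of the negotiation and list the key points."


def build_minutes_prompt(transcript, instruction=PROMPT_INSTRUCTION):
    """Combine the instruction with the recognized text into a single prompt string."""
    transcript = transcript.strip()
    if not transcript:
        raise ValueError("transcript is empty; nothing to summarize")
    # The layout (instruction, blank line, labeled transcript) is an assumption.
    return f"{instruction}\n\nTranscript:\n{transcript}"


prompt = build_minutes_prompt("Customer: We are interested in product C.\nSales: Great.")
print(prompt)
```

Rejecting an empty transcript early avoids asking the model to summarize nothing, which would otherwise tend to produce invented minutes.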

[0080] The generated meeting minutes are automatically entered into the customer information management system by the server. This ensures that the details of the business negotiations are recorded immediately and can be easily referenced when taking subsequent actions.

[0081] After a business meeting concludes, users use their devices to input brief notes about any new observations or next steps. This information is sent to a server for later use and analyzed by data analysis tools. Users can use a dedicated application on their devices to input notes and keywords.

[0082] The analyzed information is registered in the customer information management system by the server, and refined data representing the results of the business negotiations is stored there. In this way, the system plays a role in supporting the efficiency and accuracy of business negotiation activities.

[0083] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0084] Step 1:

[0085] The terminal acquires audio information using a microphone during business negotiations. Specifically, the terminal captures audio data in real time through a built-in or connected microphone and saves it as a digital audio file. The input to this process is the raw audio of the statements and conversations during the negotiation, and the output is the saved digital audio data.

[0086] Step 2:

[0087] The terminal transmits the acquired audio information to the server via a secure connection. The terminal encrypts the audio data and uploads it to the server via the internet. The input is the stored audio data, and the output is the audio file uploaded to the server.

[0088] Step 3:

[0089] The server passes the received audio data to the speech recognition engine, which converts the audio into text. Specifically, the server performs pre-processing such as noise reduction and volume adjustment, and then uses speech recognition software to convert it into text. The input to this process is audio data, and the output is the converted text information.

[0090] Step 4:

[0091] The server inputs text information into an AI model to generate meeting minutes. Here, the AI model is instructed using the prompt "Generate a summary of the business meeting and list the key points." The input consists of the text information and the prompt, and the output is the summarized meeting minutes.

[0092] Step 5:

[0093] The server automatically registers the generated meeting minutes into the customer information management system. Specifically, the server converts the meeting minutes into a predetermined data format and automatically transfers them to the database. The input is the generated meeting minutes, and the output is the data registered in the customer information management system.

[0094] Step 6:

[0095] After the business meeting concludes, the user enters brief notes or keywords using a terminal. The user uses a designated application to input text, which is then sent from the terminal to the server. The input is text manually entered by the user, and the output is the memo data sent to the server.

[0096] Step 7:

[0097] The server analyzes notes sent by users using information analysis tools and generates detailed data. Specifically, it uses natural language processing techniques to analyze text and extract and refine relevant information. The input is the submitted note data, and the output is the detailed data after analysis.

[0098] Step 8:

[0099] The server registers the analyzed detailed data into the customer information management system. This ensures that detailed and accurate information obtained from business negotiations is added to the database. The input is the analyzed detailed data, and the output is the updated data within the system.
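Step 5 above converts the meeting minutes into a predetermined data format before transferring them to the database. The patent does not define that format, so the Python sketch below uses a hypothetical JSON record. Every field name, the example values, and the function name are assumptions for illustration only.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class MinutesRecord:
    """Hypothetical record layout for meeting minutes stored in the CRM."""
    customer_id: str
    meeting_date: str  # ISO 8601 date, e.g. "2024-12-03"
    summary: str
    key_points: list


def to_crm_payload(record):
    """Serialize the record to a JSON string with a stable key order."""
    return json.dumps(asdict(record), ensure_ascii=False, sort_keys=True)


record = MinutesRecord(
    customer_id="C-001",
    meeting_date="2024-12-03",
    summary="Customer showed interest in product C.",
    key_points=["Interested in product C", "Next meeting at the end of September"],
)
print(to_crm_payload(record))
```

Setting `ensure_ascii=False` keeps non-ASCII text such as Japanese minutes readable in the payload, and `sort_keys=True` makes the output deterministic for logging or comparison.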

[0100] (Application Example 1)

[0101] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0102] During business negotiations, it is crucial for sales representatives to accurately record conversations with customers so that they can easily access the information later. However, depending on the speed and complexity of the conversation, recording can be time-consuming and carries the risk of memory lapses or omissions. Furthermore, manually gathering and organizing large amounts of information is often burdensome. In addition, there is the challenge of not being able to easily access important information when it is needed immediately during a negotiation.

[0103] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0104] In this invention, the server includes a speech recognition device that acquires voice data and converts the voice data into text information; an information generation device that automatically generates summary information from the text information using a generation model; a data recording device that automatically inputs the summary information into a service management system; an information processing device that analyzes simple information entered after a business negotiation and converts it into detailed information; and a display device that uses smart glasses to display important information in real time during a business negotiation. This automates information processing during and after business negotiations, making it possible to manage the content of business negotiations accurately and quickly.

[0105] "Voice data" refers to information recorded in digital format from human speech acquired during business negotiations.

[0106] "Text information" refers to information in text format that has been converted from voice data by a speech recognition device.

[0107] A "speech recognition device" is a device or system that has the function of analyzing speech data and converting its content into text information.

[0108] A "generative model" is a machine learning model that learns from vast amounts of data and generates summary information and other data based on the input information.

[0109] "Summary information" is a concise record of content that is automatically generated from the text information using a generative model.

[0110] An "information generation device" is a device or system that generates summarized information from textual information using a generation model.

[0111] A "data recording device" is a device or system that has the function of automatically inputting summary information into a service management system.

[0112] An "information processing device" is a device or system that analyzes simple information entered after a business negotiation and converts it into detailed information.

[0113] A "service management system" is a digital platform for centrally managing information related to customers and business deals.

[0114] "Smart glasses" are wearable devices that, when worn by a user, display digital information in their field of vision in real time.

[0115] A "display device" is a device or system that uses smart glasses to visually present important information in real time during business negotiations.

[0116] The system for carrying out this invention mainly consists of an integrated configuration of a speech recognition device, an information generation device, a data recording device, an information processing device, and a display device. The server acquires input via smart glasses worn by the user to collect voice data during business negotiations. These glasses are equipped with a highly sensitive microphone that records conversations in real time. The recorded voice data is first converted into text information by the speech recognition device. In this process, pre-processing is performed to remove noise and adjust the volume, so that accurate text information is generated.

[0117] Once text information is generated, the server's information generation device uses a generative AI model to automatically generate summary information from this text. This summary information concisely captures the main points of the business negotiation and is recorded in the service management system via the data recording device. After the negotiation, the user can add simple notes by voice using the smart glasses. This voice information is analyzed by the information processing device and converted into more detailed information. This detailed information is also recorded in the service management system as an update.

[0118] For example, if a user says during a business meeting, "The customer showed interest in new product A," the speech recognition device converts this into text, and the information generation device generates a summary such as "The customer showed interest in new product A." If the user adds a note after the meeting, such as "I would like the next meeting to be next Tuesday," the information processing device converts this information into detailed information, such as "Next meeting scheduled for: next Tuesday," and records it.

[0119] As an example of a prompt given to the generative AI model, the sentence "Please briefly summarize the details of this business opportunity. The customer expressed interest in new product A." can be used to obtain a highly accurate summary.
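The prompt in the preceding paragraph can be read as an instruction followed by the recognized text. The following is a minimal Python sketch under that assumption; the function name `build_summary_prompt` is hypothetical and does not appear in the specification, and the call to the generative AI model itself is intentionally left out.

```python
def build_summary_prompt(transcript: str) -> str:
    """Prepend the summarization instruction to the recognized text."""
    # The instruction wording is taken from the example prompt in the text.
    instruction = "Please briefly summarize the details of this business opportunity."
    return f"{instruction} {transcript.strip()}"


prompt = build_summary_prompt("The customer expressed interest in new product A.")
print(prompt)
```

Keeping the instruction separate from the transcript makes it easy to swap in a different instruction without touching the recognition output.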

[0120] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0121] Step 1:

[0122] The terminal acquires audio data using the microphone in smart glasses during business negotiations. This audio data is saved as a digital audio file recording the conversation between the user and the customer. This audio data serves as input to the system.

[0123] Step 2:

[0124] The server activates the speech recognition device and converts the acquired audio data into text information. In doing so, the speech recognition device performs preprocessing, including noise removal and volume adjustment, so that the output text information contains fewer errors caused by noise. The generated text information becomes the input for the next processing step.

[0125] Step 3:

[0126] The server's information generation device uses a generative AI model to generate summary information from the text information generated in step 2. Here, the instruction "Generate a summary of the conversation" is used as the prompt. The text information is the input, and the summary information is the output.

[0127] Step 4:

[0128] The server's data recording device automatically inputs the generated summary information into the service management system. This process saves the summary information as a sales opportunity record. Here, data is registered in the service management system, and the registered information is then passed on to the next process.

[0129] Step 5:

[0130] After a business meeting, the user inputs a simple memo via voice using smart glasses. This voice input becomes a new input for the information processing device. This memo includes next actions and findings.

[0131] Step 6:

[0132] The server's information processing device analyzes the notes input by the user and converts them into more detailed information. The generated detailed information is then saved to the service management system.

[0133] Step 7:

[0134] The display device displays important information in real time within the user's field of view through the smart glasses during business negotiations. This allows the user to instantly check the information they need during the negotiation. Real-time information display assists the negotiation and strengthens the connection to the next action.
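Steps 2 through 6 above can be sketched as a small orchestration function. This is an illustrative Python sketch only, not the implementation described in the specification: `ServiceManagementSystem`, `run_negotiation_flow`, and the memo prompt "Convert this memo into detailed information" are hypothetical names and wording, the recognizer and generative model are passed in as plain callables, and the real-time display of step 7 is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ServiceManagementSystem:
    """In-memory stand-in for the service management system (hypothetical)."""
    records: List[Dict[str, str]] = field(default_factory=list)

    def register(self, kind: str, body: str) -> None:
        self.records.append({"kind": kind, "body": body})


def run_negotiation_flow(
    audio: bytes,
    memo_audio: bytes,
    recognize: Callable[[bytes], str],
    generate: Callable[[str], str],
    system: ServiceManagementSystem,
) -> None:
    # Step 2: speech recognition turns the negotiation audio into text.
    text = recognize(audio)
    # Step 3: the generative model summarizes the text (prompt from the text).
    summary = generate("Generate a summary of the conversation\n" + text)
    # Step 4: the summary is recorded automatically.
    system.register("summary", summary)
    # Steps 5-6: the post-meeting voice memo is recognized, expanded, recorded.
    memo_text = recognize(memo_audio)
    # The memo prompt below is an assumption; the text does not give one.
    detail = generate("Convert this memo into detailed information\n" + memo_text)
    system.register("detail", detail)


# Usage with trivial fakes in place of the real recognizer and model.
fake_recognize = lambda audio: audio.decode("utf-8")
fake_generate = lambda prompt: prompt.splitlines()[-1]
crm = ServiceManagementSystem()
run_negotiation_flow(
    b"The customer showed interest in new product A.",
    b"I would like the next meeting to be next Tuesday.",
    fake_recognize,
    fake_generate,
    crm,
)
print(crm.records)
```

Passing the recognizer and model in as callables keeps the flow testable without committing to any particular speech or generation service.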

[0135] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0136] This invention relates to a system that acquires audio data during business negotiations and generates text data and emotion data using speech recognition and emotion recognition technologies. This enables the automatic generation of detailed meeting minutes that include not only the content of the negotiation but also the emotional reactions during the negotiation. Furthermore, by inputting this information into a customer management system, deeper analysis of the negotiation becomes possible.

[0137] The device uses a microphone to acquire audio data during business negotiations. This audio data includes not only the content of what was said during the negotiations but also the customer's emotional reactions.

[0138] The server receives audio data sent from the terminal. It starts the speech recognition engine and converts the audio data into text data. During this process, pre-processing such as noise reduction and volume adjustment is performed to improve recognition accuracy.
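The specification does not say which noise reduction or volume adjustment algorithm is used. As a rough illustration only, the Python sketch below assumes audio samples as floats in the range -1.0 to 1.0, uses a simple amplitude gate as a crude stand-in for noise reduction, and uses peak normalization for volume adjustment. The function name `preprocess` and the `gate` and `peak` parameters are hypothetical.

```python
from typing import List


def preprocess(samples: List[float], gate: float = 0.02, peak: float = 1.0) -> List[float]:
    """Zero out samples below the gate, then scale so the loudest sample hits `peak`."""
    # Crude noise gate: treat very quiet samples as background noise.
    gated = [s if abs(s) >= gate else 0.0 for s in samples]
    loudest = max((abs(s) for s in gated), default=0.0)
    if loudest == 0.0:
        return gated  # silence or empty input: nothing to scale
    # Peak normalization: a uniform gain keeps the waveform shape intact.
    scale = peak / loudest
    return [s * scale for s in gated]


print(preprocess([0.01, -0.25, 0.5]))
```

A production recognizer would more likely rely on spectral noise suppression, but the gate-then-normalize order shown here reflects the general idea of cleaning the signal before adjusting its level.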

[0139] The server simultaneously uses an emotion engine to recognize the user's emotions from the voice data. The emotion engine analyzes the tone, speed, and other vocal characteristics of the voice to detect the user's emotional state.

[0140] A generative AI model automatically generates meeting minutes from this text data and emotion data. The meeting minutes include text information about the business negotiation, as well as the customer's emotional reactions, and are presented in a way that allows for an understanding of the psychological dynamics during the negotiation.

[0141] The server automatically inputs meeting minutes and sentiment data into the customer management system. This data is recorded as an evaluation metric for each sales negotiation and used for post-negotiation analysis and strategic planning for future negotiations.

[0142] After a business meeting concludes, the user enters supplementary information and notes regarding the next action plan via their terminal. These notes are then converted into detailed data using an information analysis tool and registered in the system.

[0143] For example, if a user inputs "The customer was very positive about product D, but expressed concerns about the price," the server will consider this and register the information that the customer is positive about the product but cautious about the price in the customer management system.

[0144] This invention aims to improve the efficiency of sales activities and deepen customer understanding by comprehensively addressing both verbal and nonverbal elements in business negotiations.

[0145] The following describes the processing flow.

[0146] Step 1:

[0147] The device uses the microphone to capture audio data during the business negotiation. The acquired audio data is temporarily stored as a composite audio file that includes the content of the negotiation and the customer's emotional responses.

[0148] Step 2:

[0149] The server receives audio data sent from the terminal. First, it uses a speech recognition engine to convert the audio data into text data. During this process, preprocessing such as noise reduction and volume normalization is performed to improve recognition accuracy.

[0150] Step 3:

[0151] The server simultaneously activates the emotion engine. It analyzes the tone, speed, and intonation of the voice data to recognize the user's emotional state. The recognized emotion data is assigned labels such as "joy," "anxiety," and "excitement."

[0152] Step 4:

[0153] The server inputs the text data and emotion data into a generative AI model, which automatically creates meeting minutes that integrate the content of the sales discussion with the user's emotions. The meeting minutes also include the customer's emotional responses, recording how the sales discussion was influenced by emotions.

[0154] Step 5:

[0155] The server inputs the generated meeting minutes and emotion data into the customer management system. The text content and emotional responses of sales meetings are centrally managed, providing useful data for future analysis and strategy.

[0156] Step 6:

[0157] After a business meeting concludes, users enter supplementary notes via their terminal. Examples of such notes include additional information about the meeting, next steps, and any special points to note. This information plays a crucial role in subsequent analysis.

[0158] Step 7:

[0159] The server analyzes the additional notes received from the user and converts them into detailed data using information analysis tools. This detailed data is also updated in the customer management system and stored as part of the sales opportunity information.

[0160] Throughout this process, business negotiations are meticulously recorded based on verbal and nonverbal elements, and provided as a dataset that is useful for making important decisions in sales activities.
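One way to picture the input that Step 4 hands to the generative AI model is a transcript in which each utterance carries its emotion label. The Python sketch below shows that pairing under stated assumptions: the `Utterance` type, the `build_minutes_input` function, and the bracketed line format are all hypothetical, and only the label names come from Step 3.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    text: str
    emotion: str  # label from the emotion engine, e.g. "joy", "anxiety", "excitement"


def build_minutes_input(utterances: List[Utterance]) -> str:
    """Render each utterance with its emotion label, one per line."""
    return "\n".join(f"[{u.emotion}] {u.text}" for u in utterances)


labeled = build_minutes_input([
    Utterance("We like product D.", "joy"),
    Utterance("The price is a concern.", "anxiety"),
])
print(labeled)
```

The resulting string could be appended to a minutes-generation prompt so that the model sees the spoken content and the emotional reaction side by side.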

[0161] (Example 2)

[0162] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0163] Understanding the content of business negotiations and visualizing the customer's psychological state are crucial in many business processes. However, traditional methods only involve transcribing audio data into text, making it difficult to grasp emotional responses. Furthermore, there is a need for a method that efficiently incorporates supplementary information after business negotiations into management systems.

[0164] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0165] In this invention, the server includes an acoustic recognition means, a psychological recognition means, and a record generation means. This enables the generation of detailed records of the business negotiation based on acoustic data during the negotiation, real-time understanding of the customer's psychological reactions, and efficient registration of information into the management system.

[0166] A "business negotiation" is the process of discussing the terms and conditions of a business transaction or contract.

[0167] "Acoustic data" refers to the digital or analog representation of sound signals generated during a business negotiation.

[0168] "Acoustic recognition means" refers to technologies and devices that analyze acoustic data and convert it into text data.

[0169] "Psychological recognition means" refers to technologies and devices that analyze and identify a speaker's emotions and psychological state from acoustic data.

[0170] "Text data" refers to data that represents the content of the acoustic data in text format.

[0171] "Record generation means" refers to technologies and devices that generate detailed conversation records of business negotiations based on text data and psychological data.

[0172] A "management system" is a database or software that systematically organizes data related to business deals, making it easy to access and update.

[0173] "Information input means" refers to technologies and devices for registering generated data into a management system.

[0174] "Data analysis means" refers to technologies and devices that analyze information entered after a business negotiation and convert it into more detailed information.

[0175] "Information update means" refers to technologies and devices that reflect analyzed information in a management system and maintain the data in an up-to-date state.

[0176] This invention is a system that efficiently acquires acoustic data during business negotiations and analyzes and records the content of the negotiations and the customer's psychological state. This system technically processes important data to deeply understand the content of the negotiations and support sales activities.

[0177] The device uses a high-sensitivity microphone to acquire acoustic data during business negotiations. This acoustic data can capture not only the content of what the speaker says during the negotiation, but also their emotional reactions. The acquired acoustic data is transmitted to the server in real time.

[0178] The server performs acoustic recognition on the received acoustic data and converts it into text data. Acoustic recognition software is used for this, and noise reduction and volume adjustment are performed as preprocessing to improve conversion accuracy. In parallel, the server uses psychological recognition technology to identify the speaker's emotions from the received acoustic data. The psychological recognition software analyzes the tone, speed, and other characteristics of the voice to recognize the speaker's psychological state.

[0179] Subsequently, the server uses a generative AI model to generate a conversation record based on text data and psychological data. This conversation record includes the details of the business negotiation and the customer's emotional response, making it a valuable source of information for understanding the psychological dynamics of the business.

[0180] As a concrete example, the prompt for the generative AI model takes the form "Based on the acoustic data, generate a detailed record of the business negotiation, including the speaker's emotional responses." The server registers the generated conversation record in the management system. The record is kept as a history of the business negotiation and provides data useful for future analysis and strategic planning of sales activities.

[0181] After a business negotiation concludes, users use their terminals to input additional information related to the negotiation and their next action plan. This information is then transformed into detailed data through data analysis functions and reflected in the management system, where it is utilized as an integrated database.

[0182] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0183] Step 1:

[0184] The device acquires acoustic data during business negotiations using a high-sensitivity microphone. The input is the speech of the negotiation participants, and the output is digital acoustic data. The device is designed to collect this digital acoustic data in real time during the negotiation.

[0185] Step 2:

[0186] The terminal compresses the acquired acoustic data and sends it to the server in a secure manner. The input is the acoustic data obtained in step 1, and the output is the compressed data ready for transfer. This process employs an efficient transfer protocol to minimize data delay.

[0187] Step 3:

[0188] The server receives the acoustic data and begins acoustic recognition processing. The input is the acoustic data sent from the terminal, and the output is text data. The server first performs noise reduction and volume adjustment, and then the speech recognition engine converts the data into text format.

[0189] Step 4:

[0190] The server uses psychological recognition technology to evaluate the speaker's psychological state from the acoustic data. The input is acoustic data, and the output is data about the speaker's psychological state. The server analyzes the tone, speed, and other vocal characteristics of the voice, and applies emotion recognition algorithms to identify the speaker's psychological state.

[0191] Step 5:

[0192] The server utilizes a generative AI model to generate a conversation record from text data and psychological state data. The input is the data obtained in steps 3 and 4, and the output is a detailed conversation record. The generative AI model is prompted with the message, "Generate a detailed record of the business negotiation based on the acoustic data, including the speaker's emotional responses," to generate the conversation record.

[0193] Step 6:

[0194] The server registers the generated conversation records in the management system. The input is the conversation records obtained in step 5, and the output is the historical information registered in the management system. Through this process, the details of the business negotiations are recorded in an integrated database, which can be used for future analysis and strategic planning.

[0195] Step 7:

[0196] After a business negotiation is completed, the user enters additional information related to the negotiation and their next action plan via the terminal. The input is the information entered manually by the user, and the output is updated information based on detailed data analysis. The terminal uploads the input data to the system, which converts it into detailed data using the analysis function and registers it in the management system.
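Step 2 of the flow above has the terminal compress the acoustic data before sending it, without naming a codec. Purely as an illustration, the Python sketch below uses the standard library's lossless `zlib` compression as a stand-in; a real system would more likely use a dedicated audio codec, and the secure transport is not shown. The function names `compress_for_transfer` and `restore_on_server` are hypothetical.

```python
import zlib


def compress_for_transfer(acoustic_data: bytes) -> bytes:
    """Terminal side: losslessly compress raw acoustic data before upload."""
    return zlib.compress(acoustic_data, 6)


def restore_on_server(payload: bytes) -> bytes:
    """Server side: recover the original bytes before acoustic recognition."""
    return zlib.decompress(payload)


raw = bytes(1000)  # one thousand zero bytes, standing in for a silent recording
payload = compress_for_transfer(raw)
print(len(raw), len(payload))
```

Because the compression is lossless, the server recovers exactly the bytes captured in step 1, so the recognition in step 3 is unaffected by the transfer.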

[0197] (Application Example 2)

[0198] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0199] Conventional sales negotiation support systems simply convert speech to text and do not consider the emotional responses of the participants. This makes it difficult to understand the participants' emotional and psychological state. Furthermore, the inability to provide emotion-based interactions hinders improvements in the user experience.

[0200] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0201] In this invention, the server includes a speech recognition means for acquiring audio data during a business negotiation and converting the audio data into text data, an emotion recognition means for deriving emotion data from the audio data, and a memo generation means for automatically generating meeting minutes from the text data and emotion data using a generative model. This makes it possible to generate meeting minutes that reflect not only the content of the business negotiation but also the emotional trends of the participants in real time, and to provide emotion-based interaction tailored to the user.

[0202] "Speech recognition means" refers to technology for converting audio data acquired during business negotiations into text data.

[0203] "Emotion recognition means" refers to a technology that analyzes the characteristics of a voice from acquired audio data and derives emotion data.

[0204] "Memo generation means" refers to a technology that uses a generative model to combine text data and emotion data to automatically generate meeting minutes.

[0205] "Data entry means" refers to technology for inputting automatically generated meeting minutes into a customer management system.

[0206] "Information analysis means" refers to technology for analyzing simple notes entered after a business negotiation and converting them into detailed data.

[0207] "Data update means" refers to the technology for inputting the analyzed detailed data into the customer management system.

[0208] "Interaction generation means" refers to technologies for generating appropriate suggestions and responses based on the user's emotions.

[0209] In order to carry out this invention, a system having the following configuration is required.

[0210] First, the device continuously acquires participant audio data using the microphone during the business negotiation. This audio data includes both the content of the negotiation and the emotional responses of the participants. The device then transmits this audio data to the server.

[0211] Next, the server applies a speech recognition engine to the received audio data to convert it into text data. During this process, pre-processing such as noise reduction and volume adjustment is performed to improve accuracy. For example, Google Cloud Speech-to-Text can be used as the speech recognition engine.

[0212] Furthermore, the server activates an emotion recognition engine to analyze the tone, speed, and other vocal characteristics of the voice to derive emotion data. Emotion recognition technologies such as the Affectiva SDK can be used in this process.

[0213] The server automatically generates meeting minutes by combining text data and sentiment data acquired using a generative AI model. These minutes include not only the details of the business negotiations that should be entered into the customer management system, but also the emotional responses of the participants.

[0214] Furthermore, the server analyzes the simple notes entered after the business negotiation, converts them into detailed data using information analysis tools, and inputs the results into the customer management system.

[0215] The interaction generation means generates appropriate suggestions based on the user's emotion data, thereby personalizing interactions in business negotiations and conversations and contributing to an improved user experience.

[0216] For example, if the system detects signs of stress from the user's voice during a business negotiation, it can suggest, "Shall we take a short break?" An example of a prompt for generating this interaction is, "Identify the emotions the user is feeling and create a suggestion accordingly."
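The stress example in the preceding paragraph can be illustrated with a small lookup from a detected emotion label to a suggestion. This Python sketch is a simplification: in the specification the suggestion is produced by prompting a generative model, whereas here a fixed table stands in for that model. The names `SUGGESTIONS` and `suggest_interaction` are hypothetical, and only the "stress" entry comes from the text.

```python
from typing import Optional

# Only the "stress" entry is taken from the example in the text.
SUGGESTIONS = {"stress": "Shall we take a short break?"}


def suggest_interaction(emotion: str) -> Optional[str]:
    """Return a suggestion for the detected emotion, or None if there is none."""
    return SUGGESTIONS.get(emotion.strip().lower())


print(suggest_interaction("stress"))
```

Returning `None` for unlisted emotions lets the caller decide whether to stay silent or fall back to a model-generated suggestion.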

[0217] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0218] Step 1:

[0219] The terminal uses a microphone to capture the voices of participants during a business negotiation. The input is audio data of the negotiation. This data is then sent directly to the server.

[0220] Step 2:

[0221] The server applies a speech recognition engine to the received audio data. Specifically, it performs noise reduction and volume adjustment, and then converts the audio into text data. The output of this process is text data that reflects the content of the business negotiation.

[0222] Step 3:

[0223] The server activates an emotion recognition engine to extract emotion data from the audio data. It analyzes the tone, speed, and vocal characteristics of the input audio to detect the user's emotional state (e.g., joy, surprise, anxiety). The output is emotion data.

[0224] Step 4:

[0225] The server uses a generative AI model to integrate text and sentiment data and automatically generate meeting minutes for business negotiations. The input data includes both text and sentiment, and the output is in the form of meeting minutes.

[0226] Step 5:

[0227] The server utilizes user sentiment data to generate appropriate suggestions using interaction generation tools. The input in this process is sentiment data, and the output is interaction suggestions for the user.

[0228] Step 6:

[0229] The user receives interactions generated by the system and provides responses and feedback as needed. The input here is the interaction suggestion, and the output is the user's feedback.

[0230] Step 7:

[0231] The server receives a brief memo entered by the user after a business negotiation. Using information analysis tools, it processes this memo into detailed data. The input is a brief memo, and the output is refined, detailed business data.

[0232] Step 8:

[0233] The server inputs the detailed data into the customer management system and updates the system-wide database. The output of this step is the updated data.

[0234] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0235] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0236] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0237] [Second Embodiment]

[0238] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0239] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0240] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0241] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0242] The microphone 238 receives instructions from the user 20 by capturing the voice of the user 20. The microphone 238 converts the captured voice into audio data and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0243] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0244] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0245] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0246] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0247] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0248] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0249] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0250] This invention relates to a system that converts audio data acquired during business negotiations into text in real time, automatically creates meeting minutes using a generative model, and records them in a customer management system. The system also accepts simple notes entered after the negotiation, analyzes them into detailed data, and reflects that data in the customer management system.

[0251] The device uses the microphone to acquire audio data during business negotiations. This audio data is processed as a digital audio file containing statements and conversations made during the negotiation.

[0252] The server receives this audio data and activates the speech recognition engine to convert it into text data. This process is performed in real time, and pre-processing such as noise reduction and volume optimization is carried out.

[0253] This text data is automatically organized and summarized as meeting minutes by a generative AI model on the server. The generated meeting minutes are formatted so that all parties involved can easily understand and review them later.

[0254] The server then automatically inputs these meeting minutes into the customer management system, accurately recording the details of the business negotiation within the system. This makes it easier to later understand the details of the negotiation and plan the next steps.

[0255] After a business meeting concludes, users enter short notes or keywords on their device. This information is used to record new discoveries, important points, and tasks for the next meeting.

[0256] The entered memos are then analyzed in detail by the server's information analysis system, refining the relevant information and registering it in the customer management system. This ensures that the accuracy and reliability of the information remain high even after the business negotiation, and that all information is managed without omission.

[0257] For example, if a user enters a memo such as "The next meeting is at the end of September. The customer has shown interest in product C," the server interprets this as detailed data such as "Next meeting scheduled for the end of September" and "Interested in product C," and registers it in the system.
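The memo-to-detail conversion in the preceding example can be sketched with simple pattern matching. This Python sketch is only an illustration of the idea: the specification does not state how the analysis is implemented, the regular expressions cover just the phrasing of the example memo, and the function name `parse_memo` and the output keys `next_meeting` and `interested_products` are hypothetical.

```python
import re
from typing import Dict


def parse_memo(memo: str) -> Dict[str, object]:
    """Extract the next-meeting time and product names from a short memo."""
    # Capture the phrase after "next meeting is", up to the first period.
    meeting = re.search(r"next meeting is (?:at |on )?(.+?)\.", memo, re.IGNORECASE)
    # Collect mentions such as "product C" (single capital letter assumed).
    products = re.findall(r"\bproduct [A-Z]\b", memo)
    return {
        "next_meeting": meeting.group(1) if meeting else None,
        "interested_products": products,
    }


detail = parse_memo(
    "The next meeting is at the end of September. "
    "The customer has shown interest in product C"
)
print(detail)
```

A rule-based parser like this breaks as soon as the wording changes, which is one reason an analysis based on a language model, as the surrounding text suggests, may be preferable in practice.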

[0258] This invention dramatically improves the efficiency of data entry work in the sales department and enhances the quality of customer information management by highly automating the recording of business negotiations.

[0259] The following describes the processing flow.

[0260] Step 1:

[0261] The device captures audio data via the microphone during business negotiations. The audio data is temporarily stored in a buffer in digital format and sent to the server in real time.

[0262] Step 2:

[0263] The server receives audio data from the terminal and starts the speech recognition engine. It then performs preprocessing on the received audio data, such as noise reduction and volume normalization, to improve recognition accuracy.

[0264] Step 3:

[0265] The server uses a speech recognition engine to convert pre-processed audio data into text data. This text data contains the content of the business negotiation conversation as written information.

[0266] Step 4:

[0267] The server uses a generative AI model to automatically generate meeting minutes based on the acquired text data. The generated meeting minutes are formatted to summarize the business discussion and clearly express the key points.

[0268] Step 5:

[0269] The server instantly inputs the automatically generated meeting minutes into the customer management system. It connects to the customer management system and creates and records new data entries.

[0270] Step 6:

[0271] After a business meeting concludes, users enter brief notes or keywords related to the meeting via their device. This includes important information such as the meeting's outcome and the next steps.

[0272] Step 7:

[0273] The server receives notes sent by users and analyzes them into detailed data using information analysis tools. This analysis is performed using techniques such as keyword extraction and contextual understanding.

[0274] Step 8:

[0275] The server inputs and updates the detailed data obtained through analysis into the customer management system. This ensures that information after a business negotiation is accurately managed and utilized in subsequent sales activities.
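Step 2 of the flow above mentions noise reduction and volume normalization as preprocessing. The following is a minimal sketch of those two operations on audio samples represented as floats in the range -1.0 to 1.0. The noise gate is a crude stand-in for noise reduction, not the method used by the invention, and the function names and threshold values are assumptions.

```python
def noise_gate(samples, threshold=0.02):
    """Crude stand-in for noise reduction: silence samples quieter than threshold."""
    return [0.0 if abs(s) < threshold else s for s in samples]


def normalize_volume(samples, target_peak=0.9):
    """Peak normalization: scale so the loudest sample reaches target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silent or empty input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def preprocess(samples):
    """Apply the noise gate first, then normalize the remaining signal."""
    return normalize_volume(noise_gate(samples))


print(preprocess([0.01, 0.2, -0.4]))
```

Gating before normalization keeps low-level background noise from being amplified along with the speech.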

[0276] (Example 1)

[0277] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0278] In today's business environment, accurately recording the details of business negotiations and using that information to improve subsequent customer interactions is crucial. However, manually taking notes on the vast amount of information gathered during negotiations and entering it into a customer information management system is extremely time-consuming and labor-intensive. Furthermore, information is easily omitted or inaccurate, making efficient and accurate negotiation follow-up difficult.

[0279] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0280] In this invention, the server includes an audio processing means for acquiring audio information during negotiations and converting the audio information into text information, an information generation means for automatically generating meeting minutes from the text information using a specified generation format, and a data registration means for automatically inputting the meeting minutes into a customer information management system. This makes it possible to record information during business negotiations in real time.

[0281] "Audio information during negotiations" refers to audio data that records the content of statements and conversations made during business negotiations or discussions.

[0282] The "audio processing means" refers to a device or program with a series of functions for acquiring audio information in digital form and analyzing and converting it.

[0283] The "text information" refers to digital data expressed as a character string, obtained by converting the audio information.

[0284] The "generation format" refers to a model or algorithm for automatically generating information in a specific format using artificial intelligence.

[0285] The "information generation means" refers to a device or program for summarizing or generating necessary information from text information using the generation format.

[0286] The "meeting minutes" refer to a document that summarizes the content of a meeting or business negotiation and serves as a record for confirming that content later.

[0287] The "customer information management system" refers to an information system for managing customer data and transaction history, building relationships with customers, and using that information in business.

[0288] The "data registration means" refers to a device or program having a function to automatically input and store the generated information in a specific database or system.

[0289] The "information analysis means" refers to a device or program for analyzing the input information and extracting or refining necessary information.

[0290] The present invention aims at efficient management and utilization of audio information during business negotiations. The system automatically converts audio information acquired during business negotiations into text information and then generates meeting minutes from it.

[0291] The terminal collects audio information using its microphone during business negotiations. This terminal is used to transmit the content of the negotiations and conversations, sending the acquired audio data to a server. Ideally, the terminal should be equipped with a high-quality microphone and an audio collection application.

[0292] The server converts the received audio information into text using an existing speech recognition engine such as the Google Speech-to-Text API or IBM Watson Speech to Text. This process includes noise reduction and volume adjustment. The server then inputs the text information into a generative AI model (e.g., OpenAI GPT-3) and automatically generates meeting minutes using a prompt. This generation process makes it possible to track the progress of the negotiation accurately and outputs the result in a format that is easy for stakeholders to read. An example of a prompt is, "Generate a summary of the negotiation and list the key points."
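The way the prompt and the recognized text might be combined before being sent to a generative model can be sketched as below. The instruction string is the example quoted above; the function name `build_minutes_prompt` and the "Transcript:" layout are assumptions, and no call to any particular model API is shown.

```python
DEFAULT_INSTRUCTION = "Generate a summary of the negotiation and list the key points."


def build_minutes_prompt(transcript: str, instruction: str = DEFAULT_INSTRUCTION) -> str:
    """Combine the instruction with the recognized text into one prompt string."""
    transcript = transcript.strip()
    if not transcript:
        raise ValueError("transcript is empty; nothing to summarize")
    return f"{instruction}\n\nTranscript:\n{transcript}"


print(build_minutes_prompt("Customer asked about pricing for product C."))
```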

[0293] The generated meeting minutes are automatically entered into the customer information management system by the server. This ensures that the details of the business negotiations are recorded immediately and can be easily referenced when taking subsequent actions.
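As a rough illustration of automatic registration, the sketch below stores generated minutes as a JSON payload in an SQLite table. The source does not name a database or record format; the table name `negotiation_records`, the field names, and the use of SQLite are all assumptions standing in for the customer information management system.

```python
import json
import sqlite3
from datetime import datetime, timezone


def register_minutes(conn: sqlite3.Connection, customer_id: str, minutes: str) -> int:
    """Store the minutes as a new record and return its row id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS negotiation_records "
        "(id INTEGER PRIMARY KEY, customer_id TEXT, payload TEXT)"
    )
    record = {
        "customer_id": customer_id,
        "minutes": minutes,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    cur = conn.execute(
        "INSERT INTO negotiation_records (customer_id, payload) VALUES (?, ?)",
        (customer_id, json.dumps(record)),
    )
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")  # in-memory database for demonstration
row_id = register_minutes(conn, "cust-001", "Customer is interested in product C.")
print(row_id)
```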

[0294] After a business meeting concludes, users use their devices to input brief notes about any new observations or next steps. This information is sent to a server for later use and analyzed by data analysis tools. Users can use a dedicated application on their devices to input notes and keywords.

[0295] The analyzed information is registered in the customer information management system by the server, and refined data representing the results of the business negotiations is stored there. In this way, the system plays a role in supporting the efficiency and accuracy of business negotiation activities.

[0296] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0297] Step 1:

[0298] The device acquires audio information using a microphone during business negotiations. Specifically, the device captures audio data in real time through a built-in or connected microphone and saves it as a digital audio file. The input to this process is the raw audio of the statements and conversations made during the negotiation, and the output is the saved digital audio data.

[0299] Step 2:

[0300] The terminal transmits the acquired audio information to the server via a secure connection. The terminal encrypts the audio data and uploads it to the server via the internet. The input is the stored audio data, and the output is the audio file uploaded to the server.

[0301] Step 3:

[0302] The server passes the received audio data to the speech recognition engine, which converts the audio into text. Specifically, the server performs pre-processing such as noise reduction and volume adjustment, and then uses speech recognition software to convert it into text. The input to this process is audio data, and the output is the converted text information.

[0303] Step 4:

[0304] The server inputs text information into an AI model to generate meeting minutes. Here, the AI model is instructed using the prompt "Generate a summary of the business meeting and list the key points." The input consists of the text information and the prompt, and the output is the summarized meeting minutes.

[0305] Step 5:

[0306] The server automatically registers the generated meeting minutes in the customer information management system. Specifically, the server converts the meeting minutes into a predetermined data format and automatically transfers them to the database. The input is the generated meeting minutes, and the output is the data registered in the customer information management system.

[0307] Step 6:

[0308] After the negotiation ends, the user uses the terminal to input simple memos and keywords. The user inputs text using the designated application, and it is sent from the terminal to the server. The input is the text manually input by the user, and the output is the memo data sent to the server.

[0309] Step 7:

[0310] The server analyzes the memo sent from the user using information analysis means and generates detailed data. Specifically, it uses natural language processing technology to analyze the text and extract and refine appropriate information. The input is the sent memo data, and the output is the detailed data after analysis.

[0311] Step 8:

[0312] The server registers the analyzed detailed data in the customer information management system. As a result, the detailed and accurate information obtained in the negotiation is added to the database. The input is the analyzed detailed data, and the output is the updated data in the system.
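The server-side portion of the flow above (steps 3 to 5 and steps 7 to 8) can be sketched as two small functions whose stages are passed in as callables. This is only an outline under stated assumptions: `InMemoryCrm`, `process_negotiation_audio`, and `process_user_memo` are hypothetical names, and the stand-in lambdas in the usage lines replace a real speech recognition engine, generative model, and analysis tool.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class InMemoryCrm:
    """Stand-in for the customer information management system."""
    records: List[dict] = field(default_factory=list)

    def register(self, record: dict) -> None:
        self.records.append(record)


def process_negotiation_audio(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    summarize: Callable[[str], str],
    crm: InMemoryCrm,
) -> str:
    text = transcribe(audio)                             # step 3: audio -> text
    minutes = summarize(text)                            # step 4: text -> minutes
    crm.register({"kind": "minutes", "body": minutes})   # step 5: register minutes
    return minutes


def process_user_memo(memo: str, analyze: Callable[[str], dict], crm: InMemoryCrm) -> dict:
    details = analyze(memo)                                    # step 7: memo -> detailed data
    crm.register({"kind": "memo_details", "body": details})    # step 8: register details
    return details


crm = InMemoryCrm()
process_negotiation_audio(
    b"raw-audio",
    transcribe=lambda audio: "Customer asked about product C.",
    summarize=lambda text: "Summary: " + text,
    crm=crm,
)
process_user_memo("Next meeting end of September", lambda memo: {"note": memo}, crm)
print(len(crm.records))  # 2
```

Passing the stages as parameters keeps the outline independent of any specific recognition engine or model, which the source leaves open.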

[0313] (Application Example 1)

[0314] Next, Application Example 1 will be described. In the following description, the data processing device 12 is referred to as the "server", and the smart glasses 214 are referred to as the "terminal".

[0315] During business negotiations, it is crucial for sales representatives to accurately record conversations with customers so that they can easily access the information later. However, depending on the speed and complexity of the conversation, recording can be time-consuming and carries the risk of memory lapses or omissions. Furthermore, manually gathering and organizing large amounts of information is often burdensome. In addition, there is the challenge of not being able to easily access important information when it is needed immediately during a negotiation.

[0316] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0317] In this invention, the server includes a speech recognition device that acquires voice data and converts the voice data into text information; an information generation device that automatically generates summary information from the text information using a generation model; a data recording device that automatically inputs the summary information into a service management system; an information processing device that analyzes simple information entered after a business negotiation and converts it into detailed information; and a display device that uses smart glasses to display important information in real time during a business negotiation. This automates information processing during and after business negotiations, making it possible to manage the content of business negotiations accurately and quickly.

[0318] "Audio data" refers to information recorded in digital format from human speech acquired during business negotiations.

[0319] "Textual information" refers to information in text format that has been converted from audio data by a speech recognition device.

[0320] A "speech recognition device" is a device or system that has the function of analyzing speech data and converting its content into text information.

[0321] A "generative model" is a machine learning model that learns from vast amounts of data and generates summary information and other data based on the input information.

[0322] "Summary information" is a concise content record that is automatically generated by extracting textual information using a generative model.

[0323] An "information generation device" is a device or system that generates summary information from textual information using a generative model.

[0324] A "data recording device" is a device or system that has the function of automatically inputting summary information into a service management system.

[0325] An "information processing device" is a device or system that analyzes simple information entered after a business negotiation and converts it into detailed information.

[0326] A "service management system" is a digital platform for centrally managing information related to customers and business deals.

[0327] "Smart glasses" are wearable devices that, when worn by a user, display digital information in their field of vision in real time.

[0328] A "display device" is a device or system that uses smart glasses to visually present important information in real time during business negotiations.

[0329] The system for carrying out this invention mainly consists of an integrated configuration of a speech recognition device, an information generation device, a data recording device, an information processing device, and a display device. The server acquires input via smart glasses worn by the user to collect voice data during business negotiations. These glasses are equipped with a highly sensitive microphone that records conversations in real time. The recorded voice data is first converted into text information by the speech recognition device. In this process, pre-processing is performed to remove noise and adjust the volume, so that accurate text information is generated.

[0330] Once text information is generated, the server's information generation device uses a generative AI model to automatically generate summary information from this text. This summary information concisely summarizes the main points of the business negotiation and is recorded in the service management system via the data recording device. After the negotiation, the user can add simple notes by voice using the smart glasses. This voice information is analyzed by the information processing device and converted into more detailed information. This detailed information is also updated and recorded in the service management system.

[0331] For example, if a user says during a business meeting, "The customer showed interest in new product A," the speech recognition device converts this into text, and the information generation device generates a summary such as "The customer showed interest in new product A." If the user adds a note after the meeting, such as "I would like the next meeting to be next Tuesday," the information processing device converts this information into detailed information, such as "Next meeting scheduled for: next Tuesday," and records it.

[0332] An example of a prompt for the generative AI model is, "Please briefly summarize the details of this business opportunity. The customer expressed interest in new product A." A prompt of this kind enables a highly accurate summary.

[0333] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0334] Step 1:

[0335] The terminal uses the microphone in smart glasses to acquire audio data during business negotiations. This audio data is saved as a digital audio file recording the conversation between the user and the customer. This audio data serves as input to the system.

[0336] Step 2:

[0337] The server activates the speech recognition device and converts the acquired audio data into text information. Before the conversion, the speech recognition device performs preprocessing, including noise removal and volume adjustment, so that the output text information is less affected by noise. The generated text information becomes the input for the next processing step.

[0338] Step 3:

[0339] The server's information generation device uses a generative AI model to generate summary information from the text information generated in step 2. Here, the prompt is "Generate a summary of the conversation." The text information is the input, and the summary information is the output.

[0340] Step 4:

[0341] The server's data recording device automatically inputs the generated summary information into the service management system. This process saves the summary information as a sales opportunity record. Here, data is registered in the service management system, and the registered information is then passed on to the next process.

[0342] Step 5:

[0343] After a business meeting, the user inputs a simple memo via voice using smart glasses. This voice input becomes new input for the information processing device. This memo includes next actions and findings.

[0344] Step 6:

[0345] The server's information processing device analyzes the notes input by the user and converts them into more detailed information. The generated detailed information is then saved to the service management system.

[0346] Step 7:

[0347] The device displays important information in real time within the user's field of view through smart glasses during business negotiations. This allows users to instantly check the information they need during the negotiation. Real-time information display assists the negotiation and strengthens the connection to the next action.

[0348] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform specific processing that uses the estimated emotions.

[0349] This invention relates to a system that acquires audio data during business negotiations and generates text data and emotion data using speech recognition and emotion recognition technologies. This enables the automatic generation of detailed meeting minutes that include not only the content of the negotiation but also the emotional reactions during the negotiation, and further input into a customer management system to enable deeper analysis of the negotiation.

[0350] The device uses a microphone to acquire audio data during business negotiations. This audio data includes not only the content of what was said during the negotiations but also the customer's emotional reactions.

[0351] The server receives the audio data sent from the terminal. It starts the speech recognition engine and converts the audio data into text data. During this process, pre-processing such as noise reduction and volume adjustment is performed to improve recognition accuracy.

[0352] The server simultaneously uses an emotion engine to recognize the user's emotions from the voice data. The emotion engine analyzes the tone, speed, and other vocal characteristics of the voice to detect the user's emotional state.

[0353] A generative AI model automatically generates meeting minutes from the text data and emotion data. The meeting minutes include text information about the business negotiation as well as the customer's emotional reactions, and are presented so that the psychological dynamics during the negotiation can be understood.

[0354] The server automatically inputs the meeting minutes and emotion data into the customer management system. This data is recorded as an evaluation metric for each sales negotiation and used for post-negotiation analysis and strategic planning for future negotiations.

[0355] After a business meeting concludes, the user enters supplementary information and notes regarding the next action plan via their terminal. These notes are then converted into detailed data using an information analysis tool and registered in the system.

[0356] For example, if a user inputs "The customer was very positive about product D, but expressed concerns about the price," the server interprets this and registers in the customer management system the information that the customer is positive about the product but cautious about the price.

[0357] This invention aims to improve the efficiency of sales activities and deepen customer understanding by comprehensively addressing both verbal and nonverbal elements in business negotiations.

[0358] The following describes the processing flow.

[0359] Step 1:

[0360] The device uses the microphone to capture audio data during the business negotiation. The acquired audio data is temporarily stored as a composite audio file that includes the content of the negotiation and the customer's emotional responses.

[0361] Step 2:

[0362] The server receives audio data sent from the terminal. First, it uses a speech recognition engine to convert the audio data into text data. During this process, preprocessing such as noise reduction and volume normalization is performed to improve recognition accuracy.

[0363] Step 3:

[0364] The server simultaneously activates the emotion engine. It analyzes the tone, speed, and intonation of the voice data to recognize the user's emotional state. The recognized emotion data is assigned labels such as "joy," "anxiety," and "excitement."

[0365] Step 4:

[0366] The server inputs the text data and emotion data into a generative AI model, automatically creating meeting minutes that integrate the content of the sales discussion with the user's emotions. The meeting minutes also include the customer's emotional responses, clearly recording how the sales discussion was influenced by emotions.

[0367] Step 5:

[0368] The server inputs the generated meeting minutes and emotion data into the customer management system. The text content and emotional responses of sales meetings are centrally managed, providing useful data for future analysis and strategy.

[0369] Step 6:

[0370] After a business meeting concludes, users enter supplementary notes via their terminal. Examples of such notes include additional information about the meeting, next steps, and any special points to note. This information plays a crucial role in subsequent analysis.

[0371] Step 7:

[0372] The server analyzes the additional notes received from the user and converts them into detailed data using information analysis tools. This detailed data is also updated in the customer management system and stored as part of the sales opportunity information.

[0373] Throughout this process, business negotiations are meticulously recorded based on verbal and nonverbal elements, and provided as a dataset that is useful for making important decisions in sales activities.
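Step 3 of the flow above assigns labels such as "joy," "anxiety," and "excitement" based on the tone, speed, and intonation of the voice. The toy classifier below only illustrates that mapping from vocal features to a label. The feature names, the thresholds, and the fallback label "neutral" are arbitrary assumptions, not values from the source, and a real emotion engine would use a trained model rather than fixed rules.

```python
from dataclasses import dataclass


@dataclass
class VoiceFeatures:
    mean_pitch_hz: float      # tone
    words_per_minute: float   # speed
    pitch_variation: float    # intonation, scaled to 0.0-1.0


def label_emotion(f: VoiceFeatures) -> str:
    """Map vocal features to an emotion label using placeholder thresholds."""
    if f.words_per_minute > 180 and f.pitch_variation > 0.6:
        return "excitement"
    if f.mean_pitch_hz > 220 and f.pitch_variation < 0.3:
        return "anxiety"
    if f.pitch_variation > 0.4:
        return "joy"
    return "neutral"  # fallback label, not mentioned in the source


print(label_emotion(VoiceFeatures(mean_pitch_hz=200, words_per_minute=200, pitch_variation=0.8)))
# excitement
```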

[0374] (Example 2)

[0375] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0376] Understanding the content of business negotiations and visualizing the customer's psychological state are crucial in many business processes. However, traditional methods only involve transcribing audio data into text, making it difficult to grasp emotional responses. Furthermore, there is a need for a method that efficiently incorporates supplementary information after business negotiations into management systems.

[0377] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0378] In this invention, the server includes an acoustic recognition means, a psychological recognition means, and a record generation means. This enables the generation of detailed records of the business negotiation based on acoustic data during the negotiation, real-time understanding of the customer's psychological reactions, and efficient registration of information into the management system.

[0379] A "business negotiation" is the process of discussing the terms and conditions of a business transaction or contract.

[0380] "Acoustic data" refers to the digital or analog representation of sound signals generated during a business negotiation.

[0381] "Acoustic recognition means" refers to technologies and devices that analyze acoustic data and convert it into text data.

[0382] "Psychological recognition means" refers to technologies and devices that analyze and identify a speaker's emotions and psychological state from acoustic data.

[0383] "Text data" refers to data that represents the content of the acoustic data in text format.

[0384] "Record generation means" refers to technologies and devices that generate detailed conversation records of business negotiations based on text data and psychological data.

[0385] A "management system" is a database or software that systematically organizes data related to business deals, making it easy to access and update.

[0386] "Information input means" refers to technologies and devices for registering generated data into a management system.

[0387] "Data analysis means" refers to technologies and devices that analyze information entered after a business negotiation and convert it into more detailed information.

[0388] "Information update means" refers to technologies and devices that reflect analyzed information in a management system and maintain the data in an up-to-date state.

[0389] This invention is a system that efficiently acquires acoustic data during business negotiations and analyzes and records the content of the negotiations and the customer's psychological state. This system technically processes important data to deeply understand the content of the negotiations and support sales activities.

[0390] The device uses a high-sensitivity microphone to acquire acoustic data during business negotiations. This acoustic data can capture not only the content of what the speaker says during the negotiation, but also their emotional reactions. The acquired acoustic data is transmitted to the server in real time.

[0391] The server performs acoustic recognition on the received acoustic data and converts it into text data. Acoustic recognition software is used for this, and the accuracy of the conversion is improved by performing noise reduction and volume adjustment as preprocessing. In parallel, the server uses psychological recognition technology to analyze and identify the speaker's emotions from the acoustic data. The psychological recognition software analyzes the tone, speed, and other characteristics of the voice and recognizes the speaker's psychological state.

[0392] Subsequently, the server uses a generative AI model to generate a conversation record based on text data and psychological data. This conversation record includes the details of the business negotiation and the customer's emotional response, making it a valuable source of information for understanding the psychological dynamics of the business.

[0393] As a concrete example, the prompt for the generative AI model takes the form of "Based on the acoustic data, generate a detailed record of the business negotiation, including the speaker's emotional responses." The generated conversation record is registered in the management system via the server. It is thereby recorded as a history of the business negotiation, providing data useful for future analysis and strategic planning of sales activities.
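One way the text data and psychological data could be placed alongside this prompt is sketched below. The instruction string is the example quoted above; the function name `build_record_prompt`, the pairing of each utterance with an emotion label, and the bullet layout are assumptions made for illustration.

```python
INSTRUCTION = (
    "Based on the acoustic data, generate a detailed record of the business "
    "negotiation, including the speaker's emotional responses."
)


def build_record_prompt(segments):
    """Build a prompt from (utterance_text, emotion_label) pairs."""
    lines = [f'- "{text}" [emotion: {emotion}]' for text, emotion in segments]
    return INSTRUCTION + "\n\nUtterances:\n" + "\n".join(lines)


print(build_record_prompt([
    ("We like product A.", "joy"),
    ("The price seems high.", "anxiety"),
]))
```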

[0394] After a business negotiation concludes, users use their terminals to input additional information related to the negotiation and their next action plan. This information is then transformed into detailed data through data analysis functions and reflected in the management system, where it is utilized as an integrated database.

[0395] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0396] Step 1:

[0397] The device acquires acoustic data during business negotiations using a high-sensitivity microphone. The input is the speech of the negotiation participants, and the output is digital acoustic data. The device is designed to collect this digital acoustic data in real time during the negotiation.

[0398] Step 2:

[0399] The terminal compresses the acquired acoustic data and sends it to the server in a secure manner. The input is the acoustic data obtained in step 1, and the output is the compressed data ready for transfer. This process employs an efficient transfer protocol to minimize data delay.

[0400] Step 3:

[0401] The server receives the acoustic data and begins acoustic recognition processing. The input is the acoustic data sent from the terminal, and the output is text data. The server first performs noise reduction and volume adjustment, and then the speech recognition engine converts the data into text format.

[0402] Step 4:

[0403] The server uses psychological recognition technology to evaluate the speaker's psychological state from the acoustic data. The input is acoustic data, and the output is data about the speaker's psychological state. The server analyzes the tone, speed, and other vocal characteristics of the voice, and applies emotion recognition algorithms to identify the speaker's psychological state.

[0404] Step 5:

[0405] The server utilizes a generative AI model to generate a conversation record from text data and psychological state data. The input is the data obtained in steps 3 and 4, and the output is a detailed conversation record. The generative AI model is prompted with the message, "Generate a detailed record of the business negotiation based on the acoustic data, including the speaker's emotional responses," to generate the conversation record.

[0406] Step 6:

[0407] The server registers the generated conversation records in the management system. The input is the conversation records obtained in step 5, and the output is the historical information registered in the management system. Through this process, the details of the business negotiations are recorded in an integrated database, which can be used for future analysis and strategic planning.

[0408] Step 7:

[0409] After a business negotiation is completed, the user enters additional information related to the negotiation and their next action plan via the terminal. The input is the information entered manually by the user, and the output is updated information based on detailed data analysis. The terminal uploads the input data to the system, where it is converted into detailed data using the analysis function and registered in the management system.
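Step 2 of the flow above compresses the acoustic data before transfer. The source does not name a codec, so the sketch below assumes lossless compression of raw PCM bytes with `zlib` from the Python standard library, purely to illustrate the compress-then-restore round trip; a real terminal would more likely use a dedicated audio codec.

```python
import zlib


def compress_audio(pcm: bytes, level: int = 6) -> bytes:
    """Losslessly compress raw audio bytes before upload (assumed zlib)."""
    return zlib.compress(pcm, level)


def decompress_audio(blob: bytes) -> bytes:
    """Restore the original audio bytes on the server side."""
    return zlib.decompress(blob)


pcm = b"\x00\x01" * 5000  # repetitive dummy data standing in for PCM samples
blob = compress_audio(pcm)
print(len(pcm), len(blob))
assert decompress_audio(blob) == pcm
```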

[0410] (Application Example 2)

[0411] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0412] Conventional sales negotiation support systems simply convert speech to text, failing to consider the emotional responses of the participants. This makes it difficult to understand their emotional and psychological state. Furthermore, the inability to provide emotion-based interactions hinders improvements in the user experience.

[0413] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0414] In this invention, the server includes a speech recognition means for acquiring audio data during a business negotiation and converting the audio data into text data, an emotion recognition means for deriving emotion data from the audio data, and a memo generation means for automatically generating meeting minutes from the text data and emotion data using a generative model. This makes it possible to generate meeting minutes that reflect not only the content of the business negotiation but also the emotional trends of the participants in real time, and to provide user-based interaction.

[0415] "Speech recognition means" refers to technology for converting audio data acquired during business negotiations into text data.

[0416] "Emotion recognition means" refers to a technology that analyzes the characteristics of a voice from acquired audio data and derives emotion data.

[0417] "Memo generation means" refers to a technology that uses a generative model to combine text data and emotion data and automatically generate meeting minutes.

[0418] "Data entry means" refers to technology for inputting automatically generated meeting minutes into a customer management system.

[0419] "Information analysis means" refers to technology for analyzing simple notes entered after a business negotiation and converting them into detailed data.

[0420] "Data update means" refers to the technology for inputting the analyzed detailed data into the customer management system.

[0421] "Interaction generation means" refers to technologies for generating appropriate suggestions and responses based on the user's emotions.

[0422] In order to carry out this invention, a system having the following configuration is required.

[0423] First, the terminal continuously acquires the participants' audio data using the microphone during the business negotiation. This audio data includes both the content of the negotiation and the emotional responses of the participants. The terminal then transmits this audio data to the server.

[0424] Next, the server applies a speech recognition engine to the received audio data to convert it into text data. During this process, pre-processing such as noise reduction and volume adjustment is performed to improve accuracy. For example, Google Cloud Speech-to-Text can be used as the speech recognition engine.

[0425] Furthermore, the server activates an emotion recognition engine to analyze the tone, speed, and other vocal characteristics of the voice to derive emotion data. Emotion recognition technologies such as the Affectiva SDK can be used in this process.

[0426] The server uses a generative AI model to combine the acquired text data and emotion data and automatically generate meeting minutes. These minutes include not only the details of the business negotiation that should be entered into the customer management system, but also the emotional responses of the participants.

[0427] Furthermore, the server analyzes the simple notes entered after the business negotiation, converts them into detailed data using information analysis tools, and inputs the results into the customer management system.

[0428] The interaction generation system generates appropriate suggestions based on the user's emotional data, thereby personalizing interactions in business negotiations and conversations and contributing to an improved user experience.

[0429] For example, if the system detects signs of stress from the user's voice during a business negotiation, it can suggest, "Shall we take a short break?" An example of a prompt for generating this interaction would be, "Identify the emotions the user is feeling and create a suggestion accordingly."
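As a non-limiting illustration, the prompt in the example above could be assembled as in the following minimal sketch. It assumes that the emotion recognition step yields a label string. The function names `build_interaction_prompt` and `fallback_suggestion`, the label "stress", and the fallback table are hypothetical and do not appear in the specification; the call to the generative model itself is intentionally omitted.

```python
from typing import Optional

# Instruction text taken from the example prompt in the description.
BASE_INSTRUCTION = (
    "Identify the emotions the user is feeling and create a suggestion accordingly."
)

# Hypothetical rule-based fallback for when no generative model is available.
FALLBACK_SUGGESTIONS = {
    "stress": "Shall we take a short break?",
}


def build_interaction_prompt(emotion_label: str, utterance: str) -> str:
    """Combine the base instruction with the detected emotion and latest utterance."""
    return (
        f"{BASE_INSTRUCTION}\n"
        f"Detected emotion: {emotion_label}\n"
        f"Latest utterance: {utterance}"
    )


def fallback_suggestion(emotion_label: str) -> Optional[str]:
    """Return a canned suggestion for a known label, or None if there is none."""
    return FALLBACK_SUGGESTIONS.get(emotion_label)
```

Keeping the instruction separate from the emotion data makes it possible to change the wording of the prompt without touching the emotion recognition step.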

[0430] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0431] Step 1:

[0432] The terminal uses a microphone to capture the voices of participants during a business negotiation. The input is audio data of the negotiation. This data is then sent directly to the server.

[0433] Step 2:

[0434] The server applies a speech recognition engine to the received audio data. Specifically, after noise reduction and volume adjustment, it converts the audio into text data. The output of this process is text data that reflects the content of the business negotiation.

[0435] Step 3:

[0436] The server activates an emotion recognition engine to extract emotion data from the audio data. It analyzes the tone, speed, and vocal characteristics of the input audio to detect the user's emotional state (e.g., joy, surprise, anxiety). The output is emotion data.

[0437] Step 4:

[0438] The server uses a generative AI model to integrate the text data and emotion data and automatically generate meeting minutes for the business negotiation. The input is the text data and emotion data, and the output is the meeting minutes.

[0439] Step 5:

[0440] The server uses the user's emotion data to generate appropriate suggestions with the interaction generation means. The input in this process is the emotion data, and the output is interaction suggestions for the user.

[0441] Step 6:

[0442] The user receives interactions generated by the system and provides responses and feedback as needed. Here, the input is the interaction suggestion, and the output is the user's feedback.

[0443] Step 7:

[0444] The server receives a brief memo entered by the user after a business negotiation. Using information analysis tools, it processes this memo into detailed data. The input is a brief memo, and the output is refined, detailed business data.

[0445] Step 8:

[0446] The server inputs the detailed data into the customer management system and updates the system-wide database. The output of this step is the updated data.
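Steps 2 to 5 of the above flow can be sketched, purely as an illustration, as follows. The speech recognition engine, the emotion recognition engine, and the generative AI model are passed in as callables so that no particular product is assumed. The names `NegotiationResult` and `run_negotiation_pipeline` and the wording of the minutes prompt are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class NegotiationResult:
    text: str        # output of Step 2
    emotion: str     # output of Step 3
    minutes: str     # output of Step 4
    suggestion: str  # output of Step 5


def run_negotiation_pipeline(
    audio: bytes,
    recognize_speech: Callable[[bytes], str],
    recognize_emotion: Callable[[bytes], str],
    generate: Callable[[str], str],
) -> NegotiationResult:
    """Run Steps 2-5: audio -> text and emotion -> minutes -> suggestion."""
    text = recognize_speech(audio)      # Step 2: speech recognition
    emotion = recognize_emotion(audio)  # Step 3: emotion recognition
    minutes = generate(                 # Step 4: minutes from text + emotion
        f"Create meeting minutes.\nTranscript: {text}\nEmotion: {emotion}"
    )
    suggestion = generate(              # Step 5: emotion-based suggestion
        "Identify the emotions the user is feeling and create a suggestion "
        f"accordingly.\nEmotion: {emotion}"
    )
    return NegotiationResult(text, emotion, minutes, suggestion)
```

Because the engines are injected, the same function can be exercised with simple stand-in functions before being connected to actual services.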

[0447] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0448] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. A prompt containing instructions and inference data, such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0449] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0450] [Third Embodiment]

[0451] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0452] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0453] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0454] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0455] The microphone 238 receives instructions and the like from the user 20 by picking up the voice uttered by the user 20. The microphone 238 captures the voice uttered by the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0456] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0457] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0458] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0459] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0460] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0461] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0462] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0463] This invention relates to a system that converts audio data acquired during business negotiations into text in real time, automatically creates meeting minutes using a generative model, and records them in a customer management system. The system also accepts simple notes entered after the negotiation, analyzes them into detailed data, and reflects that data in the customer management system.

[0464] The terminal uses the microphone to acquire audio data during business negotiations. This audio data is processed as a digital audio file containing statements and conversations made during the negotiation.

[0465] The server receives this audio data and activates the speech recognition engine to convert it into text data. This process is performed in real time, and pre-processing such as noise reduction and volume optimization is carried out.

[0466] This text data is automatically organized and summarized as meeting minutes by a generative AI model on the server. The generated meeting minutes are formatted so that all parties involved can easily understand and review them later.

[0467] The server then automatically inputs these meeting minutes into the customer management system, accurately recording the details of the business negotiation within the system. This makes it easier to later understand the details of the negotiation and plan the next steps.

[0468] After a business meeting concludes, users enter short notes or keywords on their device. This information is used to record new discoveries, important points, and tasks for the next meeting.

[0469] The entered memos are then analyzed in detail by the server's information analysis system, refining the relevant information and registering it in the customer management system. This ensures that the accuracy and reliability of the information remain high even after the business negotiation, and that all information is managed without omission.

[0470] For example, if a user enters a memo such as "The next meeting is at the end of September. The customer has shown interest in product C," the server interprets this as detailed data such as "Next meeting scheduled for the end of September" and "Interested in product C," and registers it in the system.
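The conversion of the memo in the above example into detailed data could be approximated, as a minimal sketch, with pattern matching. This is a simplification introduced here for illustration: the specification does not state how the information analysis is implemented, and an actual system may instead use natural language processing or a generative model. The names `analyze_memo`, `NEXT_MEETING`, and `INTEREST` and the field names are hypothetical.

```python
import re

# Hypothetical patterns standing in for the information analysis step.
NEXT_MEETING = re.compile(
    r"next meeting is (?:at|on|in) (?P<when>[^.]+)", re.IGNORECASE
)
INTEREST = re.compile(r"interest in (?P<item>[^.,]+)", re.IGNORECASE)


def analyze_memo(memo: str) -> dict:
    """Extract structured fields from a short free-text memo."""
    details = {}
    match = NEXT_MEETING.search(memo)
    if match:
        details["next_meeting"] = match.group("when").strip()
    match = INTEREST.search(memo)
    if match:
        details["interested_in"] = match.group("item").strip()
    return details
```

Applied to the memo in the example, the sketch yields one field for the next meeting ("the end of September") and one for the product of interest ("product C"); a memo that matches neither pattern yields an empty result.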

[0471] This invention dramatically improves the efficiency of data entry work in the sales department and enhances the quality of customer information management by highly automating the recording of business negotiations.

[0472] The following describes the processing flow.

[0473] Step 1:

[0474] The terminal captures audio data via the microphone during business negotiations. The audio data is temporarily stored in a buffer in digital format and sent to the server in real time.

[0475] Step 2:

[0476] The server receives audio data from the terminal and starts the speech recognition engine. It then performs preprocessing on the received audio data, such as noise reduction and volume normalization, to improve recognition accuracy.

[0477] Step 3:

[0478] The server uses a speech recognition engine to convert pre-processed audio data into text data. This text data contains the content of the business negotiation conversation as written information.

[0479] Step 4:

[0480] The server uses a generative AI model to automatically generate meeting minutes based on the acquired text data. The generated meeting minutes are formatted to summarize the business discussion and clearly express the key points.

[0481] Step 5:

[0482] The server instantly inputs the automatically generated meeting minutes into the customer management system. It connects to the customer management system and creates and records new data entries.

[0483] Step 6:

[0484] After a business meeting concludes, users enter brief notes or keywords related to the meeting via their device. This includes important information such as the meeting's outcome and the next steps.

[0485] Step 7:

[0486] The server receives notes sent by users and analyzes them into detailed data using information analysis tools. This analysis is performed using techniques such as keyword extraction and contextual understanding.

[0487] Step 8:

[0488] The server inputs and updates the detailed data obtained through analysis into the customer management system. This ensures that information after a business negotiation is accurately managed and utilized in subsequent sales activities.
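The preprocessing mentioned in Step 2 of the above flow can be illustrated with the following minimal sketch, which assumes that the audio is available as floating-point samples in the range -1.0 to 1.0. The noise gate is a deliberately crude stand-in for noise reduction, and the threshold and target peak values are arbitrary assumptions. The function names are hypothetical.

```python
from typing import List


def normalize_volume(samples: List[float], target_peak: float = 0.9) -> List[float]:
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence or empty input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def noise_gate(samples: List[float], threshold: float = 0.02) -> List[float]:
    """Zero out samples whose magnitude is below threshold (crude noise suppression)."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def preprocess(samples: List[float]) -> List[float]:
    """Gate low-level noise first, then normalize the remaining signal."""
    return normalize_volume(noise_gate(samples))
```

Gating before normalization avoids amplifying low-level background noise together with the speech.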

[0489] (Example 1)

[0490] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0491] In today's business environment, accurately recording the details of business negotiations and using that information to improve subsequent customer interactions is crucial. However, manually taking notes on the vast amount of information gathered during negotiations and entering it into a customer information management system is extremely time-consuming and labor-intensive. Furthermore, information is easily omitted or inaccurate, making efficient and accurate negotiation follow-up difficult.

[0492] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0493] In this invention, the server includes an audio processing means for acquiring audio information during negotiations and converting the audio information into text information, an information generation means for automatically generating meeting minutes from the text information using a specified generation format, and a data registration means for automatically inputting the meeting minutes into a customer information management system. This makes it possible to record information during business negotiations in real time.

[0494] "Audio information during negotiations" refers to audio data that records the content of statements and conversations made during business negotiations or discussions.

[0495] "Audio processing means" refers to a device or program that has a series of functions for acquiring, analyzing, and converting audio information in digital format.

[0496] "Text information" refers to digital data expressed as a string of characters after audio information has been converted.

[0497] A "generation format" refers to a model or algorithm that uses artificial intelligence to automatically generate information in a specific format.

[0498] "Information generation means" refers to devices or programs that use a generation format to summarize or generate necessary information from text information.

[0499] "Meeting minutes" refers to a document that summarizes the content of a meeting or business negotiation, and is a record used to review the content later.

[0500] A "customer information management system" is an information system that manages customer data and transaction history, and uses this information to build relationships with customers and improve business operations.

[0501] A "data registration means" refers to a device or program that has the function of automatically inputting and saving generated information into a specific database or system.

[0502] "Information analysis means" refers to devices or programs that analyze input information and extract or refine necessary information.

[0503] This invention aims to efficiently manage and utilize audio information during business negotiations. The system automatically converts audio information acquired during negotiations into text information and then generates meeting minutes from that text information.

[0504] The terminal collects audio information using its microphone during business negotiations. This terminal is used to transmit the content of the negotiations and conversations, sending the acquired audio data to a server. Ideally, the terminal should be equipped with a high-quality microphone and an audio collection application.

[0505] The server converts the received audio information into text using existing technologies such as the Google Speech-to-Text API or IBM Watson Speech to Text as its speech recognition engine. This process includes noise reduction and volume adjustment. Furthermore, the server inputs this text information into a generative AI model (e.g., OpenAI GPT-3) and automatically generates meeting minutes using prompts. This generation process enables accurate tracking of the negotiation's progress and outputs the results in a user-friendly format for stakeholders. An example of a prompt is, "Generate a summary of the negotiation and list the key points."

[0506] The generated meeting minutes are automatically entered into the customer information management system by the server. This ensures that the details of the business negotiations are recorded immediately and can be easily referenced when taking subsequent actions.

[0507] After a business meeting concludes, users use their devices to input brief notes about any new observations or next steps. This information is sent to a server for later use and analyzed by data analysis tools. Users can use a dedicated application on their devices to input notes and keywords.

[0508] The analyzed information is registered in the customer information management system by the server, and refined data representing the results of the business negotiations is stored there. In this way, the system plays a role in supporting the efficiency and accuracy of business negotiation activities.

[0509] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0510] Step 1:

[0511] The terminal acquires audio information using a microphone during business negotiations. Specifically, the terminal captures audio data in real time through a built-in or connected microphone and saves it as a digital audio file. The input to this process is the raw audio of the statements and conversations made during the negotiation, and the output is the saved digital audio data.

[0512] Step 2:

[0513] The terminal transmits the acquired audio information to the server via a secure connection. The terminal encrypts the audio data and uploads it to the server via the internet. The input is the stored audio data, and the output is the audio file uploaded to the server.

[0514] Step 3:

[0515] The server passes the received audio data to the speech recognition engine, which converts the audio into text. Specifically, the server performs pre-processing such as noise reduction and volume adjustment, and then uses speech recognition software to convert it into text. The input to this process is audio data, and the output is the converted text information.

[0516] Step 4:

[0517] The server inputs the text information into an AI model to generate meeting minutes. Here, the AI model is instructed using the prompt "Generate a summary of the business meeting and list the key points." The input consists of the text information and the prompt, and the output is the summarized meeting minutes.

[0518] Step 5:

[0519] The server automatically registers the generated meeting minutes into the customer information management system. Specifically, the server converts the meeting minutes into a predetermined data format and automatically transfers them to the database. The input is the generated meeting minutes, and the output is the data registered in the customer information management system.

[0520] Step 6:

[0521] After the business meeting concludes, the user enters brief notes or keywords using the terminal. The user uses a designated application to input text, which is then sent from the terminal to the server. The input is the text manually entered by the user, and the output is the memo data sent to the server.

[0522] Step 7:

[0523] The server analyzes notes sent by users using information analysis tools and generates detailed data. Specifically, it uses natural language processing techniques to analyze text and extract and refine relevant information. The input is the submitted note data, and the output is the detailed data after analysis.

[0524] Step 8:

[0525] The server registers the analyzed detailed data into the customer information management system. This ensures that detailed and accurate information obtained from business negotiations is added to the database. The input is the analyzed detailed data, and the output is the updated data within the system.
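Steps 5 and 8 of the above flow, in which the meeting minutes are converted into a predetermined data format and the analyzed detailed data is later added, can be sketched as follows. The in-memory class `InMemoryCrm` is a hypothetical stand-in for the customer information management system, and the record fields are assumptions made for illustration; the specification does not define the data format.

```python
import json
from datetime import date
from typing import Dict


class InMemoryCrm:
    """Hypothetical stand-in for the customer information management system."""

    def __init__(self) -> None:
        self.records: Dict[str, dict] = {}

    def register_minutes(self, negotiation_id: str, minutes: str, held_on: date) -> str:
        """Step 5: convert the minutes into a fixed record format and store it."""
        record = {
            "negotiation_id": negotiation_id,
            "held_on": held_on.isoformat(),
            "minutes": minutes,
            "details": {},
        }
        self.records[negotiation_id] = record
        # Return the serialized form as it might be sent to an external system.
        return json.dumps(record, ensure_ascii=False)

    def update_details(self, negotiation_id: str, details: dict) -> None:
        """Step 8: merge the analyzed memo details into the existing record."""
        self.records[negotiation_id]["details"].update(details)
```

Storing the minutes and the later memo details under the same negotiation identifier keeps the information from one negotiation in a single record.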

[0526] (Application Example 1)

[0527] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0528] During business negotiations, it is crucial for sales representatives to accurately record conversations with customers so that they can easily access the information later. However, depending on the speed and complexity of the conversation, recording can be time-consuming and carries the risk of memory lapses or omissions. Furthermore, manually gathering and organizing large amounts of information is often burdensome. In addition, there is the challenge of not being able to easily access important information when it is needed immediately during a negotiation.

[0529] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0530] In this invention, the server includes a speech recognition device that acquires audio data and converts the audio data into text information; an information generation device that automatically generates summary information from the text information using a generative model; a data recording device that automatically inputs the summary information into a service management system; an information processing device that analyzes simple information entered after a business negotiation and converts it into detailed information; and a display device that uses smart glasses to display important information in real time during a business negotiation. This automates information processing during and after business negotiations, making it possible to manage the content of business negotiations accurately and quickly.

[0531] "Audio data" refers to information recorded in digital format from human speech acquired during business negotiations.

[0532] "Text information" refers to information in text format that has been converted from audio data by a speech recognition device.

[0533] A "speech recognition device" is a device or system that has the function of analyzing audio data and converting its content into text information.

[0534] A "generative model" is a machine learning model that learns from vast amounts of data and generates summary information and other data based on the input information.

[0535] "Summary information" is a concise record of content that is automatically generated by extracting the key points of the text information using a generative model.

[0536] An "information generation device" is a device or system that generates summary information from text information using a generative model.

[0537] A "data recording device" is a device or system that has the function of automatically inputting summary information into a service management system.

[0538] An "information processing device" is a device or system that analyzes simple information entered after a business negotiation and converts it into detailed information.

[0539] A "service management system" is a digital platform for centrally managing information related to customers and business deals.

[0540] "Smart glasses" are wearable devices that, when worn by a user, display digital information in their field of vision in real time.

[0541] A "display device" is a device or system that uses smart glasses to visually present important information in real time during business negotiations.

[0542] The system for carrying out this invention mainly consists of an integrated configuration of a speech recognition device, an information generation device, a data recording device, an information processing device, and a display device. The server acquires input via smart glasses worn by the user to collect audio data during business negotiations. These glasses are equipped with a highly sensitive microphone that records conversations in real time. The recorded audio data is first converted into text information by the speech recognition device. In this process, pre-processing is performed to remove noise and adjust the volume, so that accurate text information is generated.

[0543] Once the text information is generated, the server's information generation device uses a generative AI model to automatically generate summary information from this text. This summary information concisely summarizes the main points of the business negotiation and is recorded in the service management system via the data recording device. After the negotiation, the user can add simple notes by voice using the smart glasses. This voice input is analyzed by the information processing device and converted into more detailed information. This detailed information is also recorded in the service management system as an update.

[0544] For example, if a user says during a business meeting, "The customer showed interest in new product A," the speech recognition device converts this into text, and the information generation device generates a summary such as "The customer showed interest in new product A." If the user adds a note after the meeting, such as "I would like the next meeting to be next Tuesday," the information processing device converts this information into detailed information, such as "Next meeting scheduled for: next Tuesday," and records it.

[0545] An example of a prompt used with the generative AI model is, "Please briefly summarize the details of this business opportunity. The customer expressed interest in new product A." Using such a prompt enables a highly accurate summary.

[0546] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0547] Step 1:

[0548] The terminal uses the microphone in smart glasses to acquire audio data during business negotiations. This audio data is saved as a digital audio file recording the conversation between the user and the customer. This audio data serves as input to the system.

[0549] Step 2:

[0550] The server activates the speech recognition device and converts the acquired audio data into text information. Before the conversion, the speech recognition device performs preprocessing, including noise removal and volume adjustment, so that the output text information contains fewer errors caused by noise. The generated text information becomes the input for the next processing step.

[0551] Step 3:

[0552] The server's information generation device uses a generative AI model to generate summary information from the text information generated in Step 2. Here, the prompt is "Generate a summary of the conversation." The text information is the input, and the summary information is the output.

[0553] Step 4:

[0554] The server's data recording device automatically inputs the generated summary information into the service management system. This process saves the summary information as a sales opportunity record. Here, data is registered in the service management system, and the registered information is then passed on to the next process.

[0555] Step 5:

[0556] After a business meeting, the user inputs a simple memo via voice using smart glasses. This voice input becomes a new input for the information processing device. This memo includes next actions and findings.

[0557] Step 6:

[0558] The server's information processing device analyzes the memo entered by the user and converts it into more detailed information. The generated detailed information is then saved to the service management system.

[0559] Step 7:

[0560] The terminal displays important information in real time within the user's field of view through the smart glasses during business negotiations. This allows the user to instantly check the information needed during the negotiation. Real-time information display assists the negotiation and strengthens the connection to the next action.
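The specification does not state how the important information displayed in Step 7 is selected. As one hypothetical possibility, keywords in the recognized utterance could be matched against reference notes prepared in advance, as in the following minimal sketch. The function name `select_display_items`, the note contents, and the cap on the number of displayed items are all assumptions.

```python
from typing import Dict, List


def select_display_items(
    utterance: str, reference_notes: Dict[str, str], max_items: int = 2
) -> List[str]:
    """Return notes whose keyword appears in the utterance, capped for a small display."""
    lowered = utterance.lower()
    hits = [
        f"{keyword}: {note}"
        for keyword, note in reference_notes.items()
        if keyword.lower() in lowered
    ]
    # Limit the result because a smart-glasses display has little room.
    return hits[:max_items]
```

The cap reflects the assumption that only a few short lines can be shown in the user's field of view without distracting from the conversation.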

[0561] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform specific processing using the user's emotions.

[0562] This invention relates to a system that acquires audio data during business negotiations and generates text data and emotion data using speech recognition and emotion recognition technologies. This enables the automatic generation of detailed meeting minutes that include not only the content of the negotiation but also the emotional reactions during the negotiation. The minutes are further input into a customer management system to enable deeper analysis of the negotiation.

[0563] The terminal uses a microphone to acquire audio data during business negotiations. This audio data includes not only the content of what was said during the negotiations but also the customer's emotional reactions.

[0564] The server receives the audio data sent from the terminal. It starts the speech recognition engine and converts the audio data into text data. During this process, pre-processing such as noise reduction and volume adjustment is performed to improve recognition accuracy.

[0565] The server simultaneously uses an emotion engine to recognize the user's emotions from the voice data. The emotion engine analyzes the tone, speed, and other vocal characteristics of the voice to detect the user's emotional state.
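The mapping from vocal characteristics to an emotional state can be illustrated with a simple rule-based sketch. This is not the emotion engine of the specification, whose internals are not described; it only shows the shape of such a mapping. The feature names, the threshold values, and the "neutral" label are assumptions chosen for illustration and are not validated.

```python
from dataclasses import dataclass


@dataclass
class VoiceFeatures:
    mean_pitch_hz: float     # tone
    words_per_second: float  # speed
    pitch_variation: float   # intonation, as a 0..1 ratio (assumed scale)


def label_emotion(features: VoiceFeatures) -> str:
    """Map coarse vocal features to a label using illustrative thresholds."""
    fast = features.words_per_second > 3.0
    high = features.mean_pitch_hz > 220.0
    varied = features.pitch_variation > 0.3
    if fast and high and varied:
        return "excitement"
    if high and varied:
        return "joy"
    if fast and not varied:
        return "anxiety"
    return "neutral"
```

In practice, a trained model such as the emotion identification model 59 would replace these fixed thresholds.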

[0566] A generative AI model automatically generates meeting minutes from the text data and emotion data. The meeting minutes include text information about the business negotiation, as well as the customer's emotional reactions, and are presented in a way that allows for an understanding of the psychological dynamics during the negotiation.

[0567] The server automatically inputs the meeting minutes and emotion data into the customer management system. This data is recorded as an evaluation metric for each sales negotiation and used for post-negotiation analysis and strategic planning for future negotiations.

[0568] After a business meeting concludes, the user enters supplementary information and notes regarding the next action plan via their terminal. These notes are then converted into detailed data using an information analysis tool and registered in the system.

[0569] For example, if a user inputs "The customer was very positive about product D, but expressed concerns about the price," the server will consider this and register the information that the customer is positive about the product but cautious about the price in the customer management system.
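The conversion in this example can be sketched as follows. The information analysis tool itself is not specified, so this sketch substitutes two regular expressions that only cover the phrasing of the example note. The function name and the keys of the returned dictionary are assumptions made for illustration.

```python
import re
from typing import Dict


def analyze_note(note: str) -> Dict[str, object]:
    """Extract a product, the attitude toward it, and any stated concerns."""
    product = re.search(r"positive about (product \w+)", note, re.IGNORECASE)
    concerns = re.findall(r"concerns? about (?:the )?(\w+)", note, re.IGNORECASE)
    return {
        "product": product.group(1) if product else None,
        "attitude": "positive" if product else None,
        "concerns": concerns,
    }


note = ("The customer was very positive about product D, "
        "but expressed concerns about the price.")
result = analyze_note(note)
```

Free-form notes will rarely match fixed patterns, which is presumably why the specification leaves the choice of analysis technique open.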

[0570] This invention aims to improve the efficiency of sales activities and deepen customer understanding by comprehensively addressing both verbal and nonverbal elements in business negotiations.

[0571] The following describes the processing flow.

[0572] Step 1:

[0573] The device uses the microphone to capture audio data during the business negotiation. The acquired audio data is temporarily stored as a composite audio file that includes the content of the negotiation and the customer's emotional responses.

[0574] Step 2:

[0575] The server receives audio data sent from the terminal. First, it uses a speech recognition engine to convert the audio data into text data. During this process, preprocessing such as noise reduction and volume normalization is performed to improve recognition accuracy.

[0576] Step 3:

[0577] The server simultaneously activates the emotion engine. It analyzes the tone, speed, and intonation of the voice data to recognize the user's emotional state. The recognized emotion data is assigned labels such as "joy," "anxiety," and "excitement."

[0578] Step 4:

[0579] The server inputs the text data and emotion data into a generative AI model, automatically creating meeting minutes that integrate the content of the negotiation with the user's emotions. The meeting minutes also include the customer's emotional responses, clearly recording how the negotiation was influenced by emotions.

[0580] Step 5:

[0581] The server inputs the generated meeting minutes and sentiment data into the customer management system. The text content and sentiment responses of sales meetings are centrally managed, providing extremely useful data for future analysis and strategy.

[0582] Step 6:

[0583] After a business meeting concludes, users enter supplementary notes via their terminal. Examples of such notes include additional information about the meeting, next steps, and any special points to note. This information plays a crucial role in subsequent analysis.

[0584] Step 7:

[0585] The server analyzes the additional notes received from the user and converts them into detailed data using information analysis tools. This detailed data is also updated in the customer management system and stored as part of the sales opportunity information.

[0586] Throughout this process, business negotiations are meticulously recorded based on verbal and nonverbal elements, and provided as a dataset that is useful for making important decisions in sales activities.
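The seven steps above can be sketched as a small pipeline. Because the speech recognition engine, the emotion engine, the generative AI model, and the information analysis tool are not specified, each is passed in as a plain callable. All names below (NegotiationRecord, record_negotiation, apply_note) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class NegotiationRecord:
    minutes: str
    emotions: List[str]
    note_details: Dict[str, str] = field(default_factory=dict)


def record_negotiation(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    detect_emotions: Callable[[bytes], List[str]],
    generate_minutes: Callable[[str, List[str]], str],
) -> NegotiationRecord:
    """Steps 2 to 5: audio -> text and emotion labels -> minutes -> record."""
    text = transcribe(audio)
    emotions = detect_emotions(audio)
    minutes = generate_minutes(text, emotions)
    return NegotiationRecord(minutes=minutes, emotions=emotions)


def apply_note(
    record: NegotiationRecord,
    note: str,
    analyze_note: Callable[[str], Dict[str, str]],
) -> NegotiationRecord:
    """Steps 6 and 7: analyse the user's note and merge it into the record."""
    record.note_details.update(analyze_note(note))
    return record
```

Passing the engines in as callables keeps the flow testable with stubs and leaves the choice of concrete services open.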

[0587] (Example 2)

[0588] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0589] Understanding the content of business negotiations and visualizing the customer's psychological state are crucial in many business processes. However, traditional methods only involve transcribing audio data into text, making it difficult to grasp emotional responses. Furthermore, there is a need for a method that efficiently incorporates supplementary information after business negotiations into management systems.

[0590] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0591] In this invention, the server includes an acoustic recognition means, a psychological recognition means, and a record generation means. This enables the generation of detailed records of the business negotiation based on acoustic data during the negotiation, real-time understanding of the customer's psychological reactions, and efficient registration of information into the management system.

[0592] A "business negotiation" is the process of discussing the terms and conditions of a business transaction or contract.

[0593] "Audio data" refers to the digital or analog representation of sound signals generated during a business negotiation.

[0594] "Acoustic recognition means" refers to technologies and devices that analyze acoustic data and convert it into text data.

[0595] "Psychological recognition means" refers to technologies and devices that analyze and identify a speaker's emotions and psychological state from acoustic data.

[0596] "Text data" refers to data that represents the content of the acoustic data in text format.

[0597] "Record generation means" refers to technologies and devices that generate detailed conversation records of business negotiations based on text data and psychological data.

[0598] A "management system" is a database or software that systematically organizes data related to business deals, making it easy to access and update.

[0599] "Information input means" refers to technologies and devices for registering generated data into a management system.

[0600] "Data analysis means" refers to technologies and devices that analyze information entered after a business negotiation and convert it into more detailed information.

[0601] "Information update means" refers to technologies and devices that reflect analyzed information in a management system and maintain the data in an up-to-date state.

[0602] This invention is a system that efficiently acquires acoustic data during business negotiations and analyzes and records the content of the negotiations and the customer's psychological state. This system technically processes important data to deeply understand the content of the negotiations and support sales activities.

[0603] The device uses a high-sensitivity microphone to acquire acoustic data during business negotiations. This acoustic data can capture not only the content of what the speaker says during the negotiation, but also their emotional reactions. The acquired acoustic data is transmitted to the server in real time.

[0604] The server performs acoustic recognition on the received acoustic data and converts it into text data. Acoustic recognition software is used for this, and the accuracy of the conversion is improved by performing noise reduction and volume adjustment as preprocessing. In parallel, the server uses psychological recognition technology to analyze and identify the speaker's emotions from the received acoustic data. The psychological recognition software analyzes the tone, speed, and other characteristics of the voice to recognize the speaker's psychological state.

[0605] Subsequently, the server uses a generative AI model to generate a conversation record based on text data and psychological data. This conversation record includes the details of the business negotiation and the customer's emotional response, making it a valuable source of information for understanding the psychological dynamics of the business.

[0606] As a concrete example, the prompt for the generative AI model takes the form of "Based on the acoustic data, generate a detailed record of the business negotiation, including the speaker's emotional responses." The server registers the generated conversation record in the management system. The record is stored as a history of the business negotiation, providing data useful for future analysis and strategic planning of sales activities.
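One way the quoted instruction might be combined with the text data and psychological data is sketched below. The layout of the prompt (a bulleted transcript with an emotion label in brackets) and the function name build_prompt are assumptions. The specification gives only the instruction sentence itself.

```python
from typing import List, Tuple

INSTRUCTION = (
    "Based on the acoustic data, generate a detailed record of the business "
    "negotiation, including the speaker's emotional responses."
)


def build_prompt(segments: List[Tuple[str, str]]) -> str:
    """Build a prompt from (utterance text, emotion label) pairs in spoken order."""
    lines = [f"- [{emotion}] {text}" for text, emotion in segments]
    return INSTRUCTION + "\n\nTranscript with emotion labels:\n" + "\n".join(lines)


prompt = build_prompt([
    ("We like the product.", "joy"),
    ("The price is a concern.", "anxiety"),
])
```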

[0607] After a business negotiation concludes, users use their terminals to input additional information related to the negotiation and their next action plan. This information is then transformed into detailed data through data analysis functions and reflected in the management system, where it is utilized as an integrated database.

[0608] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0609] Step 1:

[0610] The device acquires acoustic data during business negotiations using a high-sensitivity microphone. The input is the speech of the negotiation participants, and the output is digital acoustic data. The device is designed to collect this digital acoustic data in real time during the negotiation.

[0611] Step 2:

[0612] The terminal compresses the acquired acoustic data and sends it to the server in a secure manner. The input is the acoustic data obtained in step 1, and the output is the compressed data ready for transfer. This process employs an efficient transfer protocol to minimize data delay.

[0613] Step 3:

[0614] The server receives the acoustic data and begins acoustic recognition processing. The input is the acoustic data sent from the terminal, and the output is text data. The server first performs noise reduction and volume adjustment, and then the speech recognition engine converts the data into text format.

[0615] Step 4:

[0616] The server uses psychological recognition technology to evaluate the speaker's psychological state from the acoustic data. The input is acoustic data, and the output is data about the speaker's psychological state. The server analyzes the tone, speed, and other vocal characteristics of the voice, and applies emotion recognition algorithms to identify the speaker's psychological state.

[0617] Step 5:

[0618] The server utilizes a generative AI model to generate a conversation record from text data and psychological state data. The input is the data obtained in steps 3 and 4, and the output is a detailed conversation record. The generative AI model is prompted with the message, "Generate a detailed record of the business negotiation based on the acoustic data, including the speaker's emotional responses," to generate the conversation record.

[0619] Step 6:

[0620] The server registers the generated conversation records in the management system. The input is the conversation records obtained in step 5, and the output is the historical information registered in the management system. Through this process, the details of the business negotiations are recorded in an integrated database, which can be used for future analysis and strategic planning.

[0621] Step 7:

[0622] After a business negotiation is completed, the user enters additional information related to the negotiation and their next action plan via a terminal. The input is the information entered manually by the user, and the output is updated information based on detailed data analysis. The terminal uploads the entered data to the system, where it is converted into detailed data by the analysis function and registered in the management system.
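The compression in Step 2 of this flow is not tied to any particular codec or transfer protocol. As a minimal sketch, assuming the acoustic data is held as raw bytes, the standard-library zlib module can stand in for the compressor. zlib is a lossless general-purpose compressor, whereas a real terminal would more likely use a dedicated audio codec. The function names are illustrative.

```python
import zlib


def compress_audio(pcm: bytes, level: int = 6) -> bytes:
    """Compress raw audio bytes on the terminal before transfer."""
    return zlib.compress(pcm, level)


def decompress_audio(payload: bytes) -> bytes:
    """Restore the original bytes on the server."""
    return zlib.decompress(payload)


silence = bytes(1000)  # one thousand zero bytes as placeholder audio
payload = compress_audio(silence)
restored = decompress_audio(payload)
```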

[0623] (Application Example 2)

[0624] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0625] Conventional sales negotiation support systems simply convert speech to text, failing to consider the emotional responses of the participants. This makes it difficult to understand their emotional state and psychological state. Furthermore, the inability to provide emotion-based interactions hinders improvements in the user experience.

[0626] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0627] In this invention, the server includes a speech recognition means for acquiring audio data during a business negotiation and converting the audio data into text data, an emotion recognition means for deriving emotion data from the audio data, and a memo generation means for automatically generating meeting minutes from the text data and emotion data using a generative model. This makes it possible to generate meeting minutes that reflect not only the content of the business negotiation but also the emotional trends of the participants in real time, and to provide interaction based on the user's emotions.

[0628] "Speech recognition means" refers to technology for converting audio data acquired during business negotiations into text data.

[0629] "Emotion recognition means" refers to a technology that analyzes the characteristics of a voice from acquired audio data and derives emotion data.

[0630] "Memo generation means" refers to a technology that uses a generative model to combine text data and emotion data to automatically generate meeting minutes.

[0631] "Data entry means" refers to technology for inputting automatically generated meeting minutes into a customer management system.

[0632] "Information analysis means" refers to technology for analyzing simple notes entered after a business negotiation and converting them into detailed data.

[0633] "Data update means" refers to the technology for inputting the analyzed detailed data into the customer management system.

[0634] "Interaction generation means" refers to technologies for generating appropriate suggestions and responses based on the user's emotions.

[0635] In order to carry out this invention, a system having the following configuration is required.

[0636] First, the device continuously acquires participant audio data using the microphone during the business negotiation. This audio data includes both the content of the negotiation and the emotional responses of the participants. The device then transmits this audio data to the server.

[0637] Next, the server applies a speech recognition engine to the received audio data to convert it into text data. During this process, pre-processing such as noise reduction and volume adjustment is performed to improve accuracy. For example, Google Cloud Speech-to-Text can be used as the speech recognition engine.

[0638] Furthermore, the server activates an emotion recognition engine to analyze the tone, speed, and other vocal characteristics of the voice to derive emotion data. Emotion recognition technologies such as the Affectiva SDK can be used in this process.

[0639] The server automatically generates meeting minutes by combining text data and sentiment data acquired using a generative AI model. These minutes include not only the details of the business negotiations that should be entered into the customer management system, but also the emotional responses of the participants.

[0640] Furthermore, the server analyzes the simple notes entered after the business negotiation, converts them into detailed data using information analysis tools, and inputs the results into the customer management system.

[0641] The interaction generation system generates appropriate suggestions based on the user's emotional data, thereby personalizing interactions in business negotiations and conversations and contributing to an improved user experience.

[0642] For example, if the system detects signs of stress from the user's voice during a business negotiation, it can suggest, "Shall we take a short break?" An example of a prompt for generating this interaction would be, "Identify the emotions the user is feeling and create a suggestion accordingly."
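In the described system the suggestion is produced by a generative model from the prompt quoted above. As a stand-in that needs no model, the following sketch uses a lookup table from a detected emotion label to a suggestion. The table contains only the one pairing given in the text. The label "stress" and the function name suggest are assumptions.

```python
from typing import Optional

# Prompt quoted in the specification for the generative variant.
INTERACTION_PROMPT = (
    "Identify the emotions the user is feeling and create a suggestion accordingly."
)

# Only the stress example appears in the text; other labels return no suggestion.
SUGGESTIONS = {
    "stress": "Shall we take a short break?",
}


def suggest(emotion: str) -> Optional[str]:
    """Return a suggestion for the detected emotion, or None if none is defined."""
    return SUGGESTIONS.get(emotion.lower())
```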

[0643] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0644] Step 1:

[0645] The terminal uses a microphone to capture the voices of participants during a business negotiation. The input is audio data of the negotiation. This data is then sent directly to the server.

[0646] Step 2:

[0647] The server applies a speech recognition engine to the received audio data. Specifically, after noise reduction and volume adjustment, it converts the audio into text data. The output of this process is text data that reflects the content of the business negotiation.

[0648] Step 3:

[0649] The server activates an emotion recognition engine to extract emotion data from the audio data. It analyzes the tone, speed, and vocal characteristics of the input audio to detect the user's emotional state (e.g., joy, surprise, anxiety). The output is emotion data.

[0650] Step 4:

[0651] The server uses a generative AI model to integrate text and sentiment data and automatically generate meeting minutes for business negotiations. The input data includes both text and sentiment, and the output is in the form of meeting minutes.

[0652] Step 5:

[0653] The server utilizes user sentiment data to generate appropriate suggestions using interaction generation tools. The input in this process is sentiment data, and the output is interaction suggestions for the user.

[0654] Step 6:

[0655] The user receives interactions generated by the system and provides responses and feedback as needed. Here, the input is the interaction suggestion, and the output is the user's feedback.

[0656] Step 7:

[0657] The server receives a brief memo entered by the user after a business negotiation. Using information analysis tools, it processes this memo into detailed data. The input is a brief memo, and the output is refined, detailed business data.

[0658] Step 8:

[0659] The server inputs the detailed data into the customer management system and updates the system-wide database. The output of this step is the updated data.

[0660] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0661] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0662] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0663] [Fourth Embodiment]

[0664] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0665] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0666] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0667] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0668] The microphone 238 receives instructions from the user 20 by capturing the voice uttered by the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0669] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0670] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0671] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0672] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0673] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0674] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0675] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0676] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0677] This invention relates to a system that converts audio data acquired during business negotiations into text in real time, automatically creates meeting minutes using a generative model, and records them in a customer management system. The system also accepts simple notes entered after the negotiation, analyzes them into detailed data, and reflects that data in the customer management system.

[0678] The device uses the microphone to acquire audio data during business negotiations. This audio data is processed as a digital audio file containing statements and conversations made during the negotiation.

[0679] The server receives this audio data and activates the speech recognition engine to convert it into text data. This process is performed in real time, and pre-processing such as noise reduction and volume optimization is carried out.

[0680] A generative AI model on the server automatically organizes and summarizes this text data as meeting minutes. The generated meeting minutes are formatted so that all parties involved can easily understand and review them later.

[0681] The server then automatically inputs these meeting minutes into the customer management system, accurately recording the details of the business negotiation within the system. This makes it easier to later understand the details of the negotiation and plan the next steps.

[0682] After a business meeting concludes, users enter short notes or keywords on their device. This information is used to record new discoveries, important points, and tasks for the next meeting.

[0683] The entered memos are then analyzed in detail by the server's information analysis system, refining the relevant information and registering it in the customer management system. This ensures that the accuracy and reliability of the information remain high even after the business negotiation, and that all information is managed without omission.

[0684] For example, if a user enters a memo such as "The next meeting is at the end of September. The customer has shown interest in product C," the server interprets this as detailed data such as "Next meeting scheduled for the end of September" and "Interested in product C," and registers it in the system.
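The interpretation in this example can be sketched with two regular expressions. This is only a stand-in for the server's information analysis system, which the text does not specify, and the patterns cover nothing beyond the phrasing of the example memo. The function name memo_to_items is an assumption.

```python
import re
from typing import List


def memo_to_items(memo: str) -> List[str]:
    """Turn a short free-text memo into normalized detail items."""
    items: List[str] = []
    meeting = re.search(r"next meeting is (?:at|on|in) (.+?)\.", memo, re.IGNORECASE)
    if meeting:
        items.append(f"Next meeting scheduled for {meeting.group(1)}")
    interest = re.search(r"interest in (product \w+)", memo, re.IGNORECASE)
    if interest:
        items.append(f"Interested in {interest.group(1)}")
    return items


memo = ("The next meeting is at the end of September. "
        "The customer has shown interest in product C.")
items = memo_to_items(memo)
```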

[0685] This invention dramatically improves the efficiency of data entry work in the sales department and enhances the quality of customer information management by highly automating the recording of business negotiations.

[0686] The following describes the processing flow.

[0687] Step 1:

[0688] The device captures audio data via the microphone during business negotiations. The audio data is temporarily stored in a buffer in digital format and sent to the server in real time.

[0689] Step 2:

[0690] The server receives audio data from the terminal and starts the speech recognition engine. It then performs preprocessing on the received audio data, such as noise reduction and volume normalization, to improve recognition accuracy.

[0691] Step 3:

[0692] The server uses a speech recognition engine to convert pre-processed audio data into text data. This text data contains the content of the business negotiation conversation as written information.

[0693] Step 4:

[0694] The server uses a generative AI model to automatically generate meeting minutes based on the acquired text data. These generated meeting minutes are formatted to summarize the business discussion and clearly express the key points.

[0695] Step 5:

[0696] The server instantly inputs the automatically generated meeting minutes into the customer management system. It connects to the customer management system and creates and records new data entries.

[0697] Step 6:

[0698] After a business meeting concludes, users enter brief notes or keywords related to the meeting via their device. This includes important information such as the meeting's outcome and the next steps.

[0699] Step 7:

[0700] The server receives notes sent by users and analyzes them into detailed data using information analysis tools. This analysis is performed using techniques such as keyword extraction and contextual understanding.

[0701] Step 8:

[0702] The server inputs and updates the detailed data obtained through analysis into the customer management system. This ensures that information after a business negotiation is accurately managed and utilized in subsequent sales activities.
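Steps 5 and 8 above both write to the customer management system: first a new entry holding the minutes, then an update with the detailed data from the user's notes. The sketch below models this with an in-memory store. The class InMemoryCRM and its methods are hypothetical and do not represent the interface of any real customer management product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    minutes: str
    details: List[str] = field(default_factory=list)


class InMemoryCRM:
    """Hypothetical stand-in for the customer management system."""

    def __init__(self) -> None:
        self._entries: Dict[int, Entry] = {}
        self._next_id = 1

    def create_entry(self, minutes: str) -> int:
        """Step 5: record the generated minutes as a new entry."""
        entry_id = self._next_id
        self._entries[entry_id] = Entry(minutes)
        self._next_id += 1
        return entry_id

    def add_details(self, entry_id: int, details: List[str]) -> None:
        """Step 8: attach detailed data derived from the user's notes."""
        self._entries[entry_id].details.extend(details)

    def get(self, entry_id: int) -> Entry:
        return self._entries[entry_id]
```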

[0703] (Example 1)

[0704] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0705] In today's business environment, accurately recording the details of business negotiations and using that information to improve subsequent customer interactions is crucial. However, manually taking notes on the vast amount of information gathered during negotiations and entering it into a customer information management system is extremely time-consuming and labor-intensive. Furthermore, information is easily omitted or inaccurate, making efficient and accurate negotiation follow-up difficult.

[0706] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0707] In this invention, the server includes an audio processing means for acquiring audio information during negotiations and converting the audio information into text information, an information generation means for automatically generating meeting minutes from the text information using a specified generation format, and a data registration means for automatically inputting the meeting minutes into a customer information management system. This makes it possible to record information during business negotiations in real time.

[0708] "Audio information during negotiations" refers to audio data that records the content of statements and conversations made during business negotiations or discussions.

[0709] "Audio processing means" refers to a device or program that has a series of functions for acquiring, analyzing, and converting audio information in digital format.

[0710] "Text information" refers to digital data expressed as a string of characters after audio information has been converted.

[0711] A "generation format" refers to a model or algorithm that uses artificial intelligence to automatically generate information in a specific format.

[0712] "Information generation means" refers to devices or programs that use a generation format to summarize or generate necessary information from text information.

[0713] "Meeting minutes" refers to a document that summarizes the content of a meeting or business negotiation, and is a record used to review the content later.

[0714] A "customer information management system" is an information system that manages customer data and transaction history, and uses this information to build relationships with customers and improve business operations.

[0715] A "data registration means" refers to a device or program that has the function of automatically inputting and saving generated information into a specific database or system.

[0716] "Information analysis means" refers to devices or programs that analyze input information and extract or refine necessary information.

[0717] This invention aims to efficiently manage and utilize audio information during business negotiations. This system provides technology for automatically converting audio information acquired during negotiations into text information and generating meeting minutes from that text.

[0718] The terminal collects audio information using its microphone during business negotiations. This terminal is used to transmit the content of the negotiations and conversations, sending the acquired audio data to a server. Ideally, the terminal should be equipped with a high-quality microphone and an audio collection application.

[0719] The server converts the received audio information into text using existing technologies such as the Google Speech-to-Text API or IBM Watson Speech to Text as its speech recognition engine. This process includes noise reduction and volume adjustment. Furthermore, the server inputs this text information into a generative AI model (e.g., OpenAI GPT-3) and automatically generates meeting minutes using prompts. This generation process enables accurate tracking of the negotiation's progress and outputs the results in a user-friendly format for stakeholders. An example of a prompt is, "Generate a summary of the negotiation and list the key points."
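The minutes-generation call can be sketched without committing to any vendor's interface by treating the generative AI model as a callable that takes a prompt string and returns text. The function generate_minutes, the prompt layout, and the echo_model stand-in are assumptions. In practice the callable would wrap whichever model service is actually used.

```python
from typing import Callable

PROMPT = "Generate a summary of the negotiation and list the key points."


def generate_minutes(transcript: str, model: Callable[[str], str]) -> str:
    """Send the prompt plus the transcript to `model` and return its reply."""
    return model(f"{PROMPT}\n\nTranscript:\n{transcript}")


def echo_model(prompt: str) -> str:
    """Stand-in for a real generative model: reports the prompt size only."""
    return f"(model received {len(prompt)} characters)"


minutes = generate_minutes("A: Hello.\nB: Let us discuss pricing.", echo_model)
```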

[0720] The generated meeting minutes are automatically entered into the customer information management system by the server. This ensures that the details of the business negotiations are recorded immediately and can be easily referenced when taking subsequent actions.

[0721] After a business negotiation concludes, the user enters brief notes on the terminal about any new observations or next steps, using a dedicated application to input notes and keywords. This information is sent to the server and analyzed by the information analysis means.

[0722] The analyzed information is registered in the customer information management system by the server, and refined data representing the results of the business negotiations is stored there. In this way, the system plays a role in supporting the efficiency and accuracy of business negotiation activities.

[0723] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0724] Step 1:

[0725] The terminal acquires audio information using a microphone during business negotiations. Specifically, the terminal captures audio data in real time through a built-in or connected microphone and saves it as a digital audio file. The input to this process is the raw audio of the conversation during the negotiation, and the output is the saved digital audio data.

[0726] Step 2:

[0727] The terminal transmits the acquired audio information to the server via a secure connection. The terminal encrypts the audio data and uploads it to the server via the internet. The input is the stored audio data, and the output is the audio file uploaded to the server.

[0728] Step 3:

[0729] The server passes the received audio data to the speech recognition engine, which converts the audio into text. Specifically, the server performs pre-processing such as noise reduction and volume adjustment, and then uses speech recognition software to convert it into text. The input to this process is audio data, and the output is the converted text information.

[0730] Step 4:

[0731] The server inputs the text information into a generative AI model to generate meeting minutes. Here, the AI model is instructed using the prompt "Generate a summary of the business meeting and list the key points." The input consists of the text information and the prompt, and the output is the summarized meeting minutes.

[0732] Step 5:

[0733] The server automatically registers the generated meeting minutes into the customer information management system. Specifically, the server converts the meeting minutes into a predetermined data format and automatically transfers them to the database. The input is the generated meeting minutes, and the output is the data registered in the customer information management system.

[0734] Step 6:

[0735] After the business meeting concludes, the user enters brief notes or keywords using the terminal. The user inputs text in a designated application, and the terminal sends it to the server. The input is the text entered manually by the user, and the output is the memo data sent to the server.

[0736] Step 7:

[0737] The server analyzes notes sent by users using information analysis tools and generates detailed data. Specifically, it uses natural language processing techniques to analyze text and extract and refine relevant information. The input is the submitted note data, and the output is the detailed data after analysis.

[0738] Step 8:

[0739] The server registers the analyzed detailed data into the customer information management system. This ensures that detailed and accurate information obtained from business negotiations is added to the database. The input is the analyzed detailed data, and the output is the updated data within the system.
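Steps 3 to 5 and Steps 7 to 8 above can be sketched as a small server-side pipeline. The speech recognition engine, the generative AI model, and the information analysis means are passed in as plain callables because the source does not fix their interfaces. The class `CustomerSystem`, the functions `process_negotiation` and `process_note`, and the record layout are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Prompt text taken from Step 4 of the description.
MINUTES_PROMPT = "Generate a summary of the business meeting and list the key points."


@dataclass
class CustomerSystem:
    """In-memory stand-in for the customer information management system."""
    records: Dict[str, List[dict]] = field(default_factory=dict)

    def register(self, deal_id: str, kind: str, body: str) -> None:
        self.records.setdefault(deal_id, []).append({"kind": kind, "body": body})


def process_negotiation(
    deal_id: str,
    audio: bytes,
    transcribe: Callable[[bytes], str],
    generate: Callable[[str, str], str],
    crm: CustomerSystem,
) -> str:
    """Steps 3-5: audio -> text -> meeting minutes -> registration."""
    text = transcribe(audio)
    minutes = generate(MINUTES_PROMPT, text)
    crm.register(deal_id, "minutes", minutes)
    return minutes


def process_note(
    deal_id: str,
    note: str,
    analyze: Callable[[str], str],
    crm: CustomerSystem,
) -> str:
    """Steps 7-8: user note -> detailed data -> registration."""
    detail = analyze(note)
    crm.register(deal_id, "detail", detail)
    return detail


if __name__ == "__main__":
    crm = CustomerSystem()
    # Stub engines; a real deployment would call external services here.
    process_negotiation(
        "deal-1", b"...", lambda a: "customer asked about pricing",
        lambda prompt, text: f"Summary: {text}", crm,
    )
    process_note("deal-1", "send quote", lambda n: f"Next action: {n}", crm)
    print(crm.records)
```

Keeping the engines as injected callables lets the same flow be exercised with stubs, as in the usage at the bottom, before any external service is connected.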

[0740] (Application Example 1)

[0741] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0742] During business negotiations, it is crucial for sales representatives to accurately record conversations with customers so that they can easily access the information later. However, depending on the speed and complexity of the conversation, recording can be time-consuming and carries the risk of memory lapses or omissions. Furthermore, manually gathering and organizing large amounts of information is often burdensome. In addition, there is the challenge of not being able to easily access important information when it is needed immediately during a negotiation.

[0743] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0744] In this invention, the server includes a speech recognition device that acquires voice data and converts the voice data into text information; an information generation device that automatically generates summary information from the text information using a generative model; a data recording device that automatically inputs the summary information into a service management system; an information processing device that analyzes simple information entered after a business negotiation and converts it into detailed information; and a display device that uses smart glasses to display important information in real time during a business negotiation. This automates information processing during and after business negotiations, making it possible to manage the content of business negotiations accurately and quickly.

[0745] "Audio data" refers to information recorded in digital format from human speech acquired during business negotiations.

[0746] "Textual information" refers to information in text format that has been converted from audio data by a speech recognition device.

[0747] A "speech recognition device" is a device or system that has the function of analyzing speech data and converting its content into text information.

[0748] A "generative model" is a machine learning model that learns from vast amounts of data and generates summary information and other data based on the input information.

[0749] "Summary information" is a concise record that a generative model automatically generates by extracting the key content of the text information.

[0750] An "information generation device" is a device or system that generates summary information from text information using a generative model.

[0751] A "data recording device" is a device or system that has the function of automatically inputting summary information into a service management system.

[0752] An "information processing device" is a device or system that analyzes simple information entered after a business negotiation and converts it into detailed information.

[0753] A "service management system" is a digital platform for centrally managing information related to customers and business deals.

[0754] "Smart glasses" are wearable devices that, when worn by a user, display digital information in their field of vision in real time.

[0755] A "display device" is a device or system that uses smart glasses to visually present important information in real time during business negotiations.

[0756] The system for carrying out this invention mainly consists of an integrated configuration of a speech recognition device, an information generation device, a data recording device, an information processing device, and a display device. The server acquires input via smart glasses worn by the user to collect voice data during business negotiations. These glasses are equipped with a highly sensitive microphone that records conversations in real time. The recorded voice data is first converted into text information by the speech recognition device. In this process, pre-processing is performed to remove noise and adjust the volume, so that accurate text information is generated.

[0757] Once the text information is generated, the server's information generation device uses a generative AI model to automatically generate summary information from this text. This summary information concisely summarizes the main points of the business negotiation and is recorded in the service management system via the data recording device. After the negotiation, the user can add simple notes by voice using the smart glasses. This voice information is analyzed by the information processing device and converted into more detailed information, which is also recorded in the service management system.

[0758] For example, if a user says during a business meeting, "The customer showed interest in new product A," the speech recognition device converts this into text, and the information generation device generates a summary such as "The customer showed interest in new product A." If the user adds a note after the meeting, such as "I would like the next meeting to be next Tuesday," the information processing device converts this information into detailed information, such as "Next meeting scheduled for: next Tuesday," and records it.
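The conversion of a simple note into detailed information in the example above can be sketched as follows. The source does not describe how the information processing device performs this analysis, so the regular-expression rule, the function names `refine_note` and `format_fields`, and the fallback field name "Note" are assumptions made purely for illustration. An actual implementation might instead use a generative model or another natural language processing technique.

```python
import re
from typing import Dict

# Hypothetical rule: capture whatever follows "next meeting to be / will be / is / on".
_NEXT_MEETING = re.compile(
    r"next meeting (?:to be|will be|is|on)\s+(.+?)[.!]?$", re.IGNORECASE
)


def refine_note(note: str) -> Dict[str, str]:
    """Turn a free-form note into labeled fields; unmatched notes are kept verbatim."""
    cleaned = note.strip()
    match = _NEXT_MEETING.search(cleaned)
    if match:
        return {"Next meeting scheduled for": match.group(1).strip()}
    return {"Note": cleaned}


def format_fields(fields: Dict[str, str]) -> str:
    """Render the fields in the 'label: value' style used in the example."""
    return "; ".join(f"{label}: {value}" for label, value in fields.items())


if __name__ == "__main__":
    note = "I would like the next meeting to be next Tuesday"
    print(format_fields(refine_note(note)))
```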

[0759] An example of a prompt for the generative AI model is: "Please briefly summarize the details of this business opportunity. The customer expressed interest in new product A." Using such a prompt enables a highly accurate summary.

[0760] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0761] Step 1:

[0762] The terminal uses the microphone in smart glasses to acquire audio data during business negotiations. This audio data is saved as a digital audio file recording the conversation between the user and the customer. This audio data serves as input to the system.

[0763] Step 2:

[0764] The server activates the speech recognition device and converts the acquired voice data into text information. Before conversion, the speech recognition device performs preprocessing, including interference cancellation and volume adjustment, so that the resulting text information is less affected by noise. The generated text information becomes the input for the next processing step.

[0765] Step 3:

[0766] The server's information generation device uses a generative AI model to generate summary information from the text information generated in Step 2. Here, the prompt is "Generate a summary of the conversation." The text information is the input, and the summary information is the output.

[0767] Step 4:

[0768] The server's data recording device automatically inputs the generated summary information into the service management system. This process saves the summary information as a sales opportunity record. Here, data is registered in the service management system, and the registered information is then passed on to the next process.

[0769] Step 5:

[0770] After a business meeting, the user inputs a simple memo via voice using smart glasses. This voice input becomes new input for the information processing device. This memo includes next actions and findings.

[0771] Step 6:

[0772] The server's information processing device analyzes the notes entered by the user and converts them into more detailed information. This detailed information is then saved to the service management system.

[0773] Step 7:

[0774] The device displays important information in real time within the user's field of view through smart glasses during business negotiations. This allows users to instantly check the information they need during the negotiation. Real-time information display assists the negotiation and strengthens the connection to the next action.

[0775] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0776] This invention relates to a system that acquires audio data during business negotiations and generates text data and emotion data using speech recognition and emotion recognition technologies. This enables the automatic generation of detailed meeting minutes that include not only the content of the negotiation but also the emotional reactions during the negotiation, and further input into a customer management system to enable deeper analysis of the negotiation.

[0777] The device uses a microphone to acquire audio data during business negotiations. This audio data includes not only the content of what was said during the negotiations but also the customer's emotional reactions.

[0778] The server receives the audio data sent from the terminal. It starts the speech recognition engine and converts the audio data into text data. During this process, pre-processing such as noise reduction and volume adjustment is performed to improve recognition accuracy.

[0779] The server simultaneously uses an emotion engine to recognize the user's emotions from the voice data. The emotion engine analyzes the tone, speed, and other vocal characteristics of the voice to detect the user's emotional state.

[0780] A generative AI model automatically generates meeting minutes from these text and sentiment data. The meeting minutes include text information about the business negotiation as well as the customer's emotional reactions, and are presented in a way that allows the psychological dynamics of the negotiation to be understood.

[0781] The server automatically inputs meeting minutes and sentiment data into the customer management system. This data is recorded as an evaluation metric for each sales negotiation and used for post-negotiation analysis and strategic planning for future negotiations.

[0782] After a business meeting concludes, the user enters supplementary information and notes regarding the next action plan via their terminal. These notes are then converted into detailed data using an information analysis tool and registered in the system.

[0783] For example, if a user inputs "The customer was very positive about product D, but expressed concerns about the price," the server will consider this and register the information that the customer is positive about the product but cautious about the price in the customer management system.

[0784] This invention aims to improve the efficiency of sales activities and deepen customer understanding by comprehensively addressing both verbal and nonverbal elements in business negotiations.

[0785] The following describes the processing flow.

[0786] Step 1:

[0787] The device uses the microphone to capture audio data during the business negotiation. The acquired audio data is temporarily stored as a composite audio file that includes the content of the negotiation and the customer's emotional responses.

[0788] Step 2:

[0789] The server receives audio data sent from the terminal. First, it uses a speech recognition engine to convert the audio data into text data. During this process, preprocessing such as noise reduction and volume normalization is performed to improve recognition accuracy.

[0790] Step 3:

[0791] The server simultaneously activates the emotion engine. It analyzes the tone, speed, and intonation of the voice data to recognize the user's emotional state. The recognized emotion data is assigned labels such as "joy," "anxiety," and "excitement."

[0792] Step 4:

[0793] The server inputs the text and sentiment data into a generative AI model, which automatically creates meeting minutes that integrate the content of the sales discussion with the user's emotions. The meeting minutes also include the customer's emotional responses, clearly recording how emotions influenced the sales discussion.

[0794] Step 5:

[0795] The server inputs the generated meeting minutes and sentiment data into the customer management system. The text content and sentiment responses of sales meetings are centrally managed, providing extremely useful data for future analysis and strategy.

[0796] Step 6:

[0797] After a business meeting concludes, users enter supplementary notes via their terminal. Examples of such notes include additional information about the meeting, next steps, and any special points to note. This information plays a crucial role in subsequent analysis.

[0798] Step 7:

[0799] The server analyzes the additional notes received from the user and converts them into detailed data using information analysis tools. This detailed data is also updated in the customer management system and stored as part of the sales opportunity information.

[0800] Throughout this process, business negotiations are meticulously recorded based on verbal and nonverbal elements, and provided as a dataset that is useful for making important decisions in sales activities.
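Step 4 of this flow, in which text data and sentiment data are given to the generative AI model together, can be sketched as a prompt builder. The source does not give the prompt wording or the data layout for this variant, so the `Utterance` record, the instruction text in `EMOTION_MINUTES_INSTRUCTION`, and the line format are hypothetical. The emotion labels are the ones named in Step 3.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical instruction text; the source does not specify one for this flow.
EMOTION_MINUTES_INSTRUCTION = (
    "Generate meeting minutes for the business negotiation below. "
    "Include the emotional reactions noted for each utterance."
)


@dataclass
class Utterance:
    speaker: str
    text: str
    emotion: str  # label such as "joy", "anxiety", or "excitement"


def build_minutes_prompt(utterances: List[Utterance]) -> str:
    """Combine transcript lines and emotion labels into a single model input."""
    lines = [f"[{u.speaker} | emotion: {u.emotion}] {u.text}" for u in utterances]
    return EMOTION_MINUTES_INSTRUCTION + "\n\n" + "\n".join(lines)


if __name__ == "__main__":
    prompt = build_minutes_prompt([
        Utterance("customer", "Product D looks very promising.", "joy"),
        Utterance("customer", "I am worried about the price.", "anxiety"),
    ])
    print(prompt)
```

Attaching the label to each utterance, rather than sending one overall label, is a design choice of this sketch that keeps the emotional reaction next to the statement that triggered it.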

[0801] (Example 2)

[0802] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0803] Understanding the content of business negotiations and visualizing the customer's psychological state are crucial in many business processes. However, traditional methods only involve transcribing audio data into text, making it difficult to grasp emotional responses. Furthermore, there is a need for a method that efficiently incorporates supplementary information after business negotiations into management systems.

[0804] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0805] In this invention, the server includes an acoustic recognition means, a psychological recognition means, and a record generation means. This enables the generation of detailed records of the business negotiation based on acoustic data during the negotiation, real-time understanding of the customer's psychological reactions, and efficient registration of information into the management system.

[0806] A "business negotiation" is the process of discussing the terms and conditions of a business transaction or contract.

[0807] "Acoustic data" refers to the digital or analog representation of sound signals generated during a business negotiation.

[0808] "Acoustic recognition means" refers to technologies and devices that analyze acoustic data and convert it into text data.

[0809] "Psychological recognition means" refers to technologies and devices that analyze and identify a speaker's emotions and psychological state from acoustic data.

[0810] "Text data" refers to data that represents the content of the acoustic data in text format.

[0811] "Record generation means" refers to technologies and devices that generate detailed conversation records of business negotiations based on text data and psychological data.

[0812] A "management system" is a database or software that systematically organizes data related to business deals, making it easy to access and update.

[0813] "Information input means" refers to technologies and devices for registering generated data into a management system.

[0814] "Data analysis means" refers to technologies and devices that analyze information entered after a business negotiation and convert it into more detailed information.

[0815] "Information update means" refers to technologies and devices that reflect analyzed information in a management system and maintain the data in an up-to-date state.

[0816] This invention is a system that efficiently acquires acoustic data during business negotiations and analyzes and records the content of the negotiations and the customer's psychological state. This system technically processes important data to deeply understand the content of the negotiations and support sales activities.

[0817] The device uses a high-sensitivity microphone to acquire acoustic data during business negotiations. This acoustic data can capture not only the content of what the speaker says during the negotiation, but also their emotional reactions. The acquired acoustic data is transmitted to the server in real time.

[0818] The server performs acoustic recognition on the received acoustic data and converts it into text data. Acoustic recognition software is used for this, and noise reduction and volume adjustment are performed as preprocessing to improve conversion accuracy. In parallel, the server uses psychological recognition technology to analyze the acoustic data and identify the speaker's emotions. The psychological recognition software analyzes the tone, speed, and other characteristics of the voice to recognize the speaker's psychological state.

[0819] Subsequently, the server uses a generative AI model to generate a conversation record based on text data and psychological data. This conversation record includes the details of the business negotiation and the customer's emotional response, making it a valuable source of information for understanding the psychological dynamics of the business.

[0820] As a concrete example, the prompt for the generative AI model takes the form of "Based on the acoustic data, generate a detailed record of the business negotiation, including the speaker's emotional responses." The server registers the generated conversation record in the management system. The record is thereby stored as part of the history of the business negotiation, providing data useful for future analysis and strategic planning of sales activities.

[0821] After a business negotiation concludes, users use their terminals to input additional information related to the negotiation and their next action plan. This information is then transformed into detailed data through data analysis functions and reflected in the management system, where it is utilized as an integrated database.

[0822] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0823] Step 1:

[0824] The device acquires acoustic data during business negotiations using a high-sensitivity microphone. The input is the speech of the negotiation participants, and the output is digital acoustic data. The device is designed to collect this digital acoustic data in real time during the negotiation.

[0825] Step 2:

[0826] The terminal compresses the acquired acoustic data and sends it to the server in a secure manner. The input is the acoustic data obtained in step 1, and the output is the compressed data ready for transfer. This process employs an efficient transfer protocol to minimize data delay.

[0827] Step 3:

[0828] The server receives the acoustic data and begins acoustic recognition processing. The input is the acoustic data sent from the terminal, and the output is text data. The server first performs noise reduction and volume adjustment, and then the speech recognition engine converts the data into text format.

[0829] Step 4:

[0830] The server uses psychological recognition technology to evaluate the speaker's psychological state from the acoustic data. The input is the acoustic data, and the output is data about the speaker's psychological state. The server analyzes the tone, speed, and other vocal characteristics of the voice, and applies emotion recognition algorithms to identify the speaker's psychological state.

[0831] Step 5:

[0832] The server utilizes a generative AI model to generate a conversation record from text data and psychological state data. The input is the data obtained in steps 3 and 4, and the output is a detailed conversation record. The generative AI model is prompted with the message, "Generate a detailed record of the business negotiation based on the acoustic data, including the speaker's emotional responses," to generate the conversation record.

[0833] Step 6:

[0834] The server registers the generated conversation records in the management system. The input is the conversation records obtained in step 5, and the output is the historical information registered in the management system. Through this process, the details of the business negotiations are recorded in an integrated database, which can be used for future analysis and strategic planning.

[0835] Step 7:

[0836] After a business negotiation is completed, the user enters additional information related to the negotiation and the next action plan via the terminal. The input is the information entered manually by the user, and the output is the updated information in the management system. The terminal uploads the input data to the server, which converts it into detailed data using the data analysis function and registers it in the management system.
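Because the description states that acoustic recognition and psychological recognition run in parallel on the same acoustic data (Steps 3 and 4) before their results are combined (Step 5), the flow can be sketched with a thread pool. The two recognizers are passed in as callables since their real interfaces are not specified. The function names and the layout of the combined model input are assumptions; the prompt string is the one quoted in Step 5.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple

# Prompt text taken from Step 5 of the description.
RECORD_PROMPT = (
    "Generate a detailed record of the business negotiation based on the "
    "acoustic data, including the speaker's emotional responses."
)


def recognize_in_parallel(
    acoustic_data: bytes,
    to_text: Callable[[bytes], str],
    to_state: Callable[[bytes], str],
) -> Tuple[str, str]:
    """Run text conversion and psychological-state estimation concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        text_future = pool.submit(to_text, acoustic_data)
        state_future = pool.submit(to_state, acoustic_data)
        return text_future.result(), state_future.result()


def build_record_prompt(text: str, state: str) -> str:
    """Combine both recognition results into one input for the generative model."""
    return f"{RECORD_PROMPT}\n\nTranscript:\n{text}\n\nPsychological state:\n{state}"


if __name__ == "__main__":
    # Stub recognizers standing in for the real engines.
    text, state = recognize_in_parallel(
        b"...", lambda d: "We can sign next week.", lambda d: "confident"
    )
    print(build_record_prompt(text, state))
```

Threads suit this sketch on the assumption that both recognizers mostly wait on external services; CPU-bound local models would more likely call for separate processes.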

[0837] (Application Example 2)

[0838] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0839] Conventional sales negotiation support systems simply convert speech to text, failing to consider the emotional responses of the participants. This makes it difficult to understand their emotional state and psychological state. Furthermore, the inability to provide emotion-based interactions hinders improvements in the user experience.

[0840] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0841] In this invention, the server includes a speech recognition means for acquiring audio data during a business negotiation and converting the audio data into text data, an emotion recognition means for deriving emotion data from the audio data, and a memo generation means for automatically generating meeting minutes from the text data and emotion data using a generative model. This makes it possible to generate meeting minutes that reflect not only the content of the business negotiation but also the emotional trends of the participants in real time, and to provide user-based interaction.

[0842] "Speech recognition means" refers to technology for converting audio data acquired during business negotiations into text data.

[0843] "Emotion recognition means" refers to a technology that analyzes the characteristics of a voice from acquired audio data and derives emotion data.

[0844] "Memo generation means" refers to a technology that uses a generative model to combine text data and emotion data and automatically generate meeting minutes.

[0845] "Data entry means" refers to technology for inputting automatically generated meeting minutes into a customer management system.

[0846] "Information analysis means" refers to technology for analyzing simple notes entered after a business negotiation and converting them into detailed data.

[0847] "Data update means" refers to the technology for inputting the analyzed detailed data into the customer management system.

[0848] "Interaction generation means" refers to technologies for generating appropriate suggestions and responses based on the user's emotions.

[0849] In order to carry out this invention, a system having the following configuration is required.

[0850] First, the device continuously acquires participant audio data using the microphone during the business negotiation. This audio data includes both the content of the negotiation and the emotional responses of the participants. The device then transmits this audio data to the server.

[0851] Next, the server applies a speech recognition engine to the received audio data to convert it into text data. During this process, pre-processing such as noise reduction and volume adjustment is performed to improve accuracy. For example, Google Cloud Speech-to-Text can be used as the speech recognition engine.

[0852] Furthermore, the server activates an emotion recognition engine to analyze the tone, speed, and other vocal characteristics of the voice to derive emotion data. Emotion recognition technologies such as the Affectiva SDK can be used in this process.

[0853] The server automatically generates meeting minutes by combining text data and sentiment data acquired using a generative AI model. These minutes include not only the details of the business negotiations that should be entered into the customer management system, but also the emotional responses of the participants.

[0854] Furthermore, the server analyzes the simple notes entered after the business negotiation, converts them into detailed data using information analysis tools, and inputs the results into the customer management system.

[0855] The interaction generation system generates appropriate suggestions based on the user's emotional data, thereby personalizing interactions in business negotiations and conversations and contributing to an improved user experience.

[0856] For example, if the system detects signs of stress from the user's voice during a business negotiation, it can suggest, "Shall we take a short break?" An example of a prompt for generating this interaction would be, "Identify the emotions the user is feeling and create a suggestion accordingly."
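The stress example above can be sketched as a lookup from the dominant detected emotion to a suggestion. The source describes generating the suggestion with a prompt to a generative model; the fixed table below replaces that step so the sketch stays self-contained. The score format, the 0.6 threshold, and the names `SUGGESTIONS` and `suggest` are assumptions.

```python
from typing import Dict, Optional

# Only the "stress" entry comes from the description; other entries would be added as needed.
SUGGESTIONS: Dict[str, str] = {
    "stress": "Shall we take a short break?",
}


def suggest(scores: Dict[str, float], threshold: float = 0.6) -> Optional[str]:
    """Return a suggestion for the strongest emotion, if it is strong enough."""
    if not scores:
        return None
    label, score = max(scores.items(), key=lambda item: item[1])
    if score < threshold:
        return None  # no emotion is pronounced enough to act on
    return SUGGESTIONS.get(label)


if __name__ == "__main__":
    print(suggest({"stress": 0.8, "joy": 0.1}))
```

Returning `None` when no emotion clears the threshold keeps the system from interrupting the negotiation on weak signals.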

[0857] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0858] Step 1:

[0859] The terminal uses a microphone to capture the voices of participants during a business negotiation. The input is audio data of the negotiation. This data is then sent directly to the server.

[0860] Step 2:

[0861] The server applies a speech recognition engine to the received audio data. Specifically, after noise reduction and volume adjustment, it converts the audio into text data. The output of this process is text data that reflects the content of the business negotiation.

[0862] Step 3:

[0863] The server activates an emotion recognition engine to extract emotion data from the audio data. It analyzes the tone, speed, and vocal characteristics of the input audio to detect the user's emotional state (e.g., joy, surprise, anxiety). The output is emotion data.

[0864] Step 4:

[0865] The server uses a generative AI model to integrate text and sentiment data and automatically generate meeting minutes for business negotiations. The input data includes both text and sentiment, and the output is in the form of meeting minutes.

[0866] Step 5:

[0867] The server utilizes user sentiment data to generate appropriate suggestions using interaction generation tools. The input in this process is sentiment data, and the output is interaction suggestions for the user.

[0868] Step 6:

[0869] The user receives interactions generated by the system and provides responses and feedback as needed. Here, the input is the interaction suggestion, and the output is the user's feedback.

[0870] Step 7:

[0871] The server receives a brief memo entered by the user after a business negotiation. Using information analysis tools, it processes this memo into detailed data. The input is a brief memo, and the output is refined, detailed business data.

[0872] Step 8:

[0873] The server inputs the detailed data into the customer management system and updates the system-wide database. The output of this step is the updated data.

[0874] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0875] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. It performs inference on the input inference data according to the instructions in the prompt and outputs the inference results in a data format such as audio data or text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0876] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0877] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0878] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0879] These emotions are distributed around the 3 o'clock position on the emotion map 400, and usually fluctuate between a sense of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal sensation, giving a calm impression.

[0880] The inside of the emotion map 400 represents inner feelings, while the outside represents actions. Therefore, the further toward the outside of the emotion map 400 an emotion lies, the more visible (expressed in actions) it becomes.
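
The layout of the emotion map described above can be sketched as a position in a plane. This is a hedged illustration only: the `MapPoint` class, the `describe` function, the unit-disc scaling, and the `outer_threshold` value are assumptions made for the example and do not appear in the disclosure.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MapPoint:
    """A position on the emotion map; x: left(-)/right(+), y: down(-)/up(+)."""
    x: float
    y: float

    @property
    def radius(self) -> float:
        """Distance from the centre of the concentric circles."""
        return math.hypot(self.x, self.y)


def _sign_label(value: float, positive: str, negative: str, zero: str) -> str:
    if value > 0:
        return positive
    if value < 0:
        return negative
    return zero


def describe(point: MapPoint, outer_threshold: float = 0.5) -> dict:
    """Read qualitative properties off a map position.

    Assumption: the map is scaled to the unit disc, and `outer_threshold`
    separates inner feelings from outwardly expressed actions.
    """
    return {
        # Right half: situational judgment; left half: reactions in the brain.
        "origin": _sign_label(point.x, "situation", "reaction", "both"),
        # Upper side: pleasure; lower side: displeasure.
        "valence": _sign_label(point.y, "pleasure", "displeasure", "neutral"),
        # Further out: more visibly expressed in actions.
        "expression": "action" if point.radius >= outer_threshold else "inner",
    }


print(describe(MapPoint(0.8, 0.1)))
print(describe(MapPoint(-0.2, -0.2)))
```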

[0881] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, the result is discomfort, and when they approach the ideal, the result is pleasure. Similarly, in robots, cars, motorcycles, and the like, emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, the result is discomfort, and when they approach the ideal, the result is pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "reaction," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
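
The balance-based account above, in which moving toward the ideal yields pleasure and moving away yields discomfort, can be sketched as follows. The function names, the choice of summed absolute deviation, and the normalised `battery` and `tilt` values are assumptions for illustration; the disclosure does not specify how the balances are measured or combined.

```python
def deviation(state: dict, ideal: dict) -> float:
    """Total distance of the current balances from their ideal values.

    Assumption: every balance is normalised to a comparable scale.
    """
    return sum(abs(state[key] - ideal[key]) for key in ideal)


def comfort_delta(previous: dict, current: dict, ideal: dict) -> float:
    """Positive when the balances moved toward the ideal, negative when away."""
    return deviation(previous, ideal) - deviation(current, ideal)


def feeling(delta: float) -> str:
    """Map the change in deviation to a coarse feeling."""
    if delta > 0:
        return "pleasure"
    if delta < 0:
        return "discomfort"
    return "neutral"


# Hypothetical robot balances: battery level (1.0 = full) and tilt (0.0 = upright).
ideal = {"battery": 1.0, "tilt": 0.0}
before = {"battery": 0.4, "tilt": 0.1}
after = {"battery": 0.9, "tilt": 0.1}

print(feeling(comfort_delta(before, after, ideal)))  # charging: toward the ideal
print(feeling(comfort_delta(after, before, ideal)))  # draining: away from the ideal
```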

[0882] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0883] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
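
One way to obtain training targets in which emotions located close together have similar values is sketched below. The disclosure states only that the neural network is trained with this property; the Gaussian weighting, the `sigma` parameter, the function names, and the coordinates in `EMOTION_POSITIONS` are assumptions introduced for illustration and do not reproduce the actual emotion map.

```python
import math

# Hypothetical map coordinates (x, y); not taken from the emotion map itself.
EMOTION_POSITIONS = {
    "reassured": (0.60, 0.10),
    "calm": (0.65, 0.05),
    "confident": (0.70, 0.15),
    "anxious": (0.60, -0.60),
}


def soft_targets(label: str, sigma: float = 0.2) -> dict:
    """Spread a single labelled emotion over its neighbours on the map.

    Each emotion receives a weight that decays with its squared distance
    from the labelled emotion; the weights are normalised to sum to 1.
    """
    lx, ly = EMOTION_POSITIONS[label]
    weights = {
        name: math.exp(-((x - lx) ** 2 + (y - ly) ** 2) / (2 * sigma ** 2))
        for name, (x, y) in EMOTION_POSITIONS.items()
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def dominant_emotion(values: dict) -> str:
    """Determine the emotion as the one with the largest emotion value."""
    return max(values, key=values.get)


targets = soft_targets("calm")
print({name: round(value, 3) for name, value in targets.items()})
print(dominant_emotion(targets))
```

With these assumed coordinates, "reassured," "calm," and "confident" receive similar values while the distant "anxious" receives a much smaller one, which is the qualitative behaviour described for Figure 10.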

[0884] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0885] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0886] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing according to the specific processing program 56.

[0887] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0888] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0889] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0890] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0891] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0892] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0893] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of those aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements may be added, or elements may be replaced in the descriptions and illustrations presented above, as long as this does not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that require no special explanation to enable implementation of the technology have been omitted from the descriptions and illustrations presented above.

[0894] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted as being incorporated by reference.

[0895] The following is further disclosed regarding the embodiments described above.

[0896] (Claim 1)

[0897] A speech recognition means that acquires audio data during a business negotiation and converts the audio data into text data,

[0898] A memo generation means that automatically generates meeting minutes from the text data using a generative model,

[0899] A data entry means for automatically inputting the aforementioned meeting minutes into a customer management system,

[0900] An information analysis means that analyzes brief notes entered after the business negotiation and converts them into detailed data,

[0901] A data update means for inputting the detailed data into the customer management system,

[0902] A system comprising the above means.

[0903] (Claim 2)

[0904] The system according to claim 1, wherein the speech recognition means has a pre-processing function for noise reduction and volume adjustment.

[0905] (Claim 3)

[0906] The system according to claim 1, wherein the information analysis means generates detailed data through keyword extraction and contextual understanding.
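
The pre-processing function of claim 2 (noise reduction and volume adjustment) can be illustrated with a deliberately simple sketch. The disclosure does not specify the algorithms; the noise gate, the peak normalisation, the function names, and the threshold values below are assumptions, and samples are assumed to be floats in the range -1.0 to 1.0.

```python
def noise_gate(samples, threshold=0.02):
    """Crude noise reduction: silence samples whose magnitude is below the threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def normalize_volume(samples, target_peak=0.9):
    """Volume adjustment: scale the signal so its loudest sample reaches target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silent or empty input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def preprocess(samples):
    """Apply noise reduction first, then volume adjustment."""
    return normalize_volume(noise_gate(samples))


raw = [0.01, -0.015, 0.3, -0.45, 0.005]
clean = preprocess(raw)
print(clean)
```

Gating before normalising avoids amplifying low-level noise along with the speech; a production system would more likely use spectral methods, which are outside the scope of this sketch.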

[0907] "Example 1"

[0908] (Claim 1)

[0909] A voice processing means that acquires audio information during negotiations and converts said audio information into text information,

[0910] An information generation means that automatically generates meeting minutes from the text information using a generation format,

[0911] A data registration means for automatically inputting the aforementioned meeting minutes into a customer information management system,

[0912] An information analysis means that analyzes brief notes entered after the negotiations and converts them into detailed information,

[0913] A data update means for inputting the detailed information into the customer information management system,

[0914] A system comprising the above means.

[0915] (Claim 2)

[0916] The system according to claim 1, wherein the voice processing means has a pre-processing function that performs noise reduction and volume adjustment.

[0917] (Claim 3)

[0918] The system according to claim 1, wherein the information analysis means generates detailed information through the extraction of important terms and contextual understanding.

[0919] "Application Example 1"

[0920] (Claim 1)

[0921] A speech recognition device that acquires audio data and converts said audio data into text information,

[0922] An information generation device that automatically generates summary information from the text information using a generative model,

[0923] A data recording device that automatically inputs the aforementioned summary information into a service management system,

[0924] An information processing device that analyzes simple information entered after a business negotiation and converts it into detailed information,

[0925] An information update device that inputs the detailed information into the service management system,

[0926] A display device that uses smart glasses to show important information in real time during business negotiations,

[0927] A system comprising the above devices.

[0928] (Claim 2)

[0929] The system according to claim 1, wherein the speech recognition device has a pre-processing function for interference removal and volume adjustment.

[0930] (Claim 3)

[0931] The system according to claim 1, wherein the information processing device generates detailed information through word extraction and contextual understanding.

[0932] "Example 2 combined with an emotion engine"

[0933] (Claim 1)

[0934] An acoustic recognition means that acquires acoustic data during a business negotiation and converts said acoustic data into text data,

[0935] A psychological recognition means for detecting psychological data representing a psychological state from the acoustic data,

[0936] A record generation means that automatically generates a conversation record from the text data and the psychological data using a generative model,

[0937] An information input means for automatically inputting the aforementioned conversation record into a management system,

[0938] A data analysis means that analyzes summary notes entered after the business negotiation and converts them into detailed information,

[0939] An information update means for registering the detailed information in the management system,

[0940] A system comprising the above means.

[0941] (Claim 2)

[0942] The system according to claim 1, wherein the acoustic recognition means has a pre-processing function for noise reduction and volume adjustment.

[0943] (Claim 3)

[0944] The system according to claim 1, wherein the data analysis means generates detailed information through the extraction of important words and understanding of content.

[0945] "Application Example 2 combined with an emotion engine"

[0946] (Claim 1)

[0947] A speech recognition means that acquires audio data during a business negotiation and converts the audio data into text data,

[0948] An emotion recognition means for deriving emotion data from the audio data,

[0949] A memo generation means that automatically generates meeting minutes from the text data and the emotion data using a generative model,

[0950] A data entry means for automatically inputting the aforementioned meeting minutes into a customer management device,

[0951] An information analysis means that analyzes brief notes entered after the business negotiation and converts them into detailed data,

[0952] A data update means for inputting the detailed data into the customer management device,

[0953] An interaction generation means that generates suggestions based on the user's emotions,

[0954] A system comprising the above means.

[0955] (Claim 2)

[0956] The system according to claim 1, wherein the speech recognition means and emotion recognition means have pre-processing functions for noise reduction and volume adjustment.

[0957] (Claim 3)

[0958] The system according to claim 1, wherein the information analysis means generates detailed data through keyword extraction and contextual understanding.

[Explanation of symbols]

[0959]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. A system comprising: a speech recognition means that acquires audio data during a business negotiation and converts the audio data into text data; a memo generation means that automatically generates meeting minutes from the text data using a generative model; a data entry means for automatically inputting the meeting minutes into a customer management system; an information analysis means that analyzes brief notes entered after the business negotiation and converts them into detailed data; and a data update means for inputting the detailed data into the customer management system.

2. The system according to claim 1, wherein the speech recognition means has a pre-processing function for noise reduction and volume adjustment.

3. The system according to claim 1, wherein the information analysis means generates detailed data through keyword extraction and contextual understanding.