system

The system automates post-meeting tasks by converting audio to text, generating meeting minutes and agendas, and scheduling the next meeting, thereby reducing administrative workload and enhancing efficiency.

JP2026100585A · Pending · Publication Date: 2026-06-19 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: SOFTBANK GROUP CORP
Filing Date: 2024-12-09
Publication Date: 2026-06-19

Application Information

IPC: H04L65/403; H04N21/222; H04N7/173; G06Q10/109; H04N21/214; H04N21/4786; H04N21/4425; H04N21/6377; H04N21/63; H04N7/15; H04N7/10; G06Q10/10; H04N21/238; H04N21/436; H04N21/4227; H04N21/4147; H04N21/431; H04N21/4223; H04N21/4415; H04N21/2368; H04N21/482; H04N21/6408; H04N21/643; H04N21/233; H04N21/422; H04N21/237; H04N21/6379; H04N21/6402; H04N21/443; G06Q10/00; H04N7/14; H04N21/6334; H04N21/4408; H04N21/414; H04N21/442; H04N21/6375; H04N21/8405; H04N21/633

AI Tagging

Application Domain

Television conference systems; Office automation

Technology Topics

Software engineering; Data science

Abstract

[Problem] To provide a system. [Solution] A system comprising: means for detecting the end of an online meeting; means for importing recorded data of the meeting and converting audio to text; generation means for analyzing the text data to generate meeting minutes; means for automatically generating a draft agenda and materials for the next meeting; means for automatically setting the date of the next meeting based on the schedules of the meeting participants; and means for distributing the generated meeting minutes and materials to the participants.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, and includes steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a character of the chatbot, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] In the modern business environment, online meetings are conducted daily. However, the administrative tasks that follow a meeting, such as creating meeting minutes, preparing for the next meeting, and coordinating participants' schedules, require considerable time and effort and place a heavy burden on the person in charge. As a result, less time is available for other tasks, and overall work efficiency may decline. The object of the present invention is to solve these problems and provide a means for streamlining post-meeting administrative work.

Means for Solving the Problems

[0005] This invention provides a means for detecting the end of an online meeting, capturing recorded data, and converting audio to text. This text data is then analyzed to automatically generate meeting minutes. Furthermore, it includes means for generating a draft agenda and materials for the next meeting and automatically scheduling the next meeting based on participants' schedules. The generated minutes and materials are automatically distributed to relevant parties. Additionally, by including means for detecting important moments within the online meeting and taking screenshots to improve the credibility of the generated materials, and by providing an interface that allows users to edit the generated materials, the invention significantly reduces post-meeting administrative work and improves operational efficiency.

[0006] An "online meeting" is a type of meeting in which multiple participants can join remotely via audio and video over the internet.

[0007] "Recorded data" refers to digital information of audio and video recorded during a meeting.

[0008] "Converting audio to text" refers to the process of analyzing audio data and representing its content as textual information.

[0009] "Meeting minutes" refers to a document that concisely summarizes the discussions and decisions made during a meeting.

[0010] "Generation means" refers to a system or device that has the function of automatically creating a specified format or content based on data.

[0011] An "agenda" is a list distributed in advance of a meeting that outlines the purpose, topics, and items to be discussed.

[0012] "Draft documents" refer to drafts of documents and information that are scheduled to be used at the next meeting.

[0013] "Automatic scheduling based on participants' schedules" refers to the process of automatically determining the date and time of a meeting by reconciling the availability of the expected participants.

[0014] The "editing interface" refers to the part of a system that provides a user interface or tools for reviewing generated data and making necessary corrections or adjustments.

[0015] A "screenshot" refers to an image that captures the contents of a computer screen as a still image.

Brief Description of the Drawings

[0016]
[Figure 1] This is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] This is a conceptual diagram showing an example of the essential functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] This is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] This is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] This is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] This is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] This is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] This is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] This shows an emotion map where multiple emotions are mapped.
[Figure 10] This shows an emotion map where multiple emotions are mapped.
[Figure 11] It is a sequence diagram showing the processing flow of the data processing system in Embodiment 1.
[Figure 12] It is a sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] It is a sequence diagram showing the processing flow of the data processing system in Embodiment 2 when combined with an emotion engine.
[Figure 14] It is a sequence diagram showing the processing flow of the data processing system in Application Example 2 when combined with an emotion engine.

Mode for Carrying Out the Invention

[0017] Hereinafter, an example of an embodiment of the system according to the technology of the present disclosure will be described according to the accompanying drawings.

[0018] First, the terms used in the following description will be explained.

[0019] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.

[0020] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and is used as a work memory by the processor.

[0021] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs and various parameters. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0022] In the following embodiments, a communication interface (I/F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0023] In the following embodiments, "A and / or B" is synonymous with "at least one of A and B." That is, "A and / or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and / or B" applies when expressing three or more things linked by "and / or."

[0024] [First Embodiment]

[0025] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0026] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0027] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0028] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0029] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0030] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0031] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0032] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0033] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0034] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0035] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0036] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0037] Specific embodiments for carrying out the present invention are shown below. This system aims to efficiently manage online meetings and automate post-meeting administrative tasks, thereby reducing the burden on participants and organizers.

[0038] First, the terminal starts running this program upon receiving a termination notification from the online meeting system. Upon receiving the termination notification, the server automatically retrieves the meeting recording data from the designated storage. The retrieved audio data is then converted into text data using speech recognition technology.

[0039] Next, the server analyzes the text data to extract key discussions and decisions from the meeting. This analysis uses natural language processing technology, and based on the extracted data, a generative AI connected via the API automatically generates meeting minutes. Furthermore, it similarly generates a draft agenda and materials for the next meeting.

[0040] The generated meeting minutes and materials are notified to the user from the server, and the user can review and modify the content through the provided editing interface. This interface is designed to facilitate feedback on the generated content.

[0041] The server also references participants' schedule information and automatically sets the date for the next meeting. It presents candidate dates, facilitates coordination among participants as needed, and selects the most suitable date and time. After selection, meeting invitations are automatically sent to each participant.
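The date selection described above can be illustrated with a minimal sketch. It assumes that each participant's schedule has already been reduced to a list of busy intervals, and it treats "most suitable" as "earliest slot in which nobody is busy"; that criterion, the function name `earliest_common_slot`, and the data layout are assumptions made for illustration, not details taken from the disclosure.

```python
from datetime import datetime, timedelta


def earliest_common_slot(busy_by_participant, window_start, window_end, duration):
    """Return the earliest (start, end) in the window where nobody is busy, or None."""
    # Flatten every participant's busy intervals into one list sorted by start time.
    busy = sorted(
        interval
        for intervals in busy_by_participant.values()
        for interval in intervals
    )
    cursor = window_start
    for busy_start, busy_end in busy:
        if busy_end <= cursor:
            continue  # this interval lies entirely before the cursor
        if busy_start - cursor >= duration:
            break  # the gap before this interval is long enough
        cursor = max(cursor, busy_end)  # skip past the busy interval
    if cursor + duration <= window_end:
        return cursor, cursor + duration
    return None


# Hypothetical schedules for two participants on one working day.
day = datetime(2025, 1, 15)
busy = {
    "alice": [(day.replace(hour=9), day.replace(hour=10, minute=30))],
    "bob": [
        (day.replace(hour=10), day.replace(hour=11)),
        (day.replace(hour=13), day.replace(hour=14)),
    ],
}
slot = earliest_common_slot(
    busy, day.replace(hour=9), day.replace(hour=17), timedelta(hours=1)
)
```

A production scheduler would likely rank several candidate slots and let participants confirm one, as the paragraph above suggests; the sketch only covers the search for a conflict-free slot.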

[0042] As a concrete example, this system is activated after a monthly meeting ends. The one-hour recording is transcribed, important points are detected, and meeting minutes and a draft agenda for the next meeting are automatically generated within 20 minutes and distributed to relevant parties. This significantly streamlines the administrative process after the meeting.

[0043] The following describes the processing flow.

[0044] Step 1:

[0045] The terminal detects the end of the online meeting and sends a notification signal to the server. This signal allows the program to recognize that the meeting has ended.

[0046] Step 2:

[0047] When the server receives a termination notification from the terminal, it automatically begins retrieving the meeting recording data from the designated storage. This recording data includes both audio and video.

[0048] Step 3:

[0049] The server sends the captured audio data to a speech recognition system, which transcribes it into text data. High-precision speech recognition technology is used in this process.

[0050] Step 4:

[0051] The server analyzes the generated text data through a natural language processing engine and automatically extracts important discussion points and decisions.

[0052] Step 5:

[0053] The server automatically generates meeting minutes using a generative AI connected via an API, based on the extracted data. These minutes include screenshots from the meeting, along with the data extracted through analysis.

[0054] Step 6:

[0055] The server uses a similar generative AI to create a draft agenda and materials for the next meeting. This draft will be tailored to the content of the meeting.

[0056] Step 7:

[0057] The server notifies users of the generated meeting minutes and draft documents, and provides an editing interface for review and modification. This interface allows users to easily review the content and make necessary corrections.

[0058] Step 8:

[0059] The server retrieves meeting participants' schedule information via a calendar API and automatically coordinates candidate dates for the next meeting. After this coordination, it determines the optimal date and time.

[0060] Step 9:

[0061] The server generates meeting invitations based on the coordinated schedule and automatically distributes them to participants. At the same time, it ensures that the meeting date is properly reflected in each participant's calendar.
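As a rough illustration of how Steps 3 to 6 chain together, the sketch below wires the stages as injected callables so that any speech recognition, analysis, or generative AI service could be plugged in. The names `PostMeetingResult` and `run_post_meeting_pipeline`, and the stub stages in the usage lines, are hypothetical and are not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PostMeetingResult:
    transcript: str
    key_points: List[str]
    minutes: str
    next_agenda: str


def run_post_meeting_pipeline(
    recording_audio: bytes,
    transcribe: Callable[[bytes], str],
    extract_key_points: Callable[[str], List[str]],
    generate_minutes: Callable[[List[str]], str],
    generate_agenda: Callable[[List[str]], str],
) -> PostMeetingResult:
    """Run Steps 3 to 6 in order; every stage is an injected callable."""
    transcript = transcribe(recording_audio)       # Step 3: audio -> text
    key_points = extract_key_points(transcript)    # Step 4: text -> key points
    minutes = generate_minutes(key_points)         # Step 5: key points -> minutes
    next_agenda = generate_agenda(key_points)      # Step 6: key points -> draft agenda
    return PostMeetingResult(transcript, key_points, minutes, next_agenda)


# Usage with stub stages standing in for real services (all hypothetical).
result = run_post_meeting_pipeline(
    b"fake-audio",
    transcribe=lambda audio: "We decided to ship in May. Lunch was nice.",
    extract_key_points=lambda text: [s for s in text.split(". ") if "decided" in s],
    generate_minutes=lambda points: "Minutes: " + "; ".join(points),
    generate_agenda=lambda points: "Agenda: follow up on " + "; ".join(points),
)
```

Passing the stages in as parameters keeps the control flow testable without network access; the retrieval, review, scheduling, and distribution steps would wrap around this core.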

[0062] (Example 1)

[0063] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0064] Current online meeting systems have several drawbacks, including the significant time and effort required for post-meeting administrative tasks and the difficulty in efficiently extracting important information during meetings. Furthermore, scheduling subsequent meetings must be done manually, adding to the burden on participants. These challenges can reduce the effectiveness of meetings and impair productivity.

[0065] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0066] In this invention, the server includes means for detecting the end of an online meeting, means for acquiring meeting recording information and converting audio to text data, and means for analyzing the text data to generate a meeting record. This makes it possible to generate a meeting record effectively and quickly after the meeting ends and to automatically extract important information. Furthermore, by automatically setting the date of the next meeting based on time management information and providing an editing interface for the generated materials, the burden on participants can be reduced and the productivity of the meeting can be improved.

[0067] An "online meeting" is a virtual meeting conducted via the internet, where participants can communicate with each other remotely through screen and audio.

[0068] "Means for detecting termination" refers to a technical configuration used as a trigger to automatically identify the end of an online meeting and notify the system of the result.

[0069] "Recorded information" refers to audio, video, and other data recorded during online meetings, which are used later as material for analysis and processing.

[0070] "Methods for converting speech to text data" refers to technologies that convert speech signals into text information, and is a process that uses speech recognition technology to output speech content as a string of characters.

[0071] "Methods for generating meeting minutes by analyzing text data" refers to functions and technologies for analyzing text information, organizing important content and decisions discussed at a meeting, and creating meeting minutes.

[0072] "Time management information" refers to information about the schedules and available time of meeting participants, and is used to coordinate the date of the next meeting.

[0073] An "editing interface" refers to the user interface and tools provided to allow users to review and modify generated documents and meeting records.

[0074] This invention is a system that enables efficient management and automated post-meeting processing of online meetings. First, the terminal starts the system when it receives a notification that the online meeting has ended. This notification acts as an automatic trigger from the meeting platform.
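One way to picture the end-of-meeting trigger is a small handler that inspects the notification body and decides whether processing should start. The sketch assumes a JSON payload; the event name `meeting.ended` and the fields `meeting_id` and `recording_path` are hypothetical, since real meeting platforms define their own schemas.

```python
import json
from typing import Optional

MEETING_ENDED = "meeting.ended"  # hypothetical event name


def parse_meeting_end(body: bytes) -> Optional[dict]:
    """Return the trigger data if the payload signals a meeting end, else None."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return None
    if not isinstance(payload, dict) or payload.get("event") != MEETING_ENDED:
        return None
    meeting_id = payload.get("meeting_id")
    recording_path = payload.get("recording_path")
    if not meeting_id or not recording_path:
        return None  # incomplete notification: do not start processing
    return {"meeting_id": meeting_id, "recording_path": recording_path}


trigger = parse_meeting_end(
    b'{"event": "meeting.ended", "meeting_id": "m-1", "recording_path": "rec/m-1.wav"}'
)
```

A real deployment would also verify the sender of the notification before acting on it; that check is omitted here.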

[0075] Upon receiving the termination notification, the server immediately retrieves the meeting recording information from the cloud storage service. This step utilizes a common cloud storage service, with Amazon S3 being a particularly suitable option for large-scale data management. The server then uses speech recognition technology, such as Google Cloud Speech-to-Text API, to convert the recorded audio data into text data.

[0076] The converted text data is further analyzed using natural language processing techniques to extract important discussions and decisions from the meeting. This analysis utilizes natural language processing libraries such as NLTK and spaCy. The analyzed information is then sent via an API to a generative AI model, for example GPT-3 (registered trademark), which automatically generates meeting minutes and plans for future meetings.
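The extraction of important discussions and decisions could, in its simplest form, look like the sketch below. It is a deliberately crude rule-based stand-in that uses only the standard library rather than NLTK or spaCy; the cue phrases and the function name `extract_key_points` are assumptions chosen for illustration.

```python
import re
from typing import List

# Hypothetical cue phrases that often mark decisions or follow-ups.
CUE_PHRASES = ("decided", "agreed", "action item", "deadline")


def extract_key_points(transcript: str) -> List[str]:
    """Keep the sentences that contain at least one cue phrase."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", transcript.strip())
    return [
        sentence
        for sentence in sentences
        if any(cue in sentence.lower() for cue in CUE_PHRASES)
    ]


points = extract_key_points(
    "Thanks everyone for joining. We agreed to move the launch to May. "
    "The weather is nice. Action item: Sato drafts the budget."
)
```

A library-based analysis would replace the cue-phrase test with proper sentence segmentation and key-phrase extraction, but the input and output shapes (text in, list of key points out) would stay the same.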

[0077] The generated records and materials are notified to the user by the server. The user can review the output through the provided editing interface and make corrections as needed. This interface is designed to allow users to easily provide feedback and make revisions.

[0078] The server also references participants' time management information and automatically schedules the next meeting. This functionality is achieved by retrieving schedule information via the Microsoft® Graph API. Along with the selected meeting date, the server automatically distributes meeting invitations to participants. This process includes automated email sending and the creation of calendar events.

[0079] As a concrete example, this system functions after the monthly meeting, using speech recognition and natural language processing to extract key information from the one-hour meeting recording. Meeting minutes and a draft plan for the next meeting are generated within 20 minutes and quickly distributed to relevant parties. This process significantly shortens the administrative work required after the meeting.

[0080] An example of a prompt for a generative AI model would be: "Create meeting minutes from the following key points: [list of key points]." This would allow the model to generate appropriate meeting minutes.
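A prompt of this form could be assembled from the extracted key points as in the following sketch. The instruction sentence follows the example above; the bullet formatting and the helper name `build_minutes_prompt` are assumptions, and the call to the generative AI model itself is left out.

```python
from typing import List


def build_minutes_prompt(key_points: List[str]) -> str:
    """Format the key points as a bulleted list under the instruction sentence."""
    cleaned = [point.strip() for point in key_points if point.strip()]
    if not cleaned:
        raise ValueError("at least one key point is required")
    bullets = "\n".join(f"- {point}" for point in cleaned)
    return f"Create meeting minutes from the following key points:\n{bullets}"


prompt = build_minutes_prompt(
    ["Launch moved to May", " Sato drafts the budget ", ""]
)
```

Rejecting an empty list avoids sending the model a prompt with no content to summarize.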

[0081] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0082] Step 1:

[0083] The terminal receives a notification that an online meeting has ended. This operates based on an API call from the meeting management platform. Specifically, it receives a webhook notification sent by the platform when the meeting ends, and the system initiates processing based on that notification. The input is the meeting end notification, and the output is the trigger for initiating processing.

[0084] Step 2:

[0085] After receiving a notification, the server retrieves meeting recording information from cloud storage. This involves downloading data using the storage API. For example, it might retrieve audio data using a specified file path from the storage system. The input for this step is the data path in storage, and the output is the retrieved audio data.

[0086] Step 3:

[0087] The server uses the acquired audio data to convert speech into text data using speech recognition technologies such as the Google Cloud Speech-to-Text API. The audio data is sent to the API, which analyzes the audio signal and generates text. The input for this step is audio data, and the output is text data.

[0088] Step 4:

[0089] The server analyzes the generated text data and extracts important discussions and decisions using natural language processing techniques. Natural language processing libraries such as NLTK and spaCy are used to extract key phrases by parsing the text data. The input for this step is text data, and the output is a list of important points.

[0090] Step 5:

[0091] The server uses the extracted key points to send prompts to the generative AI model, which then generates meeting minutes and a plan for the next meeting. Based on these prompts, the generative AI model performs natural language generation to construct appropriate content. The input for this step is a list of key points, and the output is meeting minutes and a meeting plan.

[0092] Step 6:

[0093] The server notifies the user of the generated meeting minutes and meeting plan. Notifications are sent via email or a notification service, and users can review and edit these generated documents through a web-based editing interface. The input for this step is the meeting minutes and meeting plan, and the output is the notification to the user.

[0094] Step 7:

[0095] The server automatically schedules the next meeting based on participants' time management information. It uses the Microsoft Graph API to retrieve participants' calendar information and determines the optimal schedule. The input for this step is participants' calendar information, and the output is the date of the next meeting.

[0096] Step 8:

[0097] The server automatically distributes invitations based on the determined meeting schedule. It uses an email system or calendar API to send invitations to participants and automatically creates calendar events. The input for this step is the meeting schedule, and the output is the invitation sent to participants.
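The invitation of Step 8 can be sketched with the Python standard library alone: a plain-text email carrying a minimal iCalendar event as an attachment. This is only an illustration under stated assumptions; the text above refers to an email system or a calendar API, not to this particular format, and the function name, addresses, and `PRODID` value are hypothetical. The sketch expects timezone-aware datetimes and does not escape special characters in the title.

```python
from datetime import datetime, timedelta, timezone
from email.message import EmailMessage
from typing import List


def _ics_time(moment: datetime) -> str:
    """Format a timezone-aware datetime as an iCalendar UTC timestamp."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def build_invitation(organizer: str, attendees: List[str], title: str,
                     start: datetime, end: datetime, uid: str) -> EmailMessage:
    """Build (but do not send) an invitation email with a calendar attachment."""
    ics_lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//post-meeting-sketch//EN",  # hypothetical identifier
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{_ics_time(datetime.now(timezone.utc))}",
        f"DTSTART:{_ics_time(start)}",
        f"DTEND:{_ics_time(end)}",
        f"SUMMARY:{title}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    message = EmailMessage()
    message["From"] = organizer
    message["To"] = ", ".join(attendees)
    message["Subject"] = f"Invitation: {title}"
    message.set_content(f"You are invited to '{title}' starting {start.isoformat()}.")
    # Attach the event as text/calendar so calendar clients can import it.
    message.add_attachment("\r\n".join(ics_lines) + "\r\n",
                           subtype="calendar", filename="invite.ics")
    return message


start = datetime(2025, 2, 3, 1, 0, tzinfo=timezone.utc)
invitation = build_invitation(
    "host@example.com", ["a@example.com", "b@example.com"],
    "Monthly review", start, start + timedelta(hours=1), "demo-uid-1",
)
```

Sending the message (for example through an SMTP server) and writing the event directly into each participant's calendar are separate concerns that the sketch leaves out.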

[0098] (Application Example 1)

[0099] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0100] Modern information exchange and meetings are frequent and voluminous, often leaving participants overwhelmed with organizing information after meetings and preparing for future meetings. Therefore, there is a need for efficient post-meeting information processing and a reduction in the time and effort required for future preparation. Furthermore, it is often difficult to accurately capture crucial moments during meetings, leading to challenges in creating meeting minutes and planning for future meetings.

[0101] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0102] In this invention, the server includes means for detecting the end of online information exchange, means for acquiring recorded information data and converting sound to text, and means for analyzing the text data and generating a meeting summary. This automates post-meeting information processing, allowing participants to easily understand the meeting content and quickly plan for the next meeting.

[0103] "Online information exchange" refers to meetings and discussions conducted via the internet, where participants share information and engage in discussions without meeting face-to-face.

[0104] "Information recording data" refers to all digital recording information, such as audio, video, and documents, obtained during meetings and information exchanges.

[0105] "Converting sound to text" refers to the process of converting audio data into text data using speech recognition technology.

[0106] "Generation means" refers to technologies and devices used within an information processing system to automatically generate new information based on specific data.

[0107] "Analyzing text data" refers to the process of extracting meaning from text information or identifying important points using natural language processing techniques.

[0108] A "meeting summary" is information that summarizes the content discussed at a meeting and highlights key points, allowing participants to quickly understand the meeting's content.

[0109] "Automatically sending notifications" refers to a system where information or messages are automatically sent to users based on set conditions.

[0110] "Acquiring visual information" refers to the technology of capturing important moments as images or videos, which is used to review key parts of a meeting later.

[0111] The system of this invention is designed to improve the efficiency of online information exchange. Its main components include a terminal for detecting the end of online information exchange, a server for storing recorded information data in cloud storage and making it accessible later, and means for converting audio data into text data and automatically generating meeting summaries and materials for planning the next meeting.

[0112] First, the terminal detects that the information exchange has ended and transfers the recorded data to the cloud. The server uses the Google Cloud Speech-to-Text API to convert the audio data into text data. Next, IBM Watson® Natural Language Understanding is used to extract important discussions and decisions from the converted text data. Then, using OpenAI®'s GPT model, a meeting summary and materials for planning the next meeting are automatically generated based on the extracted information. This generated document is delivered to the user via notification, allowing the user to immediately review the content and edit or provide feedback as needed.

[0113] As a concrete example, consider a scenario in which, after a fintech company finishes an online meeting, a new investment strategy discussed during the meeting is detected, a memo is automatically generated in just 5 minutes, and the memo is delivered to all members via push notification. Operating this system in this way would facilitate rapid information sharing and significantly improve the efficiency of individual post-meeting tasks.

[0114] Examples of input prompts for a generative AI model include the following:

[0115] "This is a text summarizing the key strategies and policies discussed during the information exchange. Please use this to create a meeting summary and generate a draft agenda for the next meeting."

[0116] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0117] Step 1:

[0118] The terminal detects the end of the online information exchange. Upon receiving the termination detection trigger, the terminal automatically prepares to upload the recorded data collected during the information exchange to cloud storage. The input to this step is the trigger for the end of the meeting, and the output is the recorded data stored in the cloud.

[0119] Step 2:

[0120] The server receives the recorded data stored in the cloud. The server uses the Google Cloud Speech-to-Text API to process the audio data and convert the audio data into text data. The input is the recorded data, and the output is the text data.

[0121] Step 3:

[0122] The server analyzes the converted text data using IBM Watson Natural Language Understanding and extracts important discussions and decisions. The input is the text data, and the output is a list of important discussions and decisions. This data processing is performed using natural language processing techniques.

[0123] Step 4:

[0124] Using the extracted information, the server creates a meeting summary and planning materials for the next meeting with OpenAI's GPT model. Here, the generative AI model generates a new document based on the input prompt text. The input is the extracted information, and the output is the automatically generated text.

[0125] Step 5:

[0126] The server sends the generated meeting summary and planning materials to the user via push notification. The user can review the content and provide feedback through the received notification. The input is the generated document, and the output is the notification to the user. This operation allows the user to quickly review the content and plan their next actions.
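The five steps above can be sketched as a simple pipeline in which each external service is passed in as a callable. This is only a sketch under stated assumptions: the names `run_post_meeting_pipeline`, `PipelineResult`, and the `fake_*` stubs are hypothetical, and the stubs stand in for the speech-to-text, language-understanding, and generative services named in the description rather than calling them.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PipelineResult:
    transcript: str
    key_points: List[str]
    document: str


def run_post_meeting_pipeline(
    recording: bytes,                         # Step 1 output: the uploaded recording
    transcribe: Callable[[bytes], str],
    extract: Callable[[str], List[str]],
    generate: Callable[[List[str]], str],
    notify: Callable[[str], None],
) -> PipelineResult:
    transcript = transcribe(recording)        # Step 2: audio data -> text data
    key_points = extract(transcript)          # Step 3: important discussions and decisions
    document = generate(key_points)           # Step 4: summary and planning materials
    notify(document)                          # Step 5: deliver to the user
    return PipelineResult(transcript, key_points, document)


# Stand-in stubs (assumptions) so the sketch runs without any external service.
def fake_transcribe(recording: bytes) -> str:
    return recording.decode("utf-8")


def fake_extract(text: str) -> List[str]:
    # Keep only sentences that look like decisions.
    return [s.strip() for s in text.split(".") if "decided" in s or "agreed" in s]


def fake_generate(points: List[str]) -> str:
    return "Meeting summary:\n" + "\n".join(f"- {p}" for p in points)


sent: List[str] = []  # collects what would be pushed to the user
result = run_post_meeting_pipeline(
    b"We reviewed the budget. We decided to adopt the new strategy. Lunch was late.",
    fake_transcribe,
    fake_extract,
    fake_generate,
    sent.append,
)
print(result.document)
```

Passing the services as parameters keeps the flow itself independent of any particular vendor, so each step can be swapped or tested in isolation.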

[0127] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0128] One embodiment of the present invention relates to an online meeting management system incorporating an emotion engine. This system comprehensively streamlines the process from initiating a meeting to completing administrative tasks afterward, and utilizes user emotion data.

[0129] First, the terminal detects the end of the online meeting and sends that information to the server. Upon receiving this notification, the server accesses the emotion engine to analyze the user's speech and facial expression data collected during the meeting. Based on this analysis, the server identifies the user's emotional state and integrates the results with other data.

[0130] Using this emotional information, the server extracts key points from the meeting content, which has been converted into text data using speech recognition technology. In this point extraction process, the server generates meeting minutes that take the emotional data into consideration, highlighting areas where emotional changes are particularly significant, thereby providing participants with useful information.

[0131] Furthermore, emotional data is reflected in the creation of the agenda and draft materials for the next meeting. Based on the user's emotional state, the server suggests a less stressful schedule and agenda items that facilitate dialogue. This helps to improve participants' work efficiency and create a better meeting environment.

[0132] For example, if the emotion engine detects a point where a user's emotions change negatively, the server uses that information to suggest improvements to the agenda and follow-up for the next meeting. This makes it easier for participants' opinions to be reflected, leading to an overall increase in satisfaction.

[0133] Thus, the present invention can improve the quality of online meetings and streamline their operation through the use of data by an emotion engine.

[0134] The following describes the processing flow.

[0135] Step 1:

[0136] The terminal prepares to send user speech and facial expression data to the emotion engine as soon as the meeting starts. This data is transmitted in real time, so that the user's emotional state is recorded continuously.

[0137] Step 2:

[0138] After detecting the end of an online meeting, the server retrieves sentiment data collected along with the video recording. The video recording includes audio and video, while the sentiment data records information based on the user's facial expressions and tone of voice.

[0139] Step 3:

[0140] The server converts the recorded audio data into text data using a speech recognition system. Simultaneously, an emotion engine analyzes the user's emotional changes and adds the results to the text data as emotion tags.

[0141] Step 4:

[0142] The server analyzes the text data using natural language processing to extract key points of the agenda and sections related to different emotional states. This process incorporates emotion tag information and interprets the data in a way that takes user responses into account.

[0143] Step 5:

[0144] The server automatically generates meeting minutes using a generation AI based on the extracted data. These minutes highlight areas where there were particularly significant emotional shifts, serving as a reference for participants when reviewing the discussion.

[0145] Step 6:

[0146] The server creates the agenda and draft materials for the next meeting, taking emotional data into consideration. Based on the user's emotional state, the content is structured to propose productive topics while reducing stress.

[0147] Step 7:

[0148] The server notifies users of the generated meeting minutes and materials for the next meeting, and provides an interface for reviewing and editing the content. This interface also allows for sentiment-based feedback.

[0149] Step 8:

[0150] The server combines meeting participants' schedules and sentiment data to adjust the date for the next meeting. This automatically sets a schedule that reduces the burden on participants.

[0151] Step 9:

[0152] The server automatically sends invitations to participants based on the decided meeting schedule. These invitations also include notes and preparation requirements that take the emotion data into account.
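The emotion tagging in Step 3 and the highlighting in Step 5 of the flow above could look roughly like the following sketch. It assumes that the emotion engine yields timestamped valence scores between -1.0 and 1.0 and that a "significant shift" is a change of at least 0.5 between consecutive segments; the data layout, the threshold, and the names `Segment` and `tag_segments` are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple


@dataclass
class Segment:
    start: float                     # seconds from meeting start
    end: float
    text: str
    emotion: Optional[float] = None  # mean valence, -1.0 (negative) to 1.0 (positive)
    highlighted: bool = False


def tag_segments(
    segments: List[Segment],
    samples: List[Tuple[float, float]],  # (timestamp, valence) pairs
    shift_threshold: float = 0.5,
) -> List[Segment]:
    """Attach a mean valence to each segment and flag large shifts."""
    previous: Optional[float] = None
    for seg in segments:
        scores = [v for t, v in samples if seg.start <= t < seg.end]
        seg.emotion = mean(scores) if scores else None
        if seg.emotion is not None:
            if previous is not None and abs(seg.emotion - previous) >= shift_threshold:
                seg.highlighted = True
            previous = seg.emotion
    return segments


tagged = tag_segments(
    [
        Segment(0, 10, "Opening remarks"),
        Segment(10, 20, "Budget cut proposal"),
        Segment(20, 30, "Next steps"),
    ],
    [(2, 0.5), (8, 0.25), (12, -0.5), (18, -0.75), (25, -0.5)],
)
for seg in tagged:
    marker = "*" if seg.highlighted else " "
    print(f"{marker} [{seg.start:>4.0f}s] {seg.text} (valence={seg.emotion})")
```

In this example only the second segment is flagged, because the mean valence drops from 0.375 to -0.625 between the first two segments.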

[0153] (Example 2)

[0154] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0155] Online meetings present challenges in accurately grasping the content and using it for future meetings, as a large amount of information is exchanged simultaneously. Furthermore, it's crucial to appropriately capture participants' emotions to enhance the effectiveness of future meetings. Addressing these challenges requires automated analysis of meeting content and the generation of meeting minutes and agendas that reflect participants' emotional states.

[0156] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0157] In this invention, the server includes means for detecting the end of an online meeting, means for capturing recording data of the meeting and converting audio to text, means for identifying the user's emotional state using an emotion engine, means for analyzing text data and emotion data to generate meeting minutes, and means for automatically generating an agenda and materials for the next meeting, taking the emotion data into consideration. This enables more efficient post-meeting administrative work and more effective meeting management based on the participants' emotions.

[0158] An "online meeting" is a gathering in which participants in multiple locations share audio and video in real time via the internet to communicate.

[0159] A "means for detecting termination" refers to a mechanism that identifies the termination status of an online meeting and notifies the system of that information.

[0160] "Recorded data" refers to digital data that records the audio and video exchanges that took place during an online meeting.

[0161] "Means of converting speech to text" refers to a technology or process for converting recorded speech data into textual information.

[0162] An "emotion engine" is an algorithm or software that analyzes a user's speech and facial expressions to identify their emotional state.

[0163] "Text data" refers to a data format that contains string information obtained by converting speech into text.

[0164] "Means for generating meeting minutes" refers to a function that extracts the important points and automatically creates notes summarizing the meeting content.

[0165] "Emotional data" refers to digital data that indicates the emotional state of meeting participants, derived from their statements and facial expressions.

[0166] An "agenda" is a plan that lists the items and topics that are scheduled to be discussed at the next meeting.

[0167] "Means for automatically generating documents" refers to technologies or systems that automatically create documents summarizing important information and proposals based on the content of discussions.

[0168] A "schedule" refers to the availability of meeting participants and is used to coordinate the date of the next meeting.

[0169] This invention aims to improve the efficiency and quality of meetings by using an online meeting management system that incorporates an emotion engine. The system is primarily operated through the cooperation of a server, terminals, and users.

[0170] First, the terminal uses an online meeting application (e.g., a common online meeting platform) to monitor the start and end of the meeting. Once the meeting ends, the terminal sends that information to the server.

[0171] Next, the server starts operating triggered by the termination notification. The server retrieves the audio data of the recorded meeting and uses speech recognition technology (e.g., a common speech-to-text service) to convert the audio data into text data. In addition, the server uses an emotion engine to analyze the user's statements and facial expressions during the meeting to identify the user's emotional state. A common facial expression recognition service is used in this analysis process.

[0172] Text and sentiment data are integrated by the server and analyzed by a meeting minutes generation engine. This engine uses natural language processing techniques to extract key points and areas of significant emotional shifts during the meeting. This automatically generates meeting minutes that provide useful information for participants.

[0173] Furthermore, the server automatically generates the agenda and materials for the next meeting. The server considers emotional data and suggests topics to ensure participants can engage in dialogue without feeling stressed. For example, if a topic is identified with a high level of negative emotion, it will add suggestions for improvement and follow-up to the agenda.

[0174] For example, if the emotion engine detects an increase in negative emotions during a meeting based on user speech data and facial expression data, the server will use this information to generate suggestions for improving the agenda for the next meeting. An example of a prompt for this process would be, "When participants' negative emotions increased during the meeting, please generate a proposal for the next meeting agenda that takes this into account."
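A minimal sketch of how the server might decide when to issue this prompt is shown below. It assumes that the emotion engine labels each observation as positive, negative, or neutral and that labels are grouped by agenda topic; the grouping, the 0.5 threshold, and the function names are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, List, Optional

# Example prompt quoted from paragraph [0174].
BASE_PROMPT = (
    "When participants' negative emotions increased during the meeting, "
    "please generate a proposal for the next meeting agenda that takes this "
    "into account."
)


def negative_share(labels: List[str]) -> float:
    """Fraction of observations labeled 'negative' (0.0 for an empty list)."""
    return Counter(labels)["negative"] / len(labels) if labels else 0.0


def build_agenda_prompt(
    topic_labels: Dict[str, List[str]], threshold: float = 0.5
) -> Optional[str]:
    """Return a prompt listing flagged topics, or None if nothing is flagged."""
    flagged = [t for t, labels in topic_labels.items() if negative_share(labels) >= threshold]
    if not flagged:
        return None
    topics = "\n".join(f"- {t}" for t in flagged)
    return f"{BASE_PROMPT}\nTopics with increased negative emotion:\n{topics}"


prompt = build_agenda_prompt(
    {
        "Release schedule": ["negative", "negative", "neutral"],
        "Hiring plan": ["positive", "neutral"],
    }
)
print(prompt)
```

Returning `None` when no topic crosses the threshold lets the caller fall back to an ordinary agenda prompt.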

[0175] Therefore, by utilizing emotional data, this system can streamline post-meeting processing for online meetings and provide a meeting environment that promotes smooth communication among participants.

[0176] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0177] Step 1:

[0178] The terminal monitors the start and end of online meetings in real time. When a meeting ends, the terminal generates termination notification data, including the meeting ID, end time, and a list of participants, and sends it to the server.

[0179] Step 2:

[0180] The server, triggered by the received termination notification, retrieves the corresponding meeting recording data from the meeting database. The server then uses speech recognition technology to convert this audio data into text data. Specifically, the server processes the audio input using a natural language processing algorithm to generate a text-based conversation log.

[0181] Step 3:

[0182] User speech and facial expression data are captured during the meeting via dedicated sensors and cameras and sent to a server. The server uses an emotion engine to analyze this data and identify each user's emotional state. In this analysis process, the content of speech and facial expressions are used as input, and positive, negative, or neutral emotion data is generated as output.

[0183] Step 4:

[0184] The server integrates text data and sentiment data and inputs it into a meeting minutes generation engine. This engine applies an algorithm that extracts important discussion points and areas of significant emotional shifts. The resulting meeting minutes clearly indicate key phrases and emotional fluctuations during the meeting.

[0185] Step 5:

[0186] The server automatically generates the agenda and materials for the next meeting based on the generated sentiment data. For agenda items where sentiment changed significantly, improvement measures and follow-ups are added to the agenda. The previous sentiment data and meeting minutes are used as input, and the proposed agenda is generated as output.

[0187] Step 6:

[0188] The server distributes the generated meeting minutes and agenda to each participant. The distribution process sends the documents to designated email addresses or individual folders in cloud storage. This allows all participants to prepare for the next meeting.
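The termination notification data described in Step 1 of this flow (meeting ID, end time, and list of participants) could be represented as follows. This is a sketch assuming a JSON payload; the wire format, the field names, and the class name `TerminationNotification` are not specified in the description and are chosen here for illustration.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import List


@dataclass
class TerminationNotification:
    meeting_id: str
    end_time: str            # ISO 8601 timestamp (assumed format)
    participants: List[str]

    def to_json(self) -> str:
        """Serialize on the terminal side."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload: str) -> "TerminationNotification":
        """Deserialize on the server side."""
        return cls(**json.loads(payload))


notification = TerminationNotification(
    meeting_id="mtg-001",
    end_time=datetime(2024, 12, 9, 10, 30, tzinfo=timezone.utc).isoformat(),
    participants=["alice@example.com", "bob@example.com"],
)
payload = notification.to_json()
restored = TerminationNotification.from_json(payload)
print(payload)
```

The server can use the meeting ID from the restored object to look up the corresponding recording in Step 2.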

[0189] (Application Example 2)

[0190] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0191] Online communication presents challenges in providing immediate feedback and customized content based on participants' emotions and states. Furthermore, mechanisms for effectively and efficiently recording meeting content and incorporating it into future meetings are often inadequate. To address these challenges, real-time sentiment analysis and content customization are essential.

[0192] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0193] In this invention, the server includes means for taking in recorded data and converting audio information into text information, means for analyzing the emotional state of participants and providing customized content, and means for automatically scheduling the next communication based on the participants' schedules. This enables immediate content tagging based on the emotional state of participants and the automation of optimal planning for the next meeting.

[0194] "Online communication" is a technology that allows multiple participants in remote locations to exchange information and conduct conversations and meetings in real time via the internet.

[0195] "Recorded data" refers to data that stores information such as audio, video, and text generated during online communication.

[0196] "Audio information" refers to the words and sounds uttered by participants during online communication, and is the information that is recorded and analyzed.

[0197] "Textual information" refers to data obtained by converting audio information into text format, and it forms the basis of meeting minutes and reports.

[0198] "Participant emotional state" refers to the results of measuring participants' psychological responses in real time during online communication and analyzing the changes in those responses.

[0199] "Customized content" refers to information and suggestions that are individually optimized based on the participant's emotional state and past data.

[0200] "Content tagging" is a technique that assigns identifiers to specific information or scenes at the peak of participants' emotional states, making them useful for later reference and analysis.

[0201] "Meeting content" refers to the overall information, including statements, discussions, and resolutions exchanged during online communication.

[0202] "Automatic scheduling" refers to a function where the system automatically determines the dates for meetings and communications, taking into account the participants' schedules.

[0203] "The optimal plan for the next meeting" refers to the agenda and materials for the next meeting, which are formulated based on participant sentiment analysis and the content of the previous meeting, with the aim of reducing stress and improving efficiency.

[0204] The embodiment of this invention relates to a content delivery system that utilizes emotion analysis during online communication. Specifically, it enables the provision of individually optimized information by analyzing audio and video data in real time and understanding the emotional state of participants. This system primarily operates through cooperation between a server, a terminal, and a user.

[0205] The server receives recorded data of online communications and uses speech recognition technology to convert speech information into text information. It is recommended to use advanced models such as "OpenAI Whisper" for speech recognition. This allows the communication content to be saved as text data and analyzed.

[0206] Next, the device uses video data to analyze the participants' emotional state in real time. The emotion analysis incorporates "Emotion AI" technology to identify changes in emotion from the video data. This resulting emotional data is then sent to a server and stored as analysis results.

[0207] Users receive customized information provided by the system. This is automatically generated content based on participants' past reactions and key points of discussion. For example, the moments that evoked the most emotional responses from users are tagged, and similar information is recommended later.

[0208] One concrete application of this system is to detect a user's smile while watching an emotionally moving movie and identify the scene in question. Based on the user's preferences, related movies and entertainment are then recommended. An example of a prompt message to input into the generating AI model would be, "Analyze the user's reactions and identify the scene with the strongest response. We would like to use this for future content recommendations."

[0209] In this way, this invention aims to improve the communication experience by utilizing the emotional state of participants in online communications and providing more personalized information.

[0210] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0211] Step 1:

[0212] The device uses a microphone and camera to capture participant audio and video data during online communication. Input is real-time audio and video, while output is digital data for recording. This data is immediately transferred to the server. Specifically, audio is saved in .wav format and video in .mp4 format.

[0213] Step 2:

[0214] The server inputs the received audio data into the speech recognition model "OpenAI Whisper" and converts it into text information. The input is audio data in a .wav file, and the output is the converted text data. Preprocessing such as noise reduction and speaker separation is performed at this stage, which improves the accuracy of the analysis.

[0215] Step 3:

[0216] The device inputs received video data into "Emotion AI," which analyzes the emotional state of participants based on their facial expressions. The input is video data in .mp4 format, and the output is numerical data indicating the emotional category (e.g., joy, interest, surprise) and its intensity. This data is used to track emotional changes in real time.

[0217] Step 4:

[0218] The server integrates the audio text data and sentiment analysis results to perform content tagging. The input is the text data and sentiment data up to this step, and the output is a list of tagged key points. Tagging applies an algorithm that calculates the frequency of verbs and nouns and identifies the moments when emotions peaked.

[0219] Step 5:

[0220] The user receives content generated by the server and recommendations for the next session. The input is tagged points and a list of content recommendations sent from the server, and the output is customized feedback that the user views. For example, the user might receive a customized list that includes trailers for relevant movies.

[0221] Step 6:

[0222] The server uses user feedback and viewing history to input prompts into a generating AI model to optimize future recommendations. The input consists of user feedback data and pre-prepared prompts, while the output is updated information for the next recommendation items. An example of a prompt is, "Analyze the user's reactions and identify the scenes that elicited the strongest responses. We would like to use this information for future content recommendations."

[0223] Through this process, the system can provide personalized information based on participants' emotions, thereby improving the online communication experience.
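The content tagging in Step 4 can be illustrated with the sketch below. The description refers to the frequency of verbs and nouns; since part-of-speech tagging would require an external library, this sketch substitutes plain word frequency with a small stop-word list, which is an assumption and not the method described. The 5-second window around the emotion peak and all names are likewise hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Small illustrative stop-word list (assumption); not a part-of-speech filter.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was", "it", "that", "this", "we", "i"}


def top_terms(text: str, n: int = 3) -> List[str]:
    """Most frequent non-stop-words in the text."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    return [w for w, _ in Counter(words).most_common(n)]


def tag_content(
    utterances: List[Tuple[float, str]],      # (timestamp, text)
    samples: List[Tuple[float, str, float]],  # (timestamp, emotion category, intensity)
    window: float = 5.0,
) -> Dict[str, object]:
    """Tag the moment of peak emotional intensity with nearby frequent terms."""
    t_peak, category, intensity = max(samples, key=lambda s: s[2])
    nearby = " ".join(text for t, text in utterances if abs(t - t_peak) <= window)
    return {"time": t_peak, "emotion": category, "intensity": intensity, "terms": top_terms(nearby)}


tag = tag_content(
    utterances=[
        (3.0, "The opening scene was slow"),
        (41.0, "That chase scene, the chase was amazing"),
        (44.0, "I loved the chase music"),
    ],
    samples=[(3.0, "interest", 0.25), (42.0, "joy", 0.9), (60.0, "surprise", 0.5)],
)
print(tag)
```

Here the peak is the "joy" sample at 42 seconds, and the most frequent term in the surrounding utterances is "chase".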

[0224] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0225] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0226] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0227] [Second Embodiment]

[0228] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0229] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0230] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0231] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0232] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0233] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0234] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0235] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0236] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0237] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0238] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0239] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0240] Specific embodiments for carrying out the present invention are shown below. This system aims to efficiently manage online meetings and automate post-meeting administrative tasks, thereby reducing the burden on participants and organizers.

[0241] First, the terminal starts running this program upon receiving a termination notification from the online meeting system. Upon receiving the termination notification, the server automatically retrieves the meeting recording data from the designated storage. The retrieved audio data is then converted into text data using speech recognition technology.

[0242] Next, the server analyzes the text data to extract key discussions and decisions from the meeting. This analysis uses natural language processing technology, and based on the extracted data, a generative AI connected via the API automatically generates meeting minutes. Furthermore, it similarly generates a draft agenda and materials for the next meeting.

[0243] The server notifies the user of the generated meeting minutes and materials, and the user can review and modify the content through the provided editing interface. This interface is designed to facilitate feedback on the generated content.

[0244] The server also references participants' schedule information and automatically sets the date for the next meeting. It presents candidate dates, facilitates coordination among participants as needed, and selects the most suitable date and time. After selection, meeting invitations are automatically sent to each participant.

[0245] As a concrete example, when a monthly meeting ends, this system is activated, the one-hour meeting recording is transcribed, important points are detected, and meeting minutes and a draft agenda for the next meeting are automatically generated within 20 minutes and distributed to the relevant parties. This significantly streamlines the administrative work after the meeting.

[0246] The following describes the processing flow.

[0247] Step 1:

[0248] The terminal detects the end of the online meeting and sends a termination notification signal to the server. This signal allows the program to recognize that the meeting has ended.

[0249] Step 2:

[0250] When the server receives a termination notification from the terminal, it automatically begins retrieving the meeting recording data from the designated storage. This recording data includes both audio and video.

[0251] Step 3:

[0252] The server sends the captured audio data to a speech recognition system, which transcribes it into text data. High-precision speech recognition technology is used in this process.

[0253] Step 4:

[0254] The server analyzes the generated text data through a natural language processing engine and automatically extracts important discussion points and decisions.

[0255] Step 5:

[0256] The server automatically generates meeting minutes using a generative AI connected via an API, based on the extracted data. These minutes include screenshots from the meeting, along with the data extracted through analysis.

[0257] Step 6:

[0258] The server uses a similar generative AI to create a draft agenda and materials for the next meeting. This draft will be tailored to the content of the meeting.

[0259] Step 7:

[0260] The server notifies users of the generated meeting minutes and draft documents, and provides an editing interface for review and modification. This interface allows users to easily review the content and make necessary corrections.

[0261] Step 8:

[0262] The server retrieves meeting participants' schedule information via a calendar API and automatically adjusts the date for the next meeting. After adjustment, it determines the optimal date and time.

[0263] Step 9:

[0264] The server generates meeting invitations based on the coordinated schedule and automatically distributes them to participants. At the same time, it ensures that the meeting date is properly reflected in each participant's calendar.
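The schedule adjustment in Step 8 can be sketched as a search for the earliest time slot in which no participant is busy. The sketch assumes that the busy intervals have already been retrieved from a calendar API and that candidate start times are checked in 30-minute steps; the function name `earliest_common_slot` and these parameters are hypothetical, and the description does not specify how "optimal" is determined.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

Interval = Tuple[datetime, datetime]


def earliest_common_slot(
    busy: Dict[str, List[Interval]],   # participant -> busy intervals (assumed input)
    window_start: datetime,
    window_end: datetime,
    duration: timedelta,
    step: timedelta = timedelta(minutes=30),
) -> Optional[Interval]:
    """Return the first slot of the given duration that overlaps no busy interval."""
    all_busy = [iv for intervals in busy.values() for iv in intervals]
    start = window_start
    while start + duration <= window_end:
        end = start + duration
        # A slot is free if it ends before, or starts after, every busy interval.
        if all(end <= b_start or start >= b_end for b_start, b_end in all_busy):
            return start, end
        start += step
    return None


day = datetime(2025, 1, 15)
window_start = day.replace(hour=9)
busy = {
    "alice": [(day.replace(hour=9), day.replace(hour=10, minute=30))],
    "bob": [
        (day.replace(hour=10), day.replace(hour=11)),
        (day.replace(hour=11, minute=30), day.replace(hour=12)),
    ],
}
slot = earliest_common_slot(busy, window_start, day.replace(hour=17), timedelta(hours=1))
print(slot)
```

If no common slot exists in the window, the function returns `None`, in which case the server could present candidate dates to the participants as described in the embodiment.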

[0265] (Example 1)

[0266] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0267] Current online meeting systems have several drawbacks, including the significant time and effort required for post-meeting administrative tasks and the difficulty in efficiently extracting important information during meetings. Furthermore, scheduling subsequent meetings must be done manually, adding to the burden on participants. These challenges can reduce the effectiveness of meetings and impair productivity.

[0268] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0269] In this invention, the server includes means for detecting the end of an online meeting, means for acquiring meeting recording information and converting audio to text data, and means for analyzing the text data to generate a meeting record. This makes it possible to generate a meeting record effectively and quickly after the meeting ends and to automatically extract important information. Furthermore, by automatically setting the date of the next meeting based on time management information and providing an editing interface for the generated materials, the burden on participants can be reduced and the productivity of the meeting can be improved.

[0270] An "online meeting" is a virtual meeting conducted via the internet, where participants can communicate with each other remotely through screen and audio.

[0271] "Means for detecting termination" refers to a technical configuration used as a trigger to automatically identify the end of an online meeting and notify the system of the result.

[0272] "Recorded information" refers to audio, video, and other data recorded during online meetings, which are used later as material for analysis and processing.

[0273] "Means for converting audio to text data" refers to a process that uses speech recognition technology to convert speech signals into text and output the spoken content as a string of characters.

[0274] "Means for analyzing the text data to generate a meeting record" refers to functions and technologies that analyze text information, organize the important content and decisions discussed at a meeting, and create meeting minutes.

[0275] "Time management information" refers to information about the schedules and available time of meeting participants, and is used to coordinate the date of the next meeting.

[0276] An "editing interface" refers to the user interface and tools provided to allow users to review and modify generated documents and meeting records.

[0277] This invention is a system that enables efficient management and automated post-meeting processing of online meetings. First, the terminal starts the system when it receives a notification that the online meeting has ended. This notification acts as an automatic trigger from the meeting platform.

[0278] Upon receiving the termination notification, the server immediately retrieves the meeting recording information from the cloud storage service. Common cloud storage services are used in this step, with Amazon S3 being a particularly suitable option for large-scale data management. The server then uses speech recognition technology, such as the Google Cloud Speech-to-Text API, to convert the recorded audio data into text data.

[0279] The converted text data is then analyzed with natural language processing to extract the important discussions and decisions of the meeting. Natural language processing libraries such as NLTK and spaCy are used for this analysis. The extracted information is sent via an API to a generative AI model, such as GPT-3, which automatically creates the meeting minutes and a plan for the next meeting.

[0280] The server notifies the user of the generated minutes and materials. The user can check the generated documents through the provided editing interface and make any necessary corrections. This interface is designed so that the user can easily provide feedback and make corrections.

[0281] The server also refers to the participants' time management information and automatically sets the schedule for the next meeting. This function is realized by obtaining schedule information through the Microsoft Graph API. Once the schedule has been selected, the server automatically distributes meeting invitations to the participants. This process includes sending emails automatically and creating calendar events.

[0282] As a specific example, after a monthly meeting ends, the system combines speech recognition and natural language processing to extract important information from one hour of meeting recording data, generates a meeting memo and a plan for the next meeting within 20 minutes, and quickly distributes them to the relevant parties. This significantly shortens the administrative process after the meeting.

[0283] An example of the prompt text input to the generative AI model is "Create a meeting agenda memo from the following important matters: [list of important matters]". From this prompt, the model generates appropriate meeting minutes.
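
The prompt form quoted above can be assembled programmatically. The following is a minimal Python sketch, assuming the extracted important matters arrive as a list of strings; the function name `build_minutes_prompt` and the bullet formatting are illustrative assumptions, not part of the described system.

```python
def build_minutes_prompt(important_matters: list[str]) -> str:
    """Format extracted key points into the prompt shape quoted in the text."""
    # Strip whitespace and drop empty entries so the list stays clean.
    items = [m.strip() for m in important_matters if m.strip()]
    if not items:
        raise ValueError("at least one important matter is required")
    # Bullet formatting is an assumption; the source only gives the sentence form.
    bullet_list = "\n".join(f"- {item}" for item in items)
    return (
        "Create a meeting agenda memo from the following important matters:\n"
        f"{bullet_list}"
    )
```

The returned string would then be passed to whichever generative AI API the server is configured to call.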

[0284] The flow of the specific process in Example 1 will be described using FIG. 11.

[0285] Step 1:

[0286] The terminal receives a notification of the end of the online meeting. This operates based on an API call from the meeting management platform. Specifically, it receives a Webhook notification sent by the platform at the end of the meeting, and based on this, the system's processing is initiated. The input is the notification of the end of the meeting, and the output is the trigger for starting the processing.
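
As an illustration of this step, the sketch below turns a Webhook request body into a processing trigger. The payload fields (`event`, `meeting_id`, `ended_at`) and the event name `meeting.ended` are assumptions made for the example; the actual payload depends on the meeting platform in use.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProcessingTrigger:
    """Output of Step 1: what the later steps need in order to start."""
    meeting_id: str
    ended_at: str  # timestamp string as delivered by the platform


def handle_webhook(body: bytes) -> Optional[ProcessingTrigger]:
    """Return a trigger for end-of-meeting events, or None for anything else."""
    payload = json.loads(body)
    # Hypothetical event name; real platforms define their own.
    if payload.get("event") != "meeting.ended":
        return None
    return ProcessingTrigger(
        meeting_id=payload["meeting_id"],
        ended_at=payload["ended_at"],
    )
```

Returning `None` for unrelated events keeps the trigger logic separate from the downstream processing, which only runs when a trigger object exists.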

[0287] Step 2:

[0288] After receiving the notification, the server retrieves the meeting record information from the cloud storage. At this time, it uses the storage API to download the data. For example, it retrieves audio data using the specified file path from the storage system. The input for this step is the data path of the storage, and the output is the retrieved audio data.

[0289] Step 3:

[0290] The server uses speech recognition technology such as Google Cloud Speech-to-Text API with the retrieved audio data to convert the audio into text data. The audio data is sent to the API, and this API analyzes the audio signal to generate text. The input for this step is the audio data, and the output is the text data.

[0291] Step 4:

[0292] The server analyzes the generated text data and uses natural language processing technology to extract important discussions and decisions. Natural language processing libraries such as NLTK and spaCy are used, and key phrases are extracted by syntactic analysis of the text data. The input for this step is the text data, and the output is a list of important matters.
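
The input and output of this step can be illustrated with a deliberately simple stand-in. The sketch below does not use NLTK or spaCy and performs no syntactic analysis; it substitutes a cue-word filter over sentences, and the cue list is an assumption chosen only for the example.

```python
import re

# Hypothetical cue words that often mark decisions or action items.
DECISION_CUES = re.compile(
    r"\b(decided|agreed|action item|deadline|will)\b", re.IGNORECASE
)


def extract_important_matters(transcript: str) -> list[str]:
    """Return the sentences of the transcript that contain a decision cue."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", transcript.strip())
    return [s for s in sentences if DECISION_CUES.search(s)]
```

A production system would replace the cue filter with the parsing-based key phrase extraction the text describes, while keeping the same text-in, list-out interface.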

[0293] Step 5:

[0294] The server uses the extracted key points to send prompts to the generative AI model, which then generates meeting minutes and a plan for the next meeting. Based on these prompts, the generative AI model performs natural language generation to construct appropriate content. The input for this step is a list of key points, and the output is meeting minutes and a meeting plan.

[0295] Step 6:

[0296] The server notifies the user of the generated meeting minutes and meeting plan. Notifications are sent via email or a notification service, and users can review and edit these generated documents through a web-based editing interface. The input for this step is the meeting minutes and meeting plan, and the output is the notification to the user.

[0297] Step 7:

[0298] The server automatically schedules the next meeting based on participants' time management information. It uses the Microsoft Graph API to retrieve participants' calendar information and determines the optimal schedule. The input for this step is participants' calendar information, and the output is the date of the next meeting.
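
One way to picture this step is as a search for the earliest slot in which no participant is busy. The sketch below assumes the calendar information has already been retrieved and reduced to busy intervals per participant; the Microsoft Graph call itself is omitted, and the 30-minute search step and the "earliest slot wins" criterion are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional

Interval = tuple[datetime, datetime]  # (start, end), treated as half-open


def find_earliest_slot(
    busy: dict[str, list[Interval]],
    window_start: datetime,
    window_end: datetime,
    duration: timedelta,
    step: timedelta = timedelta(minutes=30),
) -> Optional[datetime]:
    """Return the first start time at which every participant is free."""
    all_busy = [iv for intervals in busy.values() for iv in intervals]
    candidate = window_start
    while candidate + duration <= window_end:
        end = candidate + duration
        # Two half-open intervals overlap if each starts before the other ends.
        clash = any(b_start < end and candidate < b_end
                    for b_start, b_end in all_busy)
        if not clash:
            return candidate
        candidate += step
    return None  # no common free slot inside the window
```

"Optimal" could equally mean preferred hours or fewest conflicts; the earliest common slot is used here only because it is the simplest criterion to demonstrate.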

[0299] Step 8:

[0300] The server automatically distributes invitations based on the determined meeting schedule. It uses an email system or calendar API to send invitations to participants and automatically creates calendar events. The input for this step is the meeting schedule, and the output is the invitation sent to participants.
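
The invitation itself can be sketched with Python's standard library alone. The example below builds an email with a `text/calendar` attachment; it does not send anything, the product identifier and UID scheme are hypothetical, and special characters in the subject are not escaped as a full iCalendar implementation would require.

```python
from datetime import datetime, timedelta, timezone
from email.message import EmailMessage

ICS_TIME = "%Y%m%dT%H%M%SZ"  # UTC date-time form used in iCalendar


def build_invitation(organizer, attendees, start, duration, subject):
    """Return an EmailMessage carrying a calendar event for the meeting."""
    if start.tzinfo is None:
        raise ValueError("start must be timezone-aware")
    start_utc = start.astimezone(timezone.utc)
    end_utc = start_utc + duration
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//next-meeting-sketch//EN",  # hypothetical id
        "METHOD:REQUEST",
        "BEGIN:VEVENT",
        f"UID:{start_utc.strftime(ICS_TIME)}-{organizer}",  # assumed scheme
        f"DTSTAMP:{datetime.now(timezone.utc).strftime(ICS_TIME)}",
        f"DTSTART:{start_utc.strftime(ICS_TIME)}",
        f"DTEND:{end_utc.strftime(ICS_TIME)}",
        f"SUMMARY:{subject}",
        f"ORGANIZER:mailto:{organizer}",
        *[f"ATTENDEE:mailto:{a}" for a in attendees],
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    msg = EmailMessage()
    msg["From"] = organizer
    msg["To"] = ", ".join(attendees)
    msg["Subject"] = subject
    msg.set_content(f"You are invited to: {subject}")
    # iCalendar content lines are delimited by CRLF.
    msg.add_attachment("\r\n".join(lines) + "\r\n",
                       subtype="calendar", filename="invite.ics")
    return msg
```

When a calendar API is used instead of email, the same start time, end time, and attendee list would be passed to that API's event-creation call.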

[0301] (Application Example 1)

[0302] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0303] Modern meetings and information exchanges are frequent and information-heavy, and participants are often overwhelmed by organizing information afterward and preparing for the next session. There is therefore a need to reduce the time and effort that post-meeting information processing and next-session preparation require. In addition, it is difficult to capture important moments accurately during meetings, and many people struggle to create minutes and formulate plans for the next session.

[0304] The specific processing by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0305] In this invention, the server includes means for detecting the end of online information exchange, means for capturing recorded data of information and converting sound into text, and generating means for analyzing the text data to generate a meeting summary. As a result, post-meeting information processing is automated, enabling participants to easily grasp the content of the meeting and quickly formulate plans for the next session.

[0306] "Online information exchange" refers to meetings and consultations conducted via the Internet, which is a venue for participants to share information and conduct discussions without direct face-to-face interaction.

[0307] "Recorded data of information" refers to all digital-formatted recorded information such as sound, video, and materials obtained during meetings or information exchanges.

[0308] "Converting sound into text" refers to the process of converting audio data into text data using speech recognition technology.

[0309] "Generating means" refers to technologies and devices for automatically generating new information based on specific data within an information processing system.

[0310] "Analyzing text data" refers to the operation of extracting meaning from text information or finding important points using natural language processing technology.

[0311] A "meeting summary" is information that summarizes the content discussed at a meeting and highlights key points, allowing participants to quickly understand the meeting's content.

[0312] "Automatically sending notifications" refers to a system where information or messages are automatically sent to users based on set conditions.

[0313] "Acquiring visual information" refers to the technology of capturing important moments as images or videos, which is used to review key parts of a meeting later.

[0314] The system of this invention is designed to improve the efficiency of online information exchange. Its main components include a terminal that detects the end of online information exchange, a server that stores the recorded data in cloud storage and makes it accessible later, and means for converting audio data into text data and automatically generating meeting summaries and planning materials for the next meeting.

[0315] First, the terminal detects that the information exchange has ended and transfers the recorded data to the cloud. The server uses the Google Cloud Speech-to-Text API to convert the audio data into text data. Next, IBM Watson Natural Language Understanding is used to extract important discussions and decisions from the converted text data. Then, using OpenAI's GPT model, a meeting summary and materials for planning the next meeting are automatically generated based on the extracted information. This generated document is delivered to the user via notification, allowing the user to immediately review the content and edit or provide feedback as needed.

[0316] As a concrete example, imagine that a fintech company finishes an online meeting, the system detects a new investment strategy discussed during the meeting, creates a memo automatically in just 5 minutes, and delivers it to all members via push notification. Operating the system in this way would facilitate rapid information sharing and significantly improve the efficiency of individual post-meeting tasks.

[0317] Examples of input prompts for a generative AI model include the following:

[0318] "This is a text summarizing the key strategies and policies discussed during the information exchange. Please use this to create a meeting summary and generate a draft agenda for the next meeting."

[0319] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0320] Step 1:

[0321] The terminal detects the end of the online information exchange. Upon receiving the termination detection trigger, the terminal automatically prepares to upload the recorded data collected during the information exchange to cloud storage. The input to this step is the trigger for the end of the meeting, and the output is the recorded data stored in the cloud.

[0322] Step 2:

[0323] The server receives the recorded data stored in the cloud. The server uses the Google Cloud Speech-to-Text API to process the audio data and convert the audio data into text data. The input is the recorded data, and the output is the text data.

[0324] Step 3:

[0325] The server analyzes the converted text data using IBM Watson Natural Language Understanding, a natural language processing service, to extract important discussions and decisions. The input is text data, and the output is a list of important discussions and decisions.

[0326] Step 4:

[0327] The server uses the extracted information to create a meeting summary and planning materials for the next meeting using OpenAI's GPT model. Here, the generative AI model generates a new document based on the input prompt text. The input is the extracted information, and the output is the automatically generated text.

[0328] Step 5:

[0329] The server sends the generated meeting summary and planning materials to the user via push notification. The user can review the content and provide feedback through the received notification. The input is the generated document, and the output is the notification to the user. This operation allows the user to quickly review the content and plan their next actions.

[0330] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform specific processing using the user's emotions.

[0331] One embodiment of the present invention relates to an online meeting management system incorporating an emotion engine. This system comprehensively streamlines the process from initiating a meeting to completing administrative tasks afterward, and utilizes user emotion data.

[0332] First, the device detects the end of the online meeting and sends that information to the server. Upon receiving this notification, the server accesses the emotion engine to analyze the user's speech and facial expression data collected during the meeting. Based on this analysis, it identifies the emotional state and integrates the results with other data.

[0333] Using this emotional information, the server extracts key points from the meeting content, which has been converted into text data using speech recognition technology. In this point extraction process, the server generates meeting minutes that take the emotional data into consideration, highlighting areas where emotional changes are particularly significant, thereby providing participants with useful information.

[0334] Furthermore, emotional data is reflected in the creation of the agenda and draft materials for the next meeting. Based on the user's emotional state, the server suggests a less stressful schedule and agenda items that facilitate dialogue. This helps to improve participants' work efficiency and create a better meeting environment.

[0335] For example, if the emotion engine detects a point where a user's emotions change negatively, the server uses that information to suggest improvements to the agenda and follow-up for the next meeting. This makes it easier for participants' opinions to be reflected, leading to an overall increase in satisfaction.

[0336] Thus, the present invention can improve the quality of online meetings and streamline their operation through the use of data by an emotion engine.

[0337] The following describes the processing flow.

[0338] Step 1:

[0339] The device prepares to send user speech and facial expression data to the emotion engine as soon as the meeting starts. This data transmission takes place in real time, continuously recording the user's emotional state.

[0340] Step 2:

[0341] After detecting the end of an online meeting, the server retrieves sentiment data collected along with the video recording. The video recording includes audio and video, while the sentiment data records information based on the user's facial expressions and tone of voice.

[0342] Step 3:

[0343] The server converts the recorded audio data into text data using a speech recognition system. Simultaneously, an emotion engine analyzes the user's emotional changes and adds the results to the text data as emotion tags.
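
Attaching emotion tags to text can be sketched as a timestamp join. The example below assumes the speech recognition output is a list of timed segments and the emotion engine output is a list of `(timestamp, label)` samples; both shapes, and the "most frequent label wins" rule, are assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the recording
    end: float
    text: str


def tag_segments(segments, emotion_samples, default="neutral"):
    """Pair each transcript segment with its most frequent emotion label."""
    tagged = []
    for seg in segments:
        # Collect the emotion samples whose timestamp falls inside the segment.
        labels = [label for ts, label in emotion_samples
                  if seg.start <= ts < seg.end]
        tag = Counter(labels).most_common(1)[0][0] if labels else default
        tagged.append((seg.text, tag))
    return tagged
```

Segments with no samples fall back to the default label, so the tagged transcript always has one tag per segment.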

[0344] Step 4:

[0345] The server analyzes the text data using natural language processing to extract key points of the agenda and sections related to different emotional states. This process incorporates emotion tag information and interprets the data in a way that takes user responses into account.

[0346] Step 5:

[0347] The server automatically generates meeting minutes using a generative AI based on the extracted data. These minutes highlight areas where there were particularly significant emotional shifts, serving as a reference for participants when reviewing the discussion.

[0348] Step 6:

[0349] The server creates the agenda and draft materials for the next meeting, taking emotional data into consideration. Based on the user's emotional state, the content is structured to propose productive topics while reducing stress.

[0350] Step 7:

[0351] The server notifies users of the generated meeting minutes and materials for the next meeting, and provides an interface for reviewing and editing the content. This interface also allows for sentiment-based feedback.

[0352] Step 8:

[0353] The server combines meeting participants' schedules and sentiment data to adjust the date for the next meeting. This automatically sets a schedule that reduces the burden on participants.

[0354] Step 9:

[0355] The server automatically sends invitations to participants based on the decided meeting schedule. These invitations also include notes that take the emotional data into account, along with preparation requirements.

[0356] (Example 2)

[0357] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0358] Online meetings present challenges in accurately grasping the content and using it for future meetings, as a large amount of information is exchanged simultaneously. Furthermore, it's crucial to appropriately capture participants' emotions to enhance the effectiveness of future meetings. Addressing these challenges requires automated analysis of meeting content and the generation of meeting minutes and agendas that reflect participants' emotional states.

[0359] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0360] In this invention, the server includes means for detecting the end of an online meeting, means for capturing recording data of the meeting and converting audio to text, means for identifying the user's emotional state using an emotion engine, means for analyzing text data and emotion data to generate meeting minutes, and means for automatically generating an agenda and materials for the next meeting, taking the emotion data into consideration. This enables more efficient post-meeting administrative work and more effective meeting management based on the participants' emotions.

[0361] An "online meeting" is a gathering in which participants in multiple locations share audio and video in real time via the internet to communicate.

[0362] A "means for detecting termination" refers to a mechanism that identifies the termination status of an online meeting and notifies the system of that information.

[0363] "Recorded data" refers to digital data that records the audio and video exchanges that took place during an online meeting.

[0364] "Means of converting speech to text" refers to a technology or process for converting recorded speech data into textual information.

[0365] An "emotion engine" is an algorithm or software that analyzes a user's speech and facial expressions to identify their emotional state.

[0366] "Text data" refers to a data format that contains string information obtained by converting speech into text.

[0367] "Means for generating meeting minutes" refers to a function that extracts the important points and automatically creates notes summarizing the meeting content.

[0368] "Emotional data" refers to digital data that indicates the emotional state of meeting participants, derived from their statements and facial expressions.

[0369] An "agenda" is a plan that lists the items and topics that are scheduled to be discussed at the next meeting.

[0370] "Means for automatically generating documents" refers to technologies or systems that automatically create documents summarizing important information and proposals based on the content of discussions.

[0371] A "schedule" refers to the availability of meeting participants and is used to coordinate the date of the next meeting.

[0372] This invention aims to improve the efficiency and quality of meetings by using an online meeting management system that incorporates an emotion engine. The system is primarily operated through the cooperation of a server, terminals, and users.

[0373] First, the device uses an online meeting application (e.g., a common online meeting platform) to monitor the start and end of the meeting. Once the meeting ends, it sends that information to the server.

[0374] Next, the server starts operating triggered by the termination notification. The server retrieves the audio data of the recorded meeting and uses speech recognition technology (e.g., a common speech-to-text service) to convert the audio data into text data. In addition, the server uses an emotion engine to analyze the user's statements and facial expressions during the meeting to identify the user's emotional state. A common facial expression recognition service is used in this analysis process.

[0375] Text and sentiment data are integrated by the server and analyzed by a meeting minutes generation engine. This engine uses natural language processing techniques to extract key points and areas of significant emotional shifts during the meeting. This automatically generates meeting minutes that provide useful information for participants.

[0376] Furthermore, the server automatically generates the agenda and materials for the next meeting. The server considers emotional data and suggests topics to ensure participants can engage in dialogue without feeling stressed. For example, if a topic is identified with a high level of negative emotion, it will add suggestions for improvement and follow-up to the agenda.

[0377] For example, if the emotion engine detects an increase in negative emotions during a meeting based on user speech data and facial expression data, the server will use this information to generate suggestions for improving the agenda for the next meeting. An example of a prompt for this process would be, "Participants' negative emotions increased during the meeting. Please generate a proposal for the next meeting agenda that takes this into account."

[0378] Therefore, by utilizing emotional data, this system can streamline post-meeting processing for online meetings and provide a meeting environment that promotes smooth communication among participants.

[0379] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0380] Step 1:

[0381] The device monitors the start and end of online meetings in real time. When a meeting ends, the device generates termination notification data, including the meeting ID, end time, and a list of participants, and sends it to the server.
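
The termination notification data named in this step can be represented as a small serializable record. The sketch below is one possible shape, assuming JSON as the transport format and ISO 8601 UTC for the end time; the field names are illustrative.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class TerminationNotification:
    meeting_id: str
    end_time: str            # ISO 8601 timestamp in UTC
    participants: list[str]


def build_notification(meeting_id: str, ended_at: datetime,
                       participants: list[str]) -> str:
    """Serialize the end-of-meeting notification sent from device to server."""
    note = TerminationNotification(
        meeting_id=meeting_id,
        end_time=ended_at.astimezone(timezone.utc).isoformat(),
        participants=sorted(participants),  # stable order for easier comparison
    )
    return json.dumps(asdict(note))
```

The server can parse the same JSON back into a dictionary and use the meeting ID to look up the recording in Step 2.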

[0382] Step 2:

[0383] The server, triggered by the received termination notification, retrieves the corresponding meeting recording data from the meeting database. The server then uses speech recognition technology to convert this audio data into text data. Specifically, the server processes the audio input using a natural language processing algorithm to generate a text-based conversation log.

[0384] Step 3:

[0385] User speech and facial expression data are captured during the meeting via dedicated sensors and cameras and sent to a server. The server uses an emotion engine to analyze this data and identify each user's emotional state. In this analysis process, the content of speech and facial expressions are used as input, and positive, negative, or neutral emotion data is generated as output.

[0386] Step 4:

[0387] The server integrates text data and sentiment data and inputs it into a meeting minutes generation engine. This engine applies an algorithm that extracts important discussion points and areas of significant emotional shifts. The resulting meeting minutes clearly indicate key phrases and emotional fluctuations during the meeting.
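
How "areas of significant emotional shifts" might be marked in the minutes can be sketched as follows. The example assumes each discussion point carries a sentiment score between -1 and 1 and treats a change of at least 0.5 between consecutive points as significant; the score scale, the threshold, and the `[shift]` marker are all assumptions.

```python
def find_emotional_shifts(scores: list[float], threshold: float = 0.5) -> list[int]:
    """Return indices whose score differs from the previous one by >= threshold."""
    return [i for i in range(1, len(scores))
            if abs(scores[i] - scores[i - 1]) >= threshold]


def render_minutes(points: list[str], scores: list[float],
                   threshold: float = 0.5) -> str:
    """Render one line per discussion point, marking significant shifts."""
    if len(points) != len(scores):
        raise ValueError("points and scores must have the same length")
    shifts = set(find_emotional_shifts(scores, threshold))
    return "\n".join(
        f"{'[shift] ' if i in shifts else ''}{point}"
        for i, point in enumerate(points)
    )
```

The first point can never be marked, since it has no predecessor to compare against.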

[0388] Step 5:

[0389] The server automatically generates the agenda and materials for the next meeting based on the generated sentiment data. For agenda items where sentiment changed significantly, improvement measures and follow-ups are added to the agenda. The previous sentiment data and meeting minutes are used as input, and the proposed agenda is generated as output.
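
A rule-based sketch of this step is shown below. It assumes the previous meeting's sentiment data has been summarized as one score per agenda topic (negative values meaning negative sentiment) and adds a follow-up item after each topic at or below a threshold. The described system would use a generative AI for this; the rule, the threshold, and the wording of the follow-up item are assumptions.

```python
def propose_agenda(topic_scores: dict[str, float],
                   negative_threshold: float = -0.3) -> list[str]:
    """Carry topics forward, adding a follow-up item after negative ones."""
    agenda = []
    for topic, score in topic_scores.items():
        agenda.append(topic)
        if score <= negative_threshold:
            # Topics that drew negative sentiment get an explicit follow-up.
            agenda.append(f"Follow-up: improvement measures for '{topic}'")
    return agenda
```

Keeping the follow-up directly after its topic preserves the order of the previous agenda, which makes the proposal easier to review.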

[0390] Step 6:

[0391] The server distributes the generated meeting minutes and agenda to each participant. The distribution process sends the documents to designated email addresses or individual folders in cloud storage. This allows all participants to prepare for the next meeting.

[0392] (Application Example 2)

[0393] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0394] Online communication presents challenges in providing immediate feedback and customized content based on participants' emotions and states. Furthermore, mechanisms for effectively and efficiently recording meeting content and incorporating it into future meetings are often inadequate. To address these challenges, real-time sentiment analysis and content customization are essential.

[0395] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0396] In this invention, the server includes means for taking in recorded data and converting audio information into text information, means for analyzing the emotional state of participants and providing customized content, and means for automatically scheduling the next communication based on the participants' schedules. This enables immediate content tagging based on the emotional state of participants and the automation of optimal planning for the next meeting.

[0397] "Online communication" is a technology that allows multiple participants in remote locations to exchange information and conduct conversations and meetings in real time via the internet.

[0398] "Recorded data" refers to data that stores information such as audio, video, and text generated during online communication.

[0399] "Audio information" refers to the words and sounds uttered by participants during online communication, and is the information that is recorded and analyzed.

[0400] "Textual information" refers to data obtained by converting audio information into text format, and it forms the basis of meeting minutes and reports.

[0401] "Participant emotional state" refers to the results of measuring participants' psychological responses in real time during online communication and analyzing the changes in those responses.

[0402] "Customized content" refers to information and suggestions that are individually optimized based on the participant's emotional state and past data.

[0403] "Content tagging" is a technique that assigns identifiers to specific information or scenes at the peak of participants' emotional states, making them useful for later reference and analysis.

[0404] "Meeting content" refers to the overall information, including statements, discussions, and resolutions exchanged during online communication.

[0405] "Automatic scheduling" refers to a function where the system automatically determines the dates for meetings and communications, taking into account the participants' schedules.

[0406] "The optimal plan for the next meeting" refers to the agenda and materials for the next meeting, which are formulated based on participant sentiment analysis and the content of the previous meeting, with the aim of reducing stress and improving efficiency.

[0407] The embodiment of this invention relates to a content delivery system that utilizes emotion analysis during online communication. Specifically, it enables the provision of individually optimized information by analyzing audio and video data in real time and understanding the emotional state of participants. This system primarily operates through cooperation between a server, a terminal, and a user.

[0408] The server receives recorded data of online communications and uses speech recognition technology to convert speech information into text information. It is recommended to use advanced models such as "OpenAI Whisper" for speech recognition. This allows the communication content to be saved as text data and analyzed.

[0409] Next, the device uses video data to analyze the participants' emotional state in real time. The emotion analysis incorporates "Emotion AI" technology to identify changes in emotion from the video data. This resulting emotional data is then sent to a server and stored as analysis results.

[0410] Users receive customized information provided by the system. This is automatically generated content based on participants' past reactions and key points of discussion. For example, the moments that evoked the most emotional responses from users are tagged, and similar information is recommended later.

[0411] One concrete application of this system is to detect a user's smile while watching an emotionally moving movie and identify the scene in question. Based on the user's preferences, related movies and entertainment are then recommended. An example of a prompt to input into the generative AI model would be, "Analyze the user's reactions and identify the scene with the strongest response. We would like to use this for future content recommendations."

[0412] In this way, this invention aims to improve the communication experience by utilizing the emotional state of participants in online communications and providing more personalized information.

[0413] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0414] Step 1:

[0415] The device uses a microphone and camera to capture participant audio and video data during online communication. Input is real-time audio and video, while output is digital data for recording. This data is immediately transferred to the server. Specifically, audio is saved in .wav format and video in .mp4 format.

[0416] Step 2:

[0417] The server inputs the received audio data into the speech recognition model "OpenAI Whisper" and converts it into text information. The input is audio data in a .wav file, and the output is the converted text data. Preprocessing such as noise reduction and speaker separation is performed at this stage, which improves the accuracy of the analysis.

[0418] Step 3:

[0419] The device inputs received video data into "Emotion AI," which analyzes the emotional state of participants based on their facial expressions. The input is video data in .mp4 format, and the output is numerical data indicating the emotional category (e.g., joy, interest, surprise) and its intensity. This data is used to track emotional changes in real time.

[0420] Step 4:

[0421] The server integrates the text data obtained from the audio with the emotion analysis results to perform content tagging. The input is the text data and emotion data produced in the preceding steps, and the output is a list of tagged key points. Tagging applies an algorithm that calculates the frequency of verbs and nouns and identifies the moments when emotions peaked.
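The tagging in Step 4 can be illustrated with a minimal sketch. It is not the method of the specification: the stop-word filter stands in for a real part-of-speech tagger that would keep only verbs and nouns, and the function names, the tuple layout of the emotion samples, and the 0.7 threshold are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list. A real system would use a part-of-speech
# tagger to keep only verbs and nouns, as the text describes.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "was", "it", "we", "that", "this"}


def top_terms(text, n=3):
    """Return the n most frequent non-stop-word terms in the transcript."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [term for term, _ in counts.most_common(n)]


def emotion_peaks(samples, threshold=0.7):
    """Return samples whose intensity is a local peak at or above threshold.

    samples: list of (timestamp_seconds, category, intensity) in time order.
    A peak is strictly above the previous sample and not below the next one.
    """
    peaks = []
    for i, (ts, category, intensity) in enumerate(samples):
        prev_i = samples[i - 1][2] if i > 0 else float("-inf")
        next_i = samples[i + 1][2] if i < len(samples) - 1 else float("-inf")
        if intensity >= threshold and intensity > prev_i and intensity >= next_i:
            peaks.append((ts, category, intensity))
    return peaks


def tag_key_points(text, samples):
    """Combine frequent terms and emotion peaks into one tagged result."""
    return {"terms": top_terms(text), "peaks": emotion_peaks(samples)}
```

The two parts are kept separate so that either one can be replaced, for example by swapping the frequency count for a proper morphological analysis.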

[0422] Step 5:

[0423] The user receives content generated by the server and recommendations for the next session. The input is tagged points and a list of content recommendations sent from the server, and the output is customized feedback that the user views. For example, the user might receive a customized list that includes trailers for relevant movies.

[0424] Step 6:

[0425] The server uses user feedback and viewing history to input prompts into a generative AI model to optimize future recommendations. The input consists of user feedback data and pre-prepared prompts, while the output is updated information for the next recommendation items. An example of a prompt is, "Analyze the user's reactions and identify the scenes that elicited the strongest responses. We would like to use this information for future content recommendations."

[0426] Through this process, the system can provide personalized information based on participants' emotions, thereby improving the online communication experience.

[0427] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0428] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. A prompt containing instructions and inference data, such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0429] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0430] [Third Embodiment]

[0431] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0432] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0433] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0434] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0435] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0436] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0437] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0438] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0439] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0440] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0441] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0442] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0443] Specific embodiments for carrying out the present invention are shown below. This system aims to efficiently manage online meetings and automate post-meeting administrative tasks, thereby reducing the burden on participants and organizers.

[0444] First, the terminal starts running this program upon receiving a termination notification from the online meeting system. Upon receiving the termination notification, the server automatically retrieves the meeting recording data from the designated storage. The retrieved audio data is then converted into text data using speech recognition technology.

[0445] Next, the server analyzes the text data to extract key discussions and decisions from the meeting. This analysis uses natural language processing technology, and based on the extracted data, a generative AI connected via the API automatically generates meeting minutes. Furthermore, it similarly generates a draft agenda and materials for the next meeting.

[0446] The generated meeting minutes and materials are notified to the user from the server, and the user can review and modify the content through the provided editing interface. This interface is designed to facilitate feedback on the generated content.

[0447] The server also references participants' schedule information and automatically sets the date for the next meeting. It presents candidate dates, facilitates coordination among participants as needed, and selects the most suitable date and time. After selection, meeting invitations are automatically sent to each participant.

[0448] As a concrete example, the system is activated after a monthly meeting ends. The one-hour meeting recording is transcribed, important points are detected, and meeting minutes and a draft agenda for the next meeting are automatically generated and distributed to the relevant parties within 20 minutes. This significantly streamlines post-meeting administrative work.

[0449] The following describes the processing flow.

[0450] Step 1:

[0451] The terminal detects the end of the online meeting and sends a termination notification to the server. This notification allows the program to recognize that the meeting has ended.

[0452] Step 2:

[0453] When the server receives a termination notification from the terminal, it automatically begins retrieving the meeting recording data from the designated storage. This recording data includes both audio and video.

[0454] Step 3:

[0455] The server sends the captured audio data to a speech recognition system, which transcribes it into text data. High-precision speech recognition technology is used in this process.

[0456] Step 4:

[0457] The server analyzes the generated text data through a natural language processing engine and automatically extracts important discussion points and decisions.

[0458] Step 5:

[0459] The server automatically generates meeting minutes using a generative AI connected via an API, based on the extracted data. These minutes include screenshots from the meeting, along with the data extracted through analysis.

[0460] Step 6:

[0461] The server uses a similar generative AI to create a draft agenda and materials for the next meeting. This draft will be tailored to the content of the meeting.

[0462] Step 7:

[0463] The server notifies users of the generated meeting minutes and draft documents, and provides an editing interface for review and modification. This interface allows users to easily review the content and make necessary corrections.

[0464] Step 8:

[0465] The server retrieves meeting participants' schedule information via a calendar API and automatically coordinates the date of the next meeting. Based on this coordination, it determines the optimal date and time.

[0466] Step 9:

[0467] The server generates meeting invitations based on the coordinated schedule and automatically distributes them to participants. At the same time, it ensures that the meeting date is properly reflected in each participant's calendar.
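The server-side portion of this flow (Steps 3 to 6) can be sketched as a small pipeline. This is an illustration only: the names `PostMeetingOutput` and `run_post_meeting` are hypothetical, and the speech recognizer, the natural language processing engine, and the generative AI are passed in as plain callables rather than being bound to any particular service.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PostMeetingOutput:
    transcript: str
    key_points: List[str]
    minutes: str
    agenda: str


def run_post_meeting(
    recording: bytes,
    transcribe: Callable[[bytes], str],
    extract: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> PostMeetingOutput:
    """Chain transcription, key-point extraction, and document generation."""
    transcript = transcribe(recording)          # Step 3: audio to text
    key_points = extract(transcript)            # Step 4: discussion points and decisions
    minutes = generate("minutes", key_points)   # Step 5: meeting minutes
    agenda = generate("agenda", key_points)     # Step 6: draft agenda for the next meeting
    return PostMeetingOutput(transcript, key_points, minutes, agenda)
```

Injecting the three stages as callables keeps the sketch runnable with stand-in functions and leaves the choice of actual services open.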

[0468] (Example 1)

[0469] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0470] Current online meeting systems have several drawbacks, including the significant time and effort required for post-meeting administrative tasks and the difficulty in efficiently extracting important information during meetings. Furthermore, scheduling subsequent meetings must be done manually, adding to the burden on participants. These challenges can reduce the effectiveness of meetings and impair productivity.

[0471] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0472] In this invention, the server includes means for detecting the end of an online meeting, means for acquiring meeting recording information and converting audio to text data, and means for analyzing the text data to generate a meeting record. This makes it possible to generate a meeting record effectively and quickly after the meeting ends and to automatically extract important information. Furthermore, by automatically setting the date of the next meeting based on time management information and providing an editing interface for the generated materials, the burden on participants can be reduced and the productivity of the meeting can be improved.

[0473] An "online meeting" is a virtual meeting conducted via the internet, where participants can communicate with each other remotely through screen and audio.

[0474] "Means for detecting termination" refers to a technical configuration used as a trigger to automatically identify the end of an online meeting and notify the system of the result.

[0475] "Recorded information" refers to audio, video, and other data recorded during online meetings, which are used later as material for analysis and processing.

[0476] "Means for converting speech to text data" refers to a technology that converts speech signals into text information, that is, a process that uses speech recognition technology to output speech content as a string of characters.

[0477] "Means for generating meeting minutes by analyzing text data" refers to functions and technologies for analyzing text information, organizing important content and decisions discussed at a meeting, and creating meeting minutes.

[0478] "Time management information" refers to information about the schedules and available time of meeting participants, and is used to coordinate the date of the next meeting.

[0479] An "editing interface" refers to the user interface and tools provided to allow users to review and modify generated documents and meeting records.

[0480] This invention is a system that enables efficient management and automated post-meeting processing of online meetings. First, the terminal starts the system when it receives a notification that the online meeting has ended. This notification acts as an automatic trigger from the meeting platform.

[0481] Upon receiving the termination notification, the server immediately retrieves the meeting recording information from the cloud storage service. Common cloud storage services are used in this step, with Amazon S3 being a particularly suitable option for large-scale data management. The server then uses speech recognition technology, such as the Google Cloud Speech-to-Text API, to convert the recorded audio data into text data.

[0482] The converted text data is further analyzed using natural language processing techniques to extract important discussions and decisions from the meeting. Natural language processing libraries such as NLTK and spaCy are used for this analysis. The analyzed information is sent via an API to a generative AI model, such as GPT-3, which automatically generates meeting minutes and plans for future meetings.

[0483] The generated records and materials are notified to the user by the server. The user can review the output through the provided editing interface and make corrections as needed. This interface is designed to allow users to easily provide feedback and make revisions.

[0484] The server also references participants' time management information and automatically schedules the next meeting. This functionality is achieved by retrieving schedule information via the Microsoft Graph API. Along with the selected meeting date, the server automatically distributes meeting invitations to participants. This process includes automated email sending and the creation of calendar events.

[0485] As a concrete example, this system functions after the monthly meeting, using speech recognition and natural language processing to extract key information from the one-hour meeting recording. Meeting minutes and a draft plan for the next meeting are generated within 20 minutes and quickly distributed to relevant parties. This process significantly shortens the administrative work required after the meeting.

[0486] An example of a prompt for a generative AI model would be: "Create meeting minutes from the following key points: [list of key points]." This would allow the model to generate appropriate meeting minutes.
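A prompt of this form could be assembled as in the following minimal sketch. The function name `build_minutes_prompt` and the bulleted layout of the key points are assumptions for illustration; the specification only gives the wording of the example prompt.

```python
def build_minutes_prompt(key_points):
    """Fill the example prompt with a bulleted list of key points."""
    if not key_points:
        raise ValueError("at least one key point is required")
    bullets = "\n".join(f"- {point.strip()}" for point in key_points)
    return f"Create meeting minutes from the following key points:\n{bullets}"
```

Rejecting an empty list avoids sending the model a prompt with nothing to summarize.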

[0487] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0488] Step 1:

[0489] The terminal receives a notification that an online meeting has ended. This operates based on an API call from the meeting management platform. Specifically, it receives a webhook notification sent by the platform when the meeting ends, and the system initiates processing based on that notification. The input is the meeting end notification, and the output is the trigger for initiating processing.
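The handling of the meeting-end notification in Step 1 might look like the sketch below. The payload fields `event` and `meeting_id`, the event name `meeting.ended`, and the shape of the returned trigger are all hypothetical; each meeting platform defines its own webhook format.

```python
import json

MEETING_ENDED = "meeting.ended"  # hypothetical event name


def handle_webhook(body: str):
    """Return a processing trigger for a meeting-end notification, else None."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None  # not valid JSON, ignore
    if not isinstance(payload, dict) or payload.get("event") != MEETING_ENDED:
        return None  # some other kind of notification
    meeting_id = payload.get("meeting_id")
    if not meeting_id:
        return None  # cannot locate the recording without an identifier
    return {"action": "start_post_meeting_processing", "meeting_id": meeting_id}
```

Returning `None` for anything other than a well-formed meeting-end event means that unrelated notifications never start the post-meeting processing.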

[0490] Step 2:

[0491] After receiving a notification, the server retrieves meeting recording information from cloud storage. This involves downloading data using the storage API. For example, it might retrieve audio data using a specified file path from the storage system. The input for this step is the data path in storage, and the output is the retrieved audio data.

[0492] Step 3:

[0493] The server uses the acquired audio data to convert speech into text data using speech recognition technologies such as the Google Cloud Speech-to-Text API. The audio data is sent to the API, which analyzes the audio signal and generates text. The input for this step is audio data, and the output is text data.

[0494] Step 4:

[0495] The server analyzes the generated text data and extracts important discussions and decisions using natural language processing techniques. Natural language processing libraries such as NLTK and spaCy are used to extract key phrases by parsing the text data. The input for this step is text data, and the output is a list of important points.
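As a rough illustration of Step 4, the sketch below selects sentences that contain decision-related cue phrases. This is a deliberately simple stand-in that uses only the standard library, not the NLTK or spaCy parsing named in the text; the cue phrases and the function name `extract_key_points` are assumptions.

```python
import re

# Hypothetical cue phrases that often mark decisions or action items.
DECISION_CUES = ("we decided", "agreed to", "approved", "action item")


def extract_key_points(transcript):
    """Return the sentences of the transcript that contain a cue phrase."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", transcript) if s.strip()]
    return [s for s in sentences if any(cue in s.lower() for cue in DECISION_CUES)]
```

A production system would likely replace the cue list with syntactic parsing or a trained classifier, but the input and output match the step: text data in, a list of important points out.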

[0496] Step 5:

[0497] The server uses the extracted key points to send prompts to the generative AI model, which then generates meeting minutes and a plan for the next meeting. Based on these prompts, the generative AI model performs natural language generation to construct appropriate content. The input for this step is a list of key points, and the output is meeting minutes and a meeting plan.

[0498] Step 6:

[0499] The server notifies the user of the generated meeting minutes and meeting plan. Notifications are sent via email or a notification service, and users can review and edit these generated documents through a web-based editing interface. The input for this step is the meeting minutes and meeting plan, and the output is the notification to the user.

[0500] Step 7:

[0501] The server automatically schedules the next meeting based on participants' time management information. It uses the Microsoft Graph API to retrieve participants' calendar information and determines the optimal schedule. The input for this step is participants' calendar information, and the output is the date of the next meeting.
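The scheduling in Step 7 can be sketched as a search for the earliest time at which all participants are free. This assumes that "optimal" means "earliest common free slot", which the specification does not state, and that the calendar information has already been reduced to busy intervals; the function name and data layout are hypothetical.

```python
from datetime import datetime, timedelta


def first_common_slot(busy_by_person, window_start, window_end, duration):
    """Return the earliest start time at which every participant is free.

    busy_by_person: dict mapping a participant to a list of (start, end)
    busy intervals. Returns None if no slot of the given duration fits.
    """
    # Pool everyone's busy intervals and sort them by start time.
    busy = sorted(iv for intervals in busy_by_person.values() for iv in intervals)
    cursor = window_start
    for start, end in busy:
        if start - cursor >= duration:
            break  # the gap before this busy interval is long enough
        cursor = max(cursor, end)  # otherwise skip past the busy interval
    if cursor + duration <= window_end:
        return cursor
    return None
```

For example, with one participant busy 9:00 to 10:00 and 13:00 to 14:00 and another busy 10:00 to 10:30 and 11:00 to 12:00, a one-hour meeting in a 9:00 to 17:00 window is placed at 12:00.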

[0502] Step 8:

[0503] The server automatically distributes invitations based on the determined meeting schedule. It uses an email system or calendar API to send invitations to participants and automatically creates calendar events. The input for this step is the meeting schedule, and the output is the invitation sent to participants.
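One way to represent the invitation in Step 8 is as an iCalendar (RFC 5545) meeting request, which many email and calendar systems accept. The sketch below is an assumption about the format, not something the specification prescribes; the function name, the PRODID value, and the addresses are hypothetical. The start and end times must be timezone-aware, and text values such as the summary are not escaped here.

```python
from datetime import datetime, timezone


def build_invitation(uid, summary, start, end, organizer, attendees):
    """Build a minimal iCalendar meeting request as a string."""
    fmt = "%Y%m%dT%H%M%SZ"  # UTC date-time form used by iCalendar
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//post-meeting-sketch//EN",  # hypothetical product id
        "METHOD:REQUEST",
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{datetime.now(timezone.utc).strftime(fmt)}",
        f"DTSTART:{start.astimezone(timezone.utc).strftime(fmt)}",
        f"DTEND:{end.astimezone(timezone.utc).strftime(fmt)}",
        f"SUMMARY:{summary}",
        f"ORGANIZER:mailto:{organizer}",
    ]
    lines += [f"ATTENDEE;RSVP=TRUE:mailto:{a}" for a in attendees]
    lines += ["END:VEVENT", "END:VCALENDAR"]
    # iCalendar content lines are separated by CRLF.
    return "\r\n".join(lines) + "\r\n"
```

The resulting text could be attached to an email or passed to a calendar service; how it is delivered is outside this sketch.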

[0504] (Application Example 1)

[0505] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0506] Modern information exchange and meetings are frequent and voluminous, often leaving participants overwhelmed with organizing information after meetings and preparing for future meetings. Therefore, there is a need for efficient post-meeting information processing and a reduction in the time and effort required for future preparation. Furthermore, it is often difficult to accurately capture crucial moments during meetings, leading to challenges in creating meeting minutes and planning for future meetings.

[0507] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0508] In this invention, the server includes means for detecting the end of online information exchange, means for acquiring recorded information data and converting sound to text, and means for analyzing the text data and generating a meeting summary. This automates post-meeting information processing, allowing participants to easily understand the meeting content and quickly plan for the next meeting.

[0509] "Online information exchange" refers to meetings and discussions conducted via the internet, where participants share information and engage in discussions without meeting face-to-face.

[0510] "Recorded information data" refers to all digitally recorded information, such as audio, video, and documents, obtained during meetings and information exchanges.

[0511] "Converting sound to text" refers to the process of converting audio data into text data using speech recognition technology.

[0512] "Generation means" refers to technologies and devices used within an information processing system to automatically generate new information based on specific data.

[0513] "Analyzing text data" refers to the process of extracting meaning from text information or identifying important points using natural language processing techniques.

[0514] A "meeting summary" is information that summarizes the content discussed at a meeting and highlights key points, allowing participants to quickly understand the meeting's content.

[0515] "Automatically sending notifications" refers to a system where information or messages are automatically sent to users based on set conditions.

[0516] "Acquiring visual information" refers to the technology of capturing important moments as images or videos, which is used to review key parts of a meeting later.

[0517] The system of this invention is designed to improve the efficiency of online information exchange. Its main components include a terminal that detects the end of online information exchange, a server that stores recorded information data in cloud storage and makes it accessible later, and means for converting audio data into text data and automatically generating meeting summaries and materials for planning the next meeting.

[0518] First, the terminal detects that the information exchange has ended and transfers the recorded data to the cloud. The server uses the Google Cloud Speech-to-Text API to convert the audio data into text data. Next, IBM Watson Natural Language Understanding is used to extract important discussions and decisions from the converted text data. Then, using OpenAI's GPT model, a meeting summary and materials for planning the next meeting are automatically generated based on the extracted information. This generated document is delivered to the user via notification, allowing the user to immediately review the content and edit or provide feedback as needed.

[0519] As a concrete example, consider a fintech company that has just finished an online meeting. The system detects a new investment strategy discussed during the meeting, automatically generates a memo in just 5 minutes, and delivers it to all members via push notification. Operating the system in this way facilitates rapid information sharing and significantly improves the efficiency of each member's post-meeting tasks.

[0520] Examples of input prompts for a generative AI model include the following:

[0521] "This is a text summarizing the key strategies and policies discussed during the information exchange. Please use this to create a meeting summary and generate a draft agenda for the next meeting."

[0522] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0523] Step 1:

[0524] The terminal detects the end of the online information exchange. Upon receiving the termination detection trigger, the terminal automatically uploads the recorded data collected during the information exchange to cloud storage. The input to this step is the trigger for the end of the meeting, and the output is the recorded data stored in the cloud.

[0525] Step 2:

[0526] The server receives the recorded data stored in the cloud and uses the Google Cloud Speech-to-Text API to convert the audio data into text data. The input is the recorded data, and the output is the text data.

[0527] Step 3:

[0528] The server analyzes the converted text data using IBM Watson Natural Language Understanding and extracts important discussions and decisions. The input is text data, and the output is a list of important discussions and decisions. This data processing is performed using natural language processing techniques.

[0529] Step 4:

[0530] The server uses the extracted information to create a meeting summary and planning materials for the next meeting using OpenAI's GPT model. Here, the generative AI model generates a new document based on the input prompt text. The input is the extracted information, and the output is the automatically generated text.

[0531] Step 5:

[0532] The server sends the generated meeting summary and planning materials to the user via push notification. The user can review the content and provide feedback through the received notification. The input is the generated document, and the output is the notification to the user. This operation allows the user to quickly review the content and plan their next actions.

[0533] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0534] One embodiment of the present invention relates to an online meeting management system incorporating an emotion engine. This system comprehensively streamlines the process from initiating a meeting to completing administrative tasks afterward, and utilizes user emotion data.

[0535] First, the terminal detects the end of the online meeting and sends that information to the server. Upon receiving this notification, the server accesses the emotion engine to analyze the user's speech and facial expression data collected during the meeting. Based on this analysis, it identifies the user's emotional state and integrates the results with other data.

[0536] Using this emotional information, the server extracts key points from the meeting content, which has been converted into text data using speech recognition technology. In this point extraction process, the server generates meeting minutes that take the emotional data into consideration, highlighting areas where emotional changes are particularly significant, thereby providing participants with useful information.

[0537] Furthermore, emotional data is reflected in the creation of the agenda and draft materials for the next meeting. Based on the user's emotional state, the server suggests a less stressful schedule and agenda items that facilitate dialogue. This helps to improve participants' work efficiency and create a better meeting environment.

[0538] For example, if the emotion engine detects a point where a user's emotions change negatively, the server uses that information to suggest improvements to the agenda and follow-up for the next meeting. This makes it easier for participants' opinions to be reflected, leading to an overall increase in satisfaction.

[0539] Thus, the present invention can improve the quality of online meetings and streamline their operation through the use of data by an emotion engine.

[0540] The following describes the processing flow.

[0541] Step 1:

[0542] As soon as the meeting starts, the terminal prepares to send the user's speech and facial expression data to the emotion engine. The data is transmitted in real time, so that the user's emotional state is recorded continuously.

[0543] Step 2:

[0544] After detecting the end of an online meeting, the server retrieves the emotion data collected along with the meeting recording. The recording includes audio and video, while the emotion data records information based on the user's facial expressions and tone of voice.

[0545] Step 3:

[0546] The server converts the recorded audio data into text data using a speech recognition system. Simultaneously, an emotion engine analyzes the user's emotional changes and adds the results to the text data as emotion tags.
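Adding emotion tags to the text data, as in Step 3, can be sketched as a timestamp lookup. This assumes that the speech recognizer yields time-stamped segments and that the emotion engine yields time-stamped labels, neither of which the specification details; the function name and data layout are hypothetical.

```python
from bisect import bisect_right


def tag_segments(segments, emotions):
    """Attach the most recent emotion label to each transcript segment.

    segments: list of (start_seconds, text), sorted by time.
    emotions: list of (timestamp_seconds, label), sorted by time.
    """
    times = [t for t, _ in emotions]
    tagged = []
    for start, text in segments:
        # Index of the last emotion sample at or before the segment start.
        idx = bisect_right(times, start) - 1
        label = emotions[idx][1] if idx >= 0 else "unknown"
        tagged.append({"start": start, "text": text, "emotion": label})
    return tagged
```

Segments that begin before the first emotion sample are labeled "unknown" rather than being given a guessed label.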

[0547] Step 4:

[0548] The server analyzes the text data using natural language processing to extract key points of the agenda and sections related to different emotional states. This process incorporates emotion tag information and interprets the data in a way that takes user responses into account.

[0549] Step 5:

[0550] The server automatically generates meeting minutes using a generative AI based on the extracted data. These minutes highlight areas where there were particularly significant emotional shifts, serving as a reference for participants when reviewing the discussion.
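The highlighting of significant emotional shifts could work along the lines of the sketch below. It assumes each segment carries a single numeric emotion score between -1.0 and 1.0 and treats a change of 0.4 or more from the previous segment as significant; the score scale, the threshold, the `**` marker, and the function name are all assumptions for illustration.

```python
def highlight_shifts(scored_segments, threshold=0.4):
    """Mark segments whose emotion score differs sharply from the previous one.

    scored_segments: list of (text, score) with score in [-1.0, 1.0].
    Returns one line per segment, prefixed with "** " where a shift occurred.
    """
    lines = []
    previous = None
    for text, score in scored_segments:
        shifted = previous is not None and abs(score - previous) >= threshold
        lines.append(f"** {text}" if shifted else f"   {text}")
        previous = score
    return lines
```

The marked lines could then be passed to the generative AI with an instruction to emphasize them in the minutes.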

[0551] Step 6:

[0552] The server creates the agenda and draft materials for the next meeting, taking emotional data into consideration. Based on the user's emotional state, the content is structured to propose productive topics while reducing stress.

[0553] Step 7:

[0554] The server notifies users of the generated meeting minutes and materials for the next meeting, and provides an interface for reviewing and editing the content. This interface also allows for emotion-based feedback.

[0555] Step 8:

[0556] The server combines meeting participants' schedules and emotion data to coordinate the date of the next meeting. This automatically sets a schedule that reduces the burden on participants.

[0557] Step 9:

[0558] The server automatically sends invitations to participants based on the decided meeting schedule. These invitations also include notes and preparation items that take the emotion data into account.

[0559] (Example 2)

[0560] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0561] Online meetings present challenges in accurately grasping the content and using it for future meetings, as a large amount of information is exchanged simultaneously. Furthermore, it's crucial to appropriately capture participants' emotions to enhance the effectiveness of future meetings. Addressing these challenges requires automated analysis of meeting content and the generation of meeting minutes and agendas that reflect participants' emotional states.

[0562] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0563] In this invention, the server includes means for detecting the end of an online meeting, means for capturing recording data of the meeting and converting audio to text, means for identifying the user's emotional state using an emotion engine, means for analyzing text data and emotion data to generate meeting minutes, and means for automatically generating an agenda and materials for the next meeting, taking the emotion data into consideration. This enables more efficient post-meeting administrative work and more effective meeting management based on the participants' emotions.

[0564] An "online meeting" is a gathering in which participants in multiple locations share audio and video in real time via the internet to communicate.

[0565] A "means for detecting termination" refers to a mechanism that identifies the termination status of an online meeting and notifies the system of that information.

[0566] "Recorded data" refers to digital data that records the audio and video exchanges that took place during an online meeting.

[0567] "Means of converting speech to text" refers to a technology or process for converting recorded speech data into textual information.

[0568] An "emotion engine" is an algorithm or software that analyzes a user's speech and facial expressions to identify their emotional state.

[0569] "Text data" refers to a data format that contains string information obtained by converting speech into text.

[0570] "Means for generating meeting minutes" refers to a function that automatically creates notes that summarize the meeting content and extract the important points.

[0571] "Emotional data" refers to digital data that indicates the emotional state of meeting participants, derived from their statements and facial expressions.

[0572] An "agenda" is a plan that lists the items and topics that are scheduled to be discussed at the next meeting.

[0573] "Means for automatically generating documents" refers to technologies or systems that automatically create documents summarizing important information and proposals based on the content of discussions.

[0574] A "schedule" refers to the availability of meeting participants and is used to coordinate the date of the next meeting.

[0575] This invention aims to improve the efficiency and quality of meetings by using an online meeting management system that incorporates an emotion engine. The system is primarily operated through the cooperation of a server, terminals, and users.

[0576] First, the terminal uses an online meeting application (e.g., a common online meeting platform) to monitor the start and end of the meeting. Once the meeting ends, the terminal sends a termination notification to the server.

[0577] Next, the server starts operating triggered by the termination notification. The server retrieves the audio data of the recorded meeting and uses speech recognition technology (e.g., a common speech-to-text service) to convert the audio data into text data. In addition, the server uses an emotion engine to analyze the user's statements and facial expressions during the meeting to identify the user's emotional state. A common facial expression recognition service is used in this analysis process.

[0578] Text and sentiment data are integrated by the server and analyzed by a meeting minutes generation engine. This engine uses natural language processing techniques to extract key points and areas of significant emotional shifts during the meeting. This automatically generates meeting minutes that provide useful information for participants.

[0579] Furthermore, the server automatically generates the agenda and materials for the next meeting. The server considers emotional data and suggests topics to ensure participants can engage in dialogue without feeling stressed. For example, if a topic is identified with a high level of negative emotion, it will add suggestions for improvement and follow-up to the agenda.

[0580] For example, if the emotion engine detects an increase in negative emotions during a meeting based on user speech data and facial expression data, the server will use this information to generate suggestions for improving the agenda for the next meeting. An example of a prompt for this process would be, "When participants' negative emotions increased during the meeting, please generate a proposal for the next meeting agenda that takes this into account."

[0581] Therefore, by utilizing emotional data, this system can streamline post-meeting processing for online meetings and provide a meeting environment that promotes smooth communication among participants.

[0582] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0583] Step 1:

[0584] The terminal monitors the start and end of online meetings in real time. When a meeting ends, the terminal generates termination notification data, including the meeting ID, end time, and a list of participants, and sends it to the server.
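As a non-limiting illustration, the termination notification data of Step 1 could be represented as follows. This is a minimal sketch: the class name `TerminationNotification`, the field names, the JSON encoding, and the ISO 8601 timestamp format are assumptions made for illustration and are not specified in this disclosure.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class TerminationNotification:
    """Hypothetical payload the terminal sends when a meeting ends."""
    meeting_id: str
    end_time: str            # ISO 8601 timestamp in UTC (assumed format)
    participants: list[str]


def build_termination_notification(meeting_id, end_time, participants):
    """Serialize the notification as JSON for transmission to the server.

    end_time is expected to be a timezone-aware datetime.
    """
    note = TerminationNotification(
        meeting_id=meeting_id,
        end_time=end_time.astimezone(timezone.utc).isoformat(),
        participants=sorted(participants),
    )
    return json.dumps(asdict(note))
```

The three fields mirror the items named in Step 1 (meeting ID, end time, and list of participants); any transport, such as an HTTP POST to the server, is left out of the sketch.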

[0585] Step 2:

[0586] The server, triggered by the received termination notification, retrieves the corresponding meeting recording data from the meeting database. The server then uses speech recognition technology to convert this audio data into text data. Specifically, the server processes the audio input using a natural language processing algorithm to generate a text-based conversation log.

[0587] Step 3:

[0588] User speech and facial expression data are captured during the meeting via dedicated sensors and cameras and sent to a server. The server uses an emotion engine to analyze this data and identify each user's emotional state. In this analysis process, the content of speech and facial expressions are used as input, and positive, negative, or neutral emotion data is generated as output.
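The mapping from speech and facial expression inputs to a positive, negative, or neutral label in Step 3 can be sketched as below. The sketch assumes that upstream models have already reduced speech and facial expression to scores between -1 and 1; the function name `classify_emotion`, the weighted-average fusion, and the threshold value are illustrative assumptions, not the emotion engine of this disclosure.

```python
def classify_emotion(speech_score, face_score, speech_weight=0.5, threshold=0.2):
    """Fuse two scores in [-1, 1] and map the result to a coarse label.

    speech_score and face_score are assumed outputs of upstream models;
    the weighting and threshold are illustrative defaults.
    """
    if not 0.0 <= speech_weight <= 1.0:
        raise ValueError("speech_weight must be between 0 and 1")
    # Weighted average of the two modalities.
    fused = speech_weight * speech_score + (1.0 - speech_weight) * face_score
    if fused > threshold:
        return "positive"
    if fused < -threshold:
        return "negative"
    return "neutral"
```

For example, `classify_emotion(0.8, 0.6)` yields `"positive"`, while conflicting signals such as `classify_emotion(0.3, -0.3)` fall inside the neutral band.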

[0589] Step 4:

[0590] The server integrates text data and sentiment data and inputs it into a meeting minutes generation engine. This engine applies an algorithm that extracts important discussion points and areas of significant emotional shifts. The resulting meeting minutes clearly indicate key phrases and emotional fluctuations during the meeting.
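One possible way to locate the "areas of significant emotional shifts" mentioned in Step 4 is to compare the sentiment of consecutive utterances. The following is a minimal sketch under that assumption; the `Utterance` structure, the sentiment scale, and the `min_delta` threshold are hypothetical and the actual algorithm of the meeting minutes generation engine is not specified here.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    start_sec: float
    speaker: str
    text: str
    sentiment: float  # assumed scale: -1.0 (negative) to 1.0 (positive)


def find_emotional_shifts(utterances, min_delta=0.5):
    """Return (utterance, delta) pairs where sentiment changes sharply.

    delta is the sentiment of an utterance minus that of the previous one.
    """
    shifts = []
    for prev, curr in zip(utterances, utterances[1:]):
        delta = curr.sentiment - prev.sentiment
        if abs(delta) >= min_delta:
            shifts.append((curr, delta))
    return shifts
```

The returned utterances could then be highlighted in the minutes alongside the extracted discussion points.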

[0591] Step 5:

[0592] The server automatically generates the agenda and materials for the next meeting based on the generated sentiment data. For agenda items where sentiment changed significantly, improvement measures and follow-ups are added to the agenda. The previous sentiment data and meeting minutes are used as input, and the proposed agenda is generated as output.
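The rule in Step 5, under which follow-ups are added for agenda items whose sentiment changed significantly, can be illustrated with a simple rule-based sketch. In the disclosure this step is performed by the server with a generative model; the function `draft_agenda`, the threshold, and the ordering of topics by magnitude of change are assumptions introduced only for illustration.

```python
def draft_agenda(topic_shifts, shift_threshold=0.5):
    """Build agenda lines; topics with a large sentiment change get a follow-up.

    topic_shifts maps a topic name to the sentiment change observed while it
    was discussed (hypothetical input derived from the previous minutes).
    """
    agenda = []
    # Assumed ordering: topics with the largest absolute change come first.
    for topic, change in sorted(topic_shifts.items(), key=lambda kv: -abs(kv[1])):
        agenda.append(f"Review: {topic}")
        if abs(change) >= shift_threshold:
            agenda.append(f"Follow-up and improvement measures: {topic}")
    return agenda
```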

[0593] Step 6:

[0594] The server distributes the generated meeting minutes and agenda to each participant. The distribution process sends the documents to designated email addresses or individual folders in cloud storage. This allows all participants to prepare for the next meeting.

[0595] (Application Example 2)

[0596] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0597] Online communication presents challenges in providing immediate feedback and customized content based on participants' emotions and states. Furthermore, mechanisms for effectively and efficiently recording meeting content and incorporating it into future meetings are often inadequate. To address these challenges, real-time sentiment analysis and content customization are essential.

[0598] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0599] In this invention, the server includes means for taking in recorded data and converting audio information into text information, means for analyzing the emotional state of participants and providing customized content, and means for automatically scheduling the next communication based on the participants' schedules. This enables immediate content tagging based on the emotional state of participants and the automation of optimal planning for the next meeting.

[0600] "Online communication" is a technology that allows multiple participants in remote locations to exchange information and conduct conversations and meetings in real time via the internet.

[0601] "Recorded data" refers to data that stores information such as audio, video, and text generated during online communication.

[0602] "Audio information" refers to the words and sounds uttered by participants during online communication, and is the information that is recorded and analyzed.

[0603] "Textual information" refers to data obtained by converting audio information into text format, and it forms the basis of meeting minutes and reports.

[0604] "Participant emotional state" refers to the results of measuring participants' psychological responses in real time during online communication and analyzing the changes in those responses.

[0605] "Customized content" refers to information and suggestions that are individually optimized based on the participant's emotional state and past data.

[0606] "Content tagging" is a technique that assigns identifiers to specific information or scenes at the peak of participants' emotional states, making them useful for later reference and analysis.

[0607] "Meeting content" refers to the overall information, including statements, discussions, and resolutions exchanged during online communication.

[0608] "Automatic scheduling" refers to a function where the system automatically determines the dates for meetings and communications, taking into account the participants' schedules.

[0609] "The optimal plan for the next meeting" refers to the agenda and materials for the next meeting, which are formulated based on participant sentiment analysis and the content of the previous meeting, with the aim of reducing stress and improving efficiency.

[0610] The embodiment of this invention relates to a content delivery system that utilizes emotion analysis during online communication. Specifically, it enables the provision of individually optimized information by analyzing audio and video data in real time and understanding the emotional state of participants. This system primarily operates through cooperation between a server, a terminal, and a user.

[0611] The server receives recorded data of online communications and uses speech recognition technology to convert speech information into text information. It is recommended to use advanced models such as "OpenAI Whisper" for speech recognition. This allows the communication content to be saved as text data and analyzed.

[0612] Next, the device uses video data to analyze the participants' emotional state in real time. The emotion analysis incorporates "Emotion AI" technology to identify changes in emotion from the video data. This resulting emotional data is then sent to a server and stored as analysis results.

[0613] Users receive customized information provided by the system. This is automatically generated content based on participants' past reactions and key points of discussion. For example, the moments that evoked the most emotional responses from users are tagged, and similar information is recommended later.

[0614] One concrete application of this system is to detect a user's smile while watching an emotionally moving movie and identify the scene in question. Based on the user's preferences, related movies and entertainment are then recommended. An example of a prompt message to input into the generating AI model would be, "Analyze the user's reactions and identify the scene with the strongest response. We would like to use this for future content recommendations."

[0615] In this way, this invention aims to improve the communication experience by utilizing the emotional state of participants in online communications and providing more personalized information.

[0616] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0617] Step 1:

[0618] The device uses a microphone and camera to capture participant audio and video data during online communication. Input is real-time audio and video, while output is digital data for recording. This data is immediately transferred to the server. Specifically, audio is saved in .wav format and video in .mp4 format.

[0619] Step 2:

[0620] The server inputs the received audio data into the speech recognition model "OpenAI Whisper" and converts it into text information. The input is audio data in a .wav file, and the output is the converted text data. Preprocessing such as noise reduction and speaker separation is performed at this stage, which improves the accuracy of the analysis.

[0621] Step 3:

[0622] The device inputs received video data into "Emotion AI," which analyzes the emotional state of participants based on their facial expressions. The input is video data in .mp4 format, and the output is numerical data indicating the emotional category (e.g., joy, interest, surprise) and its intensity. This data is used to track emotional changes in real time.

[0623] Step 4:

[0624] The server integrates the audio text data and sentiment analysis results to perform content tagging. The input is the text data and sentiment data up to this step, and the output is a list of tagged key points. Tagging applies an algorithm that calculates the frequency of verbs and nouns and identifies the moments when emotions peaked.
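The tagging described in Step 4 combines word frequency with the moment of peak emotion. A minimal sketch is shown below. Because part-of-speech tagging would require an external library, the sketch counts all non-stopword tokens as a stand-in for verbs and nouns; this simplification, the stopword list, and the function name `tag_key_points` are assumptions and not part of the disclosure.

```python
import re
from collections import Counter

# Small illustrative stopword list (assumption, not exhaustive).
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "is", "was",
             "we", "it", "in", "on", "for", "this", "that"}


def tag_key_points(segments, top_words=3):
    """segments: list of (text, emotion_intensity) pairs in time order.

    Returns the most frequent content words and the index of the segment
    with the highest emotion intensity.
    """
    if not segments:
        return [], None
    counts = Counter()
    for text, _ in segments:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    keywords = [word for word, _ in counts.most_common(top_words)]
    peak_index = max(range(len(segments)), key=lambda i: segments[i][1])
    return keywords, peak_index
```

The segment at `peak_index` would be the one tagged for later reference, with the keywords attached as its labels.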

[0625] Step 5:

[0626] The user receives content generated by the server and recommendations for the next session. The input is tagged points and a list of content recommendations sent from the server, and the output is customized feedback that the user views. For example, the user might receive a customized list that includes trailers for relevant movies.

[0627] Step 6:

[0628] The server uses user feedback and viewing history to input prompts into a generating AI model to optimize future recommendations. The input consists of user feedback data and pre-prepared prompts, while the output is updated information for the next recommendation items. An example of a prompt is, "Analyze the user's reactions and identify the scenes that elicited the strongest responses. We would like to use this information for future content recommendations."

[0629] Through this process, the system can provide personalized information based on participants' emotions, thereby improving the online communication experience.

[0630] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0631] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0632] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0633] [Fourth Embodiment]

[0634] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0635] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0636] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0637] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0638] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0639] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0640] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0641] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0642] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0643] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0644] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0645] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0646] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0647] Specific embodiments for carrying out the present invention are shown below. This system aims to efficiently manage online meetings and automate post-meeting administrative tasks, thereby reducing the burden on participants and organizers.

[0648] First, the terminal starts running this program upon receiving a termination notification from the online meeting system. Upon receiving the termination notification, the server automatically retrieves the meeting recording data from the designated storage. The retrieved audio data is then converted into text data using speech recognition technology.

[0649] Next, the server analyzes the text data to extract key discussions and decisions from the meeting. This analysis uses natural language processing technology, and based on the extracted data, a generative AI connected via the API automatically generates meeting minutes. Furthermore, it similarly generates a draft agenda and materials for the next meeting.

[0650] The server notifies the user of the generated meeting minutes and materials, and the user can review and modify the content through the provided editing interface. This interface is designed to facilitate feedback on the generated content.

[0651] The server also references participants' schedule information and automatically sets the date for the next meeting. It presents candidate dates, facilitates coordination among participants as needed, and selects the most suitable date and time. After selection, meeting invitations are automatically sent to each participant.

[0652] As a concrete example, the system is activated after a monthly meeting ends. It transcribes the one-hour meeting recording, detects the important points, automatically generates meeting minutes and a draft agenda for the next meeting within 20 minutes, and distributes them to the relevant parties. This significantly streamlines post-meeting administrative work.

[0653] The following describes the processing flow.

[0654] Step 1:

[0655] The terminal detects the end of the online meeting and sends a termination notification to the server. This notification allows the program to recognize that the meeting has ended.

[0656] Step 2:

[0657] When the server receives a termination notification from the terminal, it automatically begins retrieving the meeting recording data from the designated storage. This recording data includes both audio and video.

[0658] Step 3:

[0659] The server sends the captured audio data to a speech recognition system, which transcribes it into text data. High-precision speech recognition technology is used in this process.

[0660] Step 4:

[0661] The server analyzes the generated text data through a natural language processing engine and automatically extracts important discussion points and decisions.

[0662] Step 5:

[0663] The server automatically generates meeting minutes using a generative AI connected via an API, based on the extracted data. These minutes include screenshots from the meeting, along with the data extracted through analysis.

[0664] Step 6:

[0665] The server uses a similar generative AI to create a draft agenda and materials for the next meeting. This draft will be tailored to the content of the meeting.

[0666] Step 7:

[0667] The server notifies users of the generated meeting minutes and draft documents, and provides an editing interface for review and modification. This interface allows users to easily review the content and make necessary corrections.

[0668] Step 8:

[0669] The server retrieves meeting participants' schedule information via a calendar API and automatically adjusts the date for the next meeting. After adjustment, it determines the optimal date and time.

[0670] Step 9:

[0671] The server generates meeting invitations based on the coordinated schedule and automatically distributes them to participants. At the same time, it ensures that the meeting date is properly reflected in each participant's calendar.

[0672] (Example 1)

[0673] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0674] Current online meeting systems have several drawbacks, including the significant time and effort required for post-meeting administrative tasks and the difficulty in efficiently extracting important information during meetings. Furthermore, scheduling subsequent meetings must be done manually, adding to the burden on participants. These challenges can reduce the effectiveness of meetings and impair productivity.

[0675] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0676] In this invention, the server includes means for detecting the end of an online meeting, means for acquiring meeting recording information and converting audio to text data, and means for analyzing the text data to generate a meeting record. This makes it possible to generate a meeting record effectively and quickly after the meeting ends and to automatically extract important information. Furthermore, by automatically setting the date of the next meeting based on time management information and providing an editing interface for the generated materials, the burden on participants can be reduced and the productivity of the meeting can be improved.

[0677] An "online meeting" is a virtual meeting conducted via the internet, where participants can communicate with each other remotely through screen and audio.

[0678] "Means for detecting termination" refers to a technical configuration used as a trigger to automatically identify the end of an online meeting and notify the system of the result.

[0679] "Recorded information" refers to audio, video, and other data recorded during online meetings, which are used later as material for analysis and processing.

[0680] "Means for converting speech to text data" refers to a technology that converts speech signals into text information, that is, a process that uses speech recognition technology to output speech content as a string of characters.

[0681] "Means for generating meeting minutes by analyzing text data" refers to functions and technologies for analyzing text information, organizing important content and decisions discussed at a meeting, and creating meeting minutes.

[0682] "Time management information" refers to information about the schedules and available time of meeting participants, and is used to coordinate the date of the next meeting.

[0683] An "editing interface" refers to the user interface and tools provided to allow users to review and modify generated documents and meeting records.

[0684] This invention is a system that enables efficient management and automated post-meeting processing of online meetings. First, the terminal starts the system when it receives a notification that the online meeting has ended. This notification acts as an automatic trigger from the meeting platform.

[0685] Upon receiving the termination notification, the server immediately retrieves the meeting recording information from the cloud storage service. Common cloud storage services are used in this step, with Amazon S3 being a particularly suitable option for large-scale data management. The server then uses speech recognition technology, such as the Google Cloud Speech-to-Text API, to convert the recorded audio data into text data.

[0686] The converted text data is further analyzed using natural language processing techniques to extract important discussions and decisions from the meeting. Natural language processing libraries such as NLTK and spaCy are used for this analysis. The analyzed information is sent to a generative AI model via an API, and this model, such as GPT-3, is used to automatically generate meeting minutes and plans for future meetings.

[0687] The server notifies the user of the generated records and materials. The user can review the output through the provided editing interface and make corrections as needed. This interface is designed to allow users to easily provide feedback and make revisions.

[0688] The server also references participants' time management information and automatically schedules the next meeting. This functionality is achieved by retrieving schedule information via the Microsoft Graph API. Along with the selected meeting date, the server automatically distributes meeting invitations to participants. This process includes automated email sending and the creation of calendar events.

[0689] As a concrete example, this system runs after a monthly meeting ends, using speech recognition and natural language processing to extract key information from the one-hour meeting recording. Meeting minutes and a draft plan for the next meeting are generated within 20 minutes and quickly distributed to relevant parties. This process significantly shortens the administrative work required after the meeting.

[0690] An example of a prompt for a generative AI model would be: "Create meeting minutes from the following key points: List of key points." This would allow the model to generate appropriate meeting minutes.

[0691] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0692] Step 1:

[0693] The terminal receives a notification that an online meeting has ended. This operates based on an API call from the meeting management platform. Specifically, it receives a webhook notification sent by the platform when the meeting ends, and the system initiates processing based on that notification. The input is the meeting end notification, and the output is the trigger for initiating processing.
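The conversion of a webhook notification into a processing trigger in Step 1 could look like the following sketch. The event name `meeting.ended`, the field `meeting_id`, and the shape of the returned trigger are hypothetical; real meeting platforms define their own webhook schemas, and signature verification is omitted.

```python
import json


def handle_webhook(body):
    """Return a processing trigger for 'meeting ended' events, else None.

    body is the raw request body (str or bytes). The event and field names
    are illustrative assumptions, not any platform's actual schema.
    """
    try:
        event = json.loads(body)
    except ValueError:
        # Malformed JSON (or undecodable bytes): ignore the request.
        return None
    if not isinstance(event, dict) or event.get("event") != "meeting.ended":
        return None
    meeting_id = event.get("meeting_id")
    if not meeting_id:
        return None
    return {"action": "start_post_meeting_processing", "meeting_id": meeting_id}
```

Unrelated events and malformed payloads produce no trigger, so only a genuine end-of-meeting notification starts the downstream steps.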

[0694] Step 2:

[0695] After receiving a notification, the server retrieves meeting recording information from cloud storage. This involves downloading data using the storage API. For example, it might retrieve audio data using a specified file path from the storage system. The input for this step is the data path in storage, and the output is the retrieved audio data.

[0696] Step 3:

[0697] The server uses the acquired audio data to convert speech into text data using speech recognition technologies such as the Google Cloud Speech-to-Text API. The audio data is sent to the API, which analyzes the audio signal and generates text. The input for this step is audio data, and the output is text data.

[0698] Step 4:

[0699] The server analyzes the generated text data and extracts important discussions and decisions using natural language processing techniques. Natural language processing libraries such as NLTK and spaCy are used to extract key phrases by parsing the text data. The input for this step is text data, and the output is a list of important points.
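To make the input and output of Step 4 concrete, the sketch below extracts sentences containing decision-related cue words using only the standard library. It is a deliberately simple stand-in for the NLTK or spaCy based analysis named above; the cue list and the function name `extract_key_points` are assumptions.

```python
import re

# Illustrative cue words that often signal a decision or action item.
DECISION_CUES = re.compile(
    r"\b(decided|agreed|will|action item|deadline)\b", re.IGNORECASE
)


def extract_key_points(transcript):
    """Return sentences that contain a decision cue, in original order."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", transcript.strip())
    return [s for s in sentences if DECISION_CUES.search(s)]
```

The resulting list of sentences corresponds to the "list of important points" that Step 5 takes as input.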

[0700] Step 5:

[0701] The server uses the extracted key points to send prompts to the generative AI model, which then generates meeting minutes and a plan for the next meeting. Based on these prompts, the generative AI model performs natural language generation to construct appropriate content. The input for this step is a list of key points, and the output is meeting minutes and a meeting plan.
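The prompt sent to the generative AI model in Step 5 can be assembled from the key points as sketched below, following the example prompt wording given earlier for Example 1. The function name `build_minutes_prompt` and the bullet formatting are assumptions; the call to the model itself is omitted because no specific API is prescribed.

```python
def build_minutes_prompt(key_points):
    """Compose a minutes-generation prompt from a list of key points."""
    if not key_points:
        raise ValueError("at least one key point is required")
    bullets = "\n".join(f"- {point}" for point in key_points)
    return f"Create meeting minutes from the following key points:\n{bullets}"
```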

[0702] Step 6:

[0703] The server notifies the user of the generated meeting minutes and meeting plan. Notifications are sent via email or a notification service, and users can review and edit these generated documents through a web-based editing interface. The input for this step is the meeting minutes and meeting plan, and the output is the notification to the user.

[0704] Step 7:

[0705] The server automatically schedules the next meeting based on participants' schedule information. It uses the Microsoft Graph API to retrieve participants' calendar information and selects the optimal time slot. The input for this step is participants' calendar information, and the output is the date of the next meeting.
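One way the slot selection in Step 7 might work is sketched below. It assumes the participants' busy intervals have already been retrieved and pooled into one list; the calendar API call itself is not shown, and the function name `earliest_common_slot` and the "earliest free gap" criterion are assumptions, since the text does not specify how the time is chosen.

```python
from datetime import datetime, timedelta


def earliest_common_slot(busy, window_start, window_end, duration):
    """Return the earliest start at which nobody is busy for `duration`, or None.

    `busy` is a list of (start, end) datetime pairs pooled across all participants.
    """
    cursor = window_start
    for start, end in sorted(busy):
        if end <= cursor:
            continue  # interval lies entirely behind the cursor
        if start - cursor >= duration:
            break  # the gap before this interval is long enough
        cursor = max(cursor, end)  # skip past the busy interval
    if window_end - cursor >= duration:
        return cursor
    return None


if __name__ == "__main__":
    day = datetime(2025, 1, 6)
    busy = [
        (day.replace(hour=9), day.replace(hour=10)),
        (day.replace(hour=13), day.replace(hour=14)),
    ]
    print(earliest_common_slot(busy, day.replace(hour=9), day.replace(hour=17), timedelta(hours=1)))
```

Sorting the pooled intervals and sweeping a single cursor over them avoids comparing every pair of calendars, and the final check ensures the chosen slot still fits inside the search window.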

[0706] Step 8:

[0707] The server automatically distributes invitations based on the determined meeting schedule. It uses an email system or calendar API to send invitations to participants and automatically creates calendar events. The input for this step is the meeting schedule, and the output is the invitation sent to participants.

[0708] (Application Example 1)

[0709] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0710] Modern information exchange and meetings are frequent and voluminous, often leaving participants overwhelmed with organizing information after meetings and preparing for future meetings. Therefore, there is a need for efficient post-meeting information processing and a reduction in the time and effort required for future preparation. Furthermore, it is often difficult to accurately capture crucial moments during meetings, leading to challenges in creating meeting minutes and planning for future meetings.

[0711] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0712] In this invention, the server includes means for detecting the end of online information exchange, means for acquiring recorded information data and converting sound to text, and means for analyzing the text data and generating a meeting summary. This automates post-meeting information processing, allowing participants to easily understand the meeting content and quickly plan for the next meeting.

[0713] "Online information exchange" refers to meetings and discussions conducted via the internet, where participants share information and engage in discussions without meeting face-to-face.

[0714] "Recorded information data" refers to all digital recording information, such as audio, video, and documents, obtained during meetings and information exchanges.

[0715] "Converting sound to text" refers to the process of converting audio data into text data using speech recognition technology.

[0716] "Generation means" refers to technologies and devices used within an information processing system to automatically generate new information based on specific data.

[0717] "Analyzing text data" refers to the process of extracting meaning from text information or identifying important points using natural language processing techniques.

[0718] A "meeting summary" is information that summarizes the content discussed at a meeting and highlights key points, allowing participants to quickly understand the meeting's content.

[0719] "Automatically sending notifications" refers to a system where information or messages are automatically sent to users based on set conditions.

[0720] "Acquiring visual information" refers to the technology of capturing important moments as images or videos, which is used to review key parts of a meeting later.

[0721] The system of this invention is designed to improve the efficiency of online information exchange. Its main components include a terminal for detecting the end of online information exchange, a server for storing recorded information data in cloud storage and making it accessible later, and means for converting audio data into text data and automatically generating meeting summaries and materials for planning the next meeting.

[0722] First, the terminal detects that the information exchange has ended and transfers the recorded data to the cloud. The server uses the Google Cloud Speech-to-Text API to convert the audio data into text data. Next, IBM Watson Natural Language Understanding is used to extract important discussions and decisions from the converted text data. Then, using OpenAI's GPT model, a meeting summary and materials for planning the next meeting are automatically generated based on the extracted information. This generated document is delivered to the user via notification, allowing the user to immediately review the content and edit or provide feedback as needed.

[0723] As a concrete example, imagine that a fintech company finishes an online meeting, the system detects a new investment strategy discussed during the meeting, automatically creates a memo within 5 minutes, and delivers it to all members via push notification. Operating this system would facilitate rapid information sharing and significantly improve the efficiency of individual post-meeting tasks.

[0724] Examples of input prompts for a generative AI model include the following:

[0725] "This is a text summarizing the key strategies and policies discussed during the information exchange. Please use this to create a meeting summary and generate a draft agenda for the next meeting."

[0726] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0727] Step 1:

[0728] The terminal detects the end of the online information exchange. Upon receiving the termination detection trigger, the terminal automatically prepares to upload the recorded data collected during the information exchange to cloud storage. The input to this step is the trigger for the end of the meeting, and the output is the recorded data stored in the cloud.

[0729] Step 2:

[0730] The server receives the recorded data stored in the cloud. The server uses the Google Cloud Speech-to-Text API to convert the audio data into text data. The input is the recorded data, and the output is the text data.

[0731] Step 3:

[0732] The server analyzes the converted text data using IBM Watson Natural Language Understanding. Here, important discussions and decisions are extracted. The input is text data, and the output is a list of important discussions and decisions. This data processing is performed using natural language processing techniques.

[0733] Step 4:

[0734] The server uses the extracted information to create a meeting summary and planning materials for the next meeting using OpenAI's GPT model. Here, the generative AI model generates a new document based on the input prompt text. The input is the extracted information, and the output is the automatically generated text.

[0735] Step 5:

[0736] The server sends the generated meeting summary and planning materials to the user via push notification. The user can review the content and provide feedback through the received notification. The input is the generated document, and the output is the notification to the user. This operation allows the user to quickly review the content and plan their next actions.
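Steps 2 through 5 of this flow can be sketched as a simple chain of stages. In the sketch below each external service (speech recognition, language analysis, the generative AI model, and the push notification) is passed in as a plain callable, so no vendor API is called or assumed; the function and parameter names are hypothetical. The prompt text is the example given in paragraph [0725].

```python
from __future__ import annotations

from typing import Callable

PROMPT_HEADER = (
    "This is a text summarizing the key strategies and policies discussed "
    "during the information exchange. Please use this to create a meeting "
    "summary and generate a draft agenda for the next meeting."
)


def run_post_meeting_pipeline(
    recording: bytes,
    transcribe: Callable[[bytes], str],
    extract: Callable[[str], list[str]],
    generate: Callable[[str], str],
    notify: Callable[[str], None],
) -> str:
    """Run Steps 2-5 with injected stand-ins for the external services."""
    text = transcribe(recording)          # Step 2: audio -> text
    key_points = extract(text)            # Step 3: text -> discussions and decisions
    bullets = "\n".join(f"- {point}" for point in key_points)
    document = generate(f"{PROMPT_HEADER}\n\n{bullets}")  # Step 4: prompt -> document
    notify(document)                      # Step 5: deliver to the user
    return document
```

Injecting the stages keeps the sketch testable with stub functions and leaves the choice of concrete services open.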

[0737] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0738] One embodiment of the present invention relates to an online meeting management system incorporating an emotion engine. This system comprehensively streamlines the process from initiating a meeting to completing administrative tasks afterward, and utilizes user emotion data.

[0739] First, the device detects the end of the online meeting and sends that information to the server. Upon receiving this notification, the server accesses the emotion engine to analyze the user's speech and facial expression data collected during the meeting. Based on this analysis, it identifies the emotional state and integrates the results with other data.

[0740] Using this emotional information, the server extracts key points from the meeting content, which has been converted into text data using speech recognition technology. In this point extraction process, the server generates meeting minutes that take the emotional data into consideration, highlighting areas where emotional changes are particularly significant, thereby providing participants with useful information.

[0741] Furthermore, emotional data is reflected in the creation of the agenda and draft materials for the next meeting. Based on the user's emotional state, the server suggests a less stressful schedule and agenda items that facilitate dialogue. This helps to improve participants' work efficiency and create a better meeting environment.

[0742] For example, if the emotion engine detects a point where a user's emotions change negatively, the server uses that information to suggest improvements to the agenda and follow-up for the next meeting. This makes it easier for participants' opinions to be reflected, leading to an overall increase in satisfaction.

[0743] Thus, the present invention can improve the quality of online meetings and streamline their operation through the use of data by an emotion engine.

[0744] The following describes the processing flow.

[0745] Step 1:

[0746] The device prepares to send user speech and facial expression data to the emotion engine as soon as the meeting starts. This data transmission takes place in real time, continuously recording the user's emotional state.

[0747] Step 2:

[0748] After detecting the end of an online meeting, the server retrieves sentiment data collected along with the video recording. The video recording includes audio and video, while the sentiment data records information based on the user's facial expressions and tone of voice.

[0749] Step 3:

[0750] The server converts the recorded audio data into text data using a speech recognition system. Simultaneously, an emotion engine analyzes the user's emotional changes and adds the results to the text data as emotion tags.
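The tagging described in Step 3 could look like the following sketch, which assumes that both the transcript segments and the emotion engine's results carry timestamps measured in seconds from the start of the meeting. The `Segment` structure and the `(timestamp, label)` event format are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Segment:
    start: float  # seconds from meeting start (inclusive)
    end: float    # seconds from meeting start (exclusive)
    text: str
    emotion_tags: list[str] = field(default_factory=list)


def attach_emotion_tags(segments, emotion_events):
    """Append each emotion label to the segments whose time span contains it.

    `emotion_events` is a list of (timestamp_seconds, label) pairs.
    Duplicate labels on the same segment are recorded once.
    """
    for timestamp, label in emotion_events:
        for segment in segments:
            in_span = segment.start <= timestamp < segment.end
            if in_span and label not in segment.emotion_tags:
                segment.emotion_tags.append(label)
    return segments
```

Events that fall outside every segment are simply ignored, which is one possible policy; a system could instead attach them to the nearest segment.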

[0751] Step 4:

[0752] The server analyzes the text data using natural language processing to extract key points of the agenda and sections related to different emotional states. This process incorporates emotion tag information and interprets the data in a way that takes user responses into account.

[0753] Step 5:

[0754] The server automatically generates meeting minutes using a generative AI model based on the extracted data. These minutes highlight areas where there were particularly significant emotional shifts, serving as a reference for participants when reviewing the discussion.

[0755] Step 6:

[0756] The server creates the agenda and draft materials for the next meeting, taking emotional data into consideration. Based on the user's emotional state, the content is structured to propose productive topics while reducing stress.

[0757] Step 7:

[0758] The server notifies users of the generated meeting minutes and materials for the next meeting, and provides an interface for reviewing and editing the content. This interface also allows for sentiment-based feedback.

[0759] Step 8:

[0760] The server combines meeting participants' schedules and sentiment data to adjust the date for the next meeting. This automatically sets a schedule that reduces the burden on participants.

[0761] Step 9:

[0762] The server automatically sends invitations to participants based on the decided meeting schedule. These invitations also include notes that take the emotion data into account, as well as preparation requirements.

[0763] (Example 2)

[0764] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0765] Online meetings present challenges in accurately grasping the content and using it for future meetings, as a large amount of information is exchanged simultaneously. Furthermore, it's crucial to appropriately capture participants' emotions to enhance the effectiveness of future meetings. Addressing these challenges requires automated analysis of meeting content and the generation of meeting minutes and agendas that reflect participants' emotional states.

[0766] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0767] In this invention, the server includes means for detecting the end of an online meeting, means for capturing recording data of the meeting and converting audio to text, means for identifying the user's emotional state using an emotion engine, means for analyzing text data and emotion data to generate meeting minutes, and means for automatically generating an agenda and materials for the next meeting, taking the emotion data into consideration. This enables more efficient post-meeting administrative work and more effective meeting management based on the participants' emotions.

[0768] An "online meeting" is a gathering in which participants in multiple locations share audio and video in real time via the internet to communicate.

[0769] A "means for detecting termination" refers to a mechanism that identifies the termination status of an online meeting and notifies the system of that information.

[0770] "Recorded data" refers to digital data that records the audio and video exchanges that took place during an online meeting.

[0771] "Means of converting speech to text" refers to a technology or process for converting recorded speech data into textual information.

[0772] An "emotion engine" is an algorithm or software that analyzes a user's speech and facial expressions to identify their emotional state.

[0773] "Text data" refers to a data format that contains string information obtained by converting speech into text.

[0774] "Means for generating meeting minutes" refers to a function that automatically creates notes summarizing the meeting content, extracting the important points.

[0775] "Emotional data" refers to digital data that indicates the emotional state of meeting participants, derived from their statements and facial expressions.

[0776] An "agenda" is a plan that lists the items and topics that are scheduled to be discussed at the next meeting.

[0777] "Means for automatically generating documents" refers to technologies or systems that automatically create documents summarizing important information and proposals based on the content of discussions.

[0778] A "schedule" refers to the availability of meeting participants and is used to coordinate the date of the next meeting.

[0779] This invention aims to improve the efficiency and quality of meetings by using an online meeting management system that incorporates an emotion engine. The system is primarily operated through the cooperation of a server, terminals, and users.

[0780] First, the device uses an online meeting application (e.g., a common online meeting platform) to monitor the start and end of the meeting. Once the meeting ends, it sends that information to the server.

[0781] Next, the server starts operating triggered by the termination notification. The server retrieves the audio data of the recorded meeting and uses speech recognition technology (e.g., a common speech-to-text service) to convert the audio data into text data. In addition, the server uses an emotion engine to analyze the user's statements and facial expressions during the meeting to identify the user's emotional state. A common facial expression recognition service is used in this analysis process.

[0782] Text and sentiment data are integrated by the server and analyzed by a meeting minutes generation engine. This engine uses natural language processing techniques to extract key points and areas of significant emotional shifts during the meeting. This automatically generates meeting minutes that provide useful information for participants.

[0783] Furthermore, the server automatically generates the agenda and materials for the next meeting. The server considers emotional data and suggests topics to ensure participants can engage in dialogue without feeling stressed. For example, if a topic is identified with a high level of negative emotion, it will add suggestions for improvement and follow-up to the agenda.

[0784] For example, if the emotion engine detects an increase in negative emotions during a meeting based on user speech data and facial expression data, the server will use this information to generate suggestions for improving the agenda for the next meeting. An example of a prompt for this process would be, "When participants' negative emotions increased during the meeting, please generate a proposal for the next meeting agenda that takes this into account."

[0785] Therefore, by utilizing emotional data, this system can streamline post-meeting processing for online meetings and provide a meeting environment that promotes smooth communication among participants.

[0786] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0787] Step 1:

[0788] The device monitors the start and end of online meetings in real time. When a meeting ends, the device generates termination notification data, including the meeting ID, end time, and a list of participants, and sends it to the server.

[0789] Step 2:

[0790] The server, triggered by the received termination notification, retrieves the corresponding meeting recording data from the meeting database. The server then uses speech recognition technology to convert this audio data into text data. Specifically, the server processes the audio input using a natural language processing algorithm to generate a text-based conversation log.

[0791] Step 3:

[0792] User speech and facial expression data are captured during the meeting via dedicated sensors and cameras and sent to a server. The server uses an emotion engine to analyze this data and identify each user's emotional state. In this analysis process, the content of speech and facial expressions are used as input, and positive, negative, or neutral emotion data is generated as output.

[0793] Step 4:

[0794] The server integrates text data and sentiment data and inputs it into a meeting minutes generation engine. This engine applies an algorithm that extracts important discussion points and areas of significant emotional shifts. The resulting meeting minutes clearly indicate key phrases and emotional fluctuations during the meeting.
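The text does not specify how "areas of significant emotional shifts" are identified. One simple possibility, shown here purely as an assumption, is to score each transcript segment (for example, negative = -1 to positive = +1) and flag positions where the score changes by at least a threshold between consecutive segments.

```python
def find_emotional_shifts(scores, threshold=0.5):
    """Return indices i where the score differs from the previous one by >= threshold.

    `scores` is a list of per-segment emotion scores; both the scoring scale and
    the default threshold are hypothetical choices for this sketch.
    """
    return [
        i
        for i in range(1, len(scores))
        if abs(scores[i] - scores[i - 1]) >= threshold
    ]
```

The returned indices can then be used to mark the corresponding passages in the generated minutes.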

[0795] Step 5:

[0796] The server automatically generates the agenda and materials for the next meeting based on the generated sentiment data. For agenda items where sentiment changed significantly, improvement measures and follow-ups are added to the agenda. The previous sentiment data and meeting minutes are used as input, and the proposed agenda is generated as output.

[0797] Step 6:

[0798] The server distributes the generated meeting minutes and agenda to each participant. The distribution process sends the documents to designated email addresses or individual folders in cloud storage. This allows all participants to prepare for the next meeting.

[0799] (Application Example 2)

[0800] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0801] Online communication presents challenges in providing immediate feedback and customized content based on participants' emotions and states. Furthermore, mechanisms for effectively and efficiently recording meeting content and incorporating it into future meetings are often inadequate. To address these challenges, real-time sentiment analysis and content customization are essential.

[0802] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0803] In this invention, the server includes means for taking in recorded data and converting audio information into text information, means for analyzing the emotional state of participants and providing customized content, and means for automatically scheduling the next communication based on the participants' schedules. This enables immediate content tagging based on the emotional state of participants and the automation of optimal planning for the next meeting.

[0804] "Online communication" is a technology that allows multiple participants in remote locations to exchange information and conduct conversations and meetings in real time via the internet.

[0805] "Recorded data" refers to data that stores information such as audio, video, and text generated during online communication.

[0806] "Audio information" refers to the words and sounds uttered by participants during online communication, and is the information that is recorded and analyzed.

[0807] "Textual information" refers to data obtained by converting audio information into text format, and it forms the basis of meeting minutes and reports.

[0808] "Participant emotional state" refers to the results of measuring participants' psychological responses in real time during online communication and analyzing the changes in those responses.

[0809] "Customized content" refers to information and suggestions that are individually optimized based on the participant's emotional state and past data.

[0810] "Content tagging" is a technique that assigns identifiers to specific information or scenes at the peak of participants' emotional states, making them useful for later reference and analysis.

[0811] "Meeting content" refers to the overall information, including statements, discussions, and resolutions exchanged during online communication.

[0812] "Automatic scheduling" refers to a function where the system automatically determines the dates for meetings and communications, taking into account the participants' schedules.

[0813] "The optimal plan for the next meeting" refers to the agenda and materials for the next meeting, which are formulated based on participant sentiment analysis and the content of the previous meeting, with the aim of reducing stress and improving efficiency.

[0814] The embodiment of this invention relates to a content delivery system that utilizes emotion analysis during online communication. Specifically, it enables the provision of individually optimized information by analyzing audio and video data in real time and understanding the emotional state of participants. This system primarily operates through cooperation between a server, a terminal, and a user.

[0815] The server receives recorded data of online communications and uses speech recognition technology to convert speech information into text information. It is recommended to use advanced models such as "OpenAI Whisper" for speech recognition. This allows the communication content to be saved as text data and analyzed.

[0816] Next, the device uses video data to analyze the participants' emotional state in real time. The emotion analysis incorporates "Emotion AI" technology to identify changes in emotion from the video data. This resulting emotional data is then sent to a server and stored as analysis results.

[0817] Users receive customized information provided by the system. This is automatically generated content based on participants' past reactions and key points of discussion. For example, the moments that evoked the most emotional responses from users are tagged, and similar information is recommended later.

[0818] One concrete application of this system is to detect a user's smile while watching an emotionally moving movie and identify the scene in question. Based on the user's preferences, related movies and entertainment are then recommended. An example of a prompt to input into the generative AI model would be, "Analyze the user's reactions and identify the scene with the strongest response. We would like to use this for future content recommendations."

[0819] In this way, this invention aims to improve the communication experience by utilizing the emotional state of participants in online communications and providing more personalized information.

[0820] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0821] Step 1:

[0822] The device uses a microphone and camera to capture participant audio and video data during online communication. Input is real-time audio and video, while output is digital data for recording. This data is immediately transferred to the server. Specifically, audio is saved in .wav format and video in .mp4 format.

[0823] Step 2:

[0824] The server inputs the received audio data into the speech recognition model "OpenAI Whisper" and converts it into text information. The input is audio data in a .wav file, and the output is the converted text data. Preprocessing such as noise reduction and speaker separation is performed at this stage, which improves the accuracy of the analysis.

[0825] Step 3:

[0826] The device inputs received video data into "Emotion AI," which analyzes the emotional state of participants based on their facial expressions. The input is video data in .mp4 format, and the output is numerical data indicating the emotional category (e.g., joy, interest, surprise) and its intensity. This data is used to track emotional changes in real time.

[0827] Step 4:

[0828] The server integrates the audio text data and sentiment analysis results to perform content tagging. The input is the text data and sentiment data up to this step, and the output is a list of tagged key points. Tagging applies an algorithm that calculates the frequency of verbs and nouns and identifies the moments when emotions peaked.
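A rough sketch of the tagging in Step 4 follows. The text describes counting verbs and nouns; because that requires a part-of-speech tagger, the sketch substitutes a plain word count with a small stopword filter, and it treats the segment with the highest emotion intensity as the peak. The stopword list, the `(text, intensity)` input format, and the function name are assumptions.

```python
import re
from collections import Counter

# Hypothetical stopword list standing in for part-of-speech filtering.
STOPWORDS = {"the", "a", "an", "and", "to", "of", "is", "was", "we", "it", "in", "on", "that"}


def tag_content(segments, top_n=3):
    """Return (most frequent terms, index of the emotional peak segment).

    `segments` is a list of (text, emotion_intensity) pairs.
    The peak index is None when there are no segments.
    """
    words = [
        word
        for text, _ in segments
        for word in re.findall(r"[a-z']+", text.lower())
        if word not in STOPWORDS
    ]
    frequent_terms = [term for term, _ in Counter(words).most_common(top_n)]
    peak_index = (
        max(range(len(segments)), key=lambda i: segments[i][1]) if segments else None
    )
    return frequent_terms, peak_index
```

The frequent terms serve as tags, and the peak index identifies which moment to mark for later reference.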

[0829] Step 5:

[0830] The user receives content generated by the server and recommendations for the next session. The input is tagged points and a list of content recommendations sent from the server, and the output is customized feedback that the user views. For example, the user might receive a customized list that includes trailers for relevant movies.

[0831] Step 6:

[0832] The server uses user feedback and viewing history to input prompts into a generative AI model to optimize future recommendations. The input consists of user feedback data and pre-prepared prompts, while the output is updated information for the next recommendation items. An example of a prompt is, "Analyze the user's reactions and identify the scenes that elicited the strongest responses. We would like to use this information for future content recommendations."

[0833] Through this process, the system can provide personalized information based on participants' emotions, thereby improving the online communication experience.

[0834] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0835] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0836] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0837] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0838] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0839] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0840] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the Emotion Map 400 an emotion lies, the more visible (expressed in actions) it becomes.

[0841] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.

[0842] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0843] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
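As a rough illustration of the layout described for the emotion map 400, the following Python sketch places emotions on polar coordinates (distance from the center and a clock-style angle) and compares how close two emotions are. The class EmotionPoint, the helper functions, and every coordinate and emotion value are hypothetical and chosen only for illustration; this disclosure gives no numeric positions, and the pre-trained neural network itself is not reproduced here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotionPoint:
    """One emotion placed on a concentric-circle map (all values are illustrative)."""
    name: str
    radius: float      # 0 = center (primitive); larger = more outwardly expressed
    clock_deg: float   # 0 = 12 o'clock (pleasure side), 90 = 3 o'clock, 180 = 6 o'clock

    def xy(self) -> tuple[float, float]:
        # Convert the clock-style angle to Cartesian coordinates (up = pleasure).
        theta = math.radians(self.clock_deg)
        return (self.radius * math.sin(theta), self.radius * math.cos(theta))


def map_distance(a: EmotionPoint, b: EmotionPoint) -> float:
    """Euclidean distance between two emotions on the map."""
    ax, ay = a.xy()
    bx, by = b.xy()
    return math.hypot(ax - bx, ay - by)


def dominant_emotion(values: dict[str, float]) -> str:
    """Pick the emotion with the highest emotion value (assumed model output)."""
    return max(values, key=values.get)


# Hypothetical placements: three neighbours on the "situation" (right) side
# and one emotion on the "reaction" (left) side.
reassured = EmotionPoint("reassured", radius=2.0, clock_deg=80.0)
calm = EmotionPoint("calm", radius=2.0, clock_deg=90.0)
confident = EmotionPoint("confident", radius=2.0, clock_deg=100.0)
desire = EmotionPoint("desire", radius=2.0, clock_deg=300.0)

# Hypothetical emotion values such as a trained model might return.
example_values = {"reassured": 0.70, "calm": 0.65, "confident": 0.60, "desire": 0.10}
```

In this sketch, emotions that sit close together on the map ("reassured", "calm", "confident") are also given similar example values, mirroring the training goal described for the neural network.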

[0844] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0845] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0846] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0847] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0848] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0849] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0850] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0851] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0852] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0853] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements may be added, or elements may be replaced in the descriptions and illustrations presented above, as long as this does not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0854] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0855] The following is further disclosed regarding the embodiments described above.

[0856] (Claim 1)

[0857] A means of detecting the end of an online meeting,

[0858] A method for importing recordings of meetings and converting audio to text,

[0859] A generation method that analyzes text data to generate meeting minutes,

[0860] A means of automatically generating the agenda and materials for the next meeting,

[0861] A method for automatically setting the date of the next meeting based on the schedules of meeting participants,

[0862] A system that includes means for distributing generated meeting minutes and materials to participants.

[0863] (Claim 2)

[0864] The system according to claim 1, comprising means for detecting important moments during an online meeting and taking screenshots.

[0865] (Claim 3)

[0866] The system according to claim 1, comprising means for providing an editing interface for generated meeting minutes and materials for the next meeting.
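The sequence of means listed above (end detection, speech-to-text conversion, minutes generation, agenda generation, date setting, distribution) can be sketched as a simple pipeline. This is only a minimal sketch under stated assumptions: the event format checked by is_meeting_over, the MeetingResult record, and the injected callables (transcribe, summarize, plan_agenda, pick_date, send) are all hypothetical placeholders for whatever speech recognition, generation, calendar, and delivery services an implementation would actually use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MeetingResult:
    """Outputs of the post-meeting steps (hypothetical record type)."""
    transcript: str
    minutes: str
    next_agenda: str
    next_date: str


def is_meeting_over(event: dict) -> bool:
    # Assumed event shape: {"type": "meeting.ended", ...}; not defined by the source.
    return event.get("type") == "meeting.ended"


def run_post_meeting(
    recording: bytes,
    participants: list[str],
    transcribe: Callable[[bytes], str],
    summarize: Callable[[str], str],
    plan_agenda: Callable[[str], str],
    pick_date: Callable[[list[str]], str],
    send: Callable[[str, str], None],
) -> MeetingResult:
    """Run each post-meeting step in order and distribute the results."""
    transcript = transcribe(recording)       # audio -> text
    minutes = summarize(transcript)          # text -> meeting minutes
    next_agenda = plan_agenda(minutes)       # minutes -> agenda for next meeting
    next_date = pick_date(participants)      # schedules -> next meeting date
    body = f"{minutes}\n\nNext meeting: {next_date}\n{next_agenda}"
    for person in participants:              # distribute to every participant
        send(person, body)
    return MeetingResult(transcript, minutes, next_agenda, next_date)
```

Passing the individual steps in as callables keeps the sketch independent of any particular service; each one can be replaced with a real implementation or a test stub.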

[0867] "Example 1"

[0868] (Claim 1)

[0869] A means of detecting the end of an online meeting,

[0870] A means of capturing meeting recordings and converting audio into text data,

[0871] A generation method that analyzes text data to generate meeting records,

[0872] A means of automatically generating the plan and materials for the next meeting,

[0873] A method for automatically scheduling the next meeting based on the time management information of meeting participants,

[0874] A means of distributing the generated meeting minutes and materials to participants,

[0875] A means of providing an editing interface for participants to make corrections to the generated records and materials,

[0876] A system that includes means for analyzing audio data to extract important conversations and decisions.

[0877] (Claim 2)

[0878] The system according to claim 1, comprising means for detecting important moments during an online meeting and acquiring visual information.

[0879] (Claim 3)

[0880] The system according to claim 1, comprising means for providing an editing interface for generated meeting records and materials for the next meeting.
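Automatic scheduling of the next meeting from participants' schedule information could, for example, be approached by searching for the earliest time at which nobody is busy. The sketch below assumes each participant's schedule is available as a list of (start, end) busy intervals; the function name earliest_common_slot and this data layout are illustrative assumptions, not something this disclosure specifies.

```python
from datetime import datetime, timedelta
from typing import Optional

Interval = tuple[datetime, datetime]


def earliest_common_slot(
    busy: dict[str, list[Interval]],
    window_start: datetime,
    window_end: datetime,
    duration: timedelta,
) -> Optional[datetime]:
    """Return the earliest start time in the window when every participant is free."""
    # Pool every participant's busy intervals and sweep them in start order.
    intervals = sorted(iv for person in busy.values() for iv in person)
    cursor = window_start
    for start, end in intervals:
        if start - cursor >= duration:
            break                      # a long enough gap exists before this interval
        cursor = max(cursor, end)      # otherwise skip past the busy interval
    if cursor + duration <= window_end:
        return cursor
    return None                        # no common slot inside the window
```

For example, if one participant is busy from 9:00 to 10:00 and another from 9:30 to 11:00, a one-hour meeting searched from 9:00 onward would be placed at 11:00.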

[0881] "Application Example 1"

[0882] (Claim 1)

[0883] A means of detecting the end of online information exchange,

[0884] A means of capturing recorded information data and converting sound into text,

[0885] A generation method that analyzes text data to generate a meeting summary,

[0886] A means for automatically generating plans and materials for the next information exchange,

[0887] A means to automatically set the date for the next information exchange based on the participants' plans,

[0888] A means of distributing the generated meeting summary and materials to participants,

[0889] A method for receiving meeting recordings and automatically saving them to cloud storage,

[0890] A method for converting speech to text using a speech recognition API,

[0891] A method for extracting important information using a natural language processing API,

[0892] Methods for creating meeting summaries and next meeting plans using generative AI,

[0893] A system that includes a means of automatically sending notifications to participants.

[0894] (Claim 2)

[0895] The system according to claim 1, comprising means for detecting important moments during online information exchange and acquiring visual information.

[0896] (Claim 3)

[0897] The system according to claim 1, comprising means for providing an editing interface for the generated meeting summary and materials for future information exchange.
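To make the step of extracting important information from the converted text more concrete, the following sketch uses a simple keyword rule as a stand-in. It does not call any natural language processing API or generative AI; the cue phrases in DECISION_CUES and the function extract_important_sentences are hypothetical and serve only to show where such a step sits between transcription and summary generation.

```python
import re

# Hypothetical cue phrases; a real system would rely on an NLP service instead.
DECISION_CUES = ("we decided", "agreed to", "action item", "deadline")


def extract_important_sentences(
    transcript: str, cues: tuple[str, ...] = DECISION_CUES
) -> list[str]:
    """Return the sentences of the transcript that contain any cue phrase."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", transcript.strip())
    return [s for s in sentences if any(cue in s.lower() for cue in cues)]
```

The selected sentences could then be passed to whatever component generates the meeting summary and the plan for the next information exchange.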

[0898] "Example 2 of combining an emotion engine"

[0899] (Claim 1)

[0900] A means of detecting the end of an online meeting,

[0901] A method for importing recordings of meetings and converting audio to text,

[0902] A means of identifying a user's emotional state using an emotion engine,

[0903] A method for generating meeting minutes by analyzing text data and sentiment data,

[0904] A means of automatically generating the agenda and materials for the next meeting, taking emotional data into consideration,

[0905] A method for automatically setting the date of the next meeting based on the schedules of meeting participants,

[0906] A system that includes means for distributing generated meeting minutes and materials to participants.

[0907] (Claim 2)

[0908] The system according to claim 1, comprising means for detecting important moments during an online meeting and identifying changes in the user's emotions.

[0909] (Claim 3)

[0910] The system according to claim 1, comprising means for providing an editing interface that reflects sentiment data in the generated meeting minutes and materials for the next meeting.
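One simple way to picture the detection of important moments from changes in a user's emotions is to look for local peaks in a time series of emotion values. The sketch below assumes the emotion engine yields (timestamp in seconds, intensity between 0 and 1) samples; that sample format, the threshold of 0.7, and the function find_emotional_peaks are illustrative assumptions rather than details given in this disclosure.

```python
def find_emotional_peaks(
    samples: list[tuple[float, float]], threshold: float = 0.7
) -> list[float]:
    """Return timestamps where intensity is a local maximum at or above the threshold."""
    peaks = []
    for i in range(1, len(samples) - 1):
        t, value = samples[i]
        previous_value = samples[i - 1][1]
        next_value = samples[i + 1][1]
        # A peak rises above the previous sample and does not fall below the next one.
        if value >= threshold and value > previous_value and value >= next_value:
            peaks.append(t)
    return peaks
```

The returned timestamps could be used to mark passages of the transcript that deserve emphasis in the minutes or in the agenda for the next meeting.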

[0911] "Application example 2 of combining emotional engines"

[0912] (Claim 1)

[0913] A means of detecting the termination of online communication,

[0914] A means for capturing recorded data and converting audio information into text information,

[0915] A generation method that analyzes text information to generate meeting minutes,

[0916] A means for automatically generating the next communication plan and materials,

[0917] A method for automatically scheduling the next communication based on the participants' schedules,

[0918] A means of distributing the generated meeting minutes and materials to participants,

[0919] A means of analyzing the emotional state of participants and providing customized content.

[0920] A system that includes this.

[0921] (Claim 2)

[0922] The system according to claim 1, comprising means for detecting emotional peaks during communication and assigning customized content tags.

[0923] (Claim 3)

[0924] The system according to claim 1, comprising means for providing an editing interface for generated meeting minutes and materials for future communications.

[Explanation of Symbols]

[0925]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. A system comprising: means for detecting the end of an online meeting; means for importing a recording of the meeting and converting its audio to text; generation means for analyzing the text data to generate meeting minutes; means for automatically generating the agenda and materials for the next meeting; means for automatically setting the date of the next meeting based on the schedules of the meeting participants; and means for distributing the generated meeting minutes and materials to the participants.

2. The system according to claim 1, comprising means for detecting important moments during an online meeting and taking screenshots.

3. The system according to claim 1, comprising means for providing an editing interface for generated meeting minutes and materials for the next meeting.