system

A system automates post-meeting tasks by transcribing and summarizing online meeting data, allowing editing and scheduling, thereby reducing staff burden and improving meeting efficiency.

JP2026105374A · Pending · Publication Date: 2026-06-26 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-12-16
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

The administrative tasks following online meetings, such as creating meeting minutes, setting agendas, and distributing information, are cumbersome and burdensome for busy staff, requiring a more efficient and automated solution.

Method used

A system that automatically acquires and transcribes audio and video data from online meetings, generates meeting minutes, allows editing, schedules the next meeting, and distributes information to participants, utilizing speech recognition, image processing, and generative AI to streamline post-meeting tasks.

Benefits of technology

Reduces the burden on staff by automating administrative tasks and ensuring rapid and accurate provision of meeting information, enhancing meeting efficiency and reducing manual effort.

✦ Generated by Eureka AI based on patent content.


Abstract

To provide a system. [Solution] A system including: means for acquiring meeting data via a communication network; means for converting the acquired meeting audio data into text information; means for generating summary information from the text information and meeting video data; means for presenting the generated summary information to an operator and accepting modifications; means for automatically setting the agenda of the next meeting; means for automatically transmitting meeting information to members; and means for displaying the summary information and the next work procedure using an operating device to provide work support.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] With the increase in online meetings, the administrative tasks such as creating meeting minutes after the meeting, setting the agenda for the next meeting, and distributing information to participants are cumbersome and a heavy burden on busy staff. Therefore, it is required to streamline these operations and reduce the burden on the secretariat.

Means for Solving the Problems

[0005] The inventor provides a system that automatically acquires data from online meetings via a communication network and transcribes the audio data. This system utilizes the transcribed data and meeting image data to generate meeting minutes, presents them to the user for editing, automatically schedules the next meeting, and automatically distributes the generated meeting information to participants. In this way, the system efficiently automates administrative tasks after online meetings and reduces the burden on the person in charge.

[0006] A "communication network" is an infrastructure for exchanging data between computer systems located in different locations.

[0007] "Meeting data" refers to digital data, including audio, video, and other related information, generated during an online meeting.

[0008] "Transcription" is the process of converting audio data into text data.

[0009] "Meeting minutes" refers to textual information that summarizes the content of a meeting, and typically serves as a record of the meeting.

[0010] A "user" is an individual or organization that uses this system to manage meeting information.

[0011] "To accept edits" means to provide users with the ability to modify or revise the information they have been presented.

[0012] "Next meeting schedule" refers to schedule information that includes details such as the date, time, location, and participants of a future meeting.

[0013] An "external calendar service" is an online platform designed to assist users with managing their schedules, allowing them to manage their appointments within that platform.

[0014] "Automated delivery" is the process by which a system delivers information to its recipient without human intervention.

Brief Description of the Drawings

[0015]

[Figure 1] This is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] This is a conceptual diagram showing an example of the essential functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] This is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] This is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] This is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] This is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] This is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] This is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] This shows an emotion map where multiple emotions are mapped.
[Figure 10] This shows an emotion map where multiple emotions are mapped.
[Figure 11] This is a sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] This is a sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] This is a sequence diagram showing the processing flow of the data processing system in Example 2, which incorporates an emotion engine.
[Figure 14] This is a sequence diagram showing the processing flow of the data processing system in Application Example 2, which incorporates an emotion engine.

Mode for Carrying Out the Invention

[0016] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0017] First, the terms used in the following description will be described.

[0018] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of a plurality of arithmetic units. Also, the processor may be a single type of arithmetic unit or a combination of a plurality of types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.

[0019] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which is used as a work memory by the processor.

[0020] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0021] In the following embodiments, a communication interface (I / F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0022] In the following embodiments, "A and / or B" is synonymous with "at least one of A and B." That is, "A and / or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and / or B" applies when expressing three or more things linked by "and / or."

[0023] [First Embodiment]

[0024] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0025] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0026] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0027] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0028] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0029] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0030] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0031] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0032] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0033] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0034] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0035] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0036] The system of this invention is designed to automate post-meeting processing for online meetings and to efficiently provide information to meeting participants. This system mainly consists of three elements: a server, a terminal, and a user, each of which plays a specific role.

[0037] The server acquires audio and video data of meetings through an interface with the online meeting platform. The audio data is transcribed using a speech recognition service, and screenshots of the meeting are extracted from the video data using image processing. The server inputs this information into a generative AI to create meeting minutes summarizing the key points of the meeting. The system also automatically generates draft materials and agendas for the next meeting.

[0038] The terminal displays meeting minutes and the next agenda generated on the server to the user. The user can use an interface on the terminal to view and edit this information. Once editing is complete, the changes are sent back to the server.

[0039] Users can review the meeting minutes generated via their devices and make corrections as needed. After the user has finalized the information, the system integrates with an external calendar service to automatically schedule the next meeting. This updates the meeting participants' calendars and provides necessary notifications.

[0040] As a concrete example, the server uses a specific API to send audio data to a transcription service, which then generates meeting minutes from the resulting text. These minutes are displayed in the user's browser, allowing them to review and make any necessary corrections. The approved meeting minutes are then automatically distributed to all participants via the server.
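
The distribution step in this example can be sketched with Python's standard `email` package. This is a minimal sketch rather than the disclosed implementation: the function name `build_minutes_email`, the addresses, and the subject format are assumptions made for illustration, and actually sending the message (for example with `smtplib`) is deliberately left out.

```python
from email.message import EmailMessage


def build_minutes_email(sender, recipients, meeting_title, minutes_text):
    """Compose (but do not send) an email carrying the approved minutes.

    The function name, subject format, and addresses are hypothetical.
    """
    if not recipients:
        raise ValueError("at least one recipient is required")
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = f"Minutes: {meeting_title}"
    msg.set_content(minutes_text)  # plain-text body
    return msg


message = build_minutes_email(
    "secretariat@example.com",
    ["a@example.com", "b@example.com"],
    "Weekly sync",
    "1. Budget approved.\n2. Agenda draft for the next meeting attached.",
)
```

Keeping composition separate from sending makes it possible to inspect the message before any mail leaves the server.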

[0041] As described above, the system of the present invention efficiently handles post-processing of online meetings, reduces the burden on users, and enables the rapid and accurate provision of information.

[0042] The following describes the processing flow.

[0043] Step 1:

[0044] The server automatically collects audio and video data from online meeting platforms via their APIs. The collected data is stored internally and subjected to necessary preprocessing.

[0045] Step 2:

[0046] The server sends the audio data to a speech recognition service (e.g., a speech recognition API), which transcribes the audio and generates text data. This text data is then stored as meeting transcript information.

[0047] Step 3:

[0048] The server extracts screenshots from the meeting video data at specific time intervals and saves them as image data. These images are used as visual elements of the meeting content.

[0049] Step 4:

[0050] The server inputs the transcript data and image data into a generative AI to generate meeting minutes that summarize the key points of the meeting. These minutes may include extracted screenshots.

[0051] Step 5:

[0052] The terminal displays the generated meeting minutes and the agenda for the next meeting to the user. The user can use the interface on the terminal to review the content and edit or modify it as needed.

[0053] Step 6:

[0054] The user reviews the content and gives final approval via their device. The approved information is returned to the server and ready for distribution.

[0055] Step 7:

[0056] The server integrates with an external calendar service to automatically schedule the next meeting. The scheduled date is then reflected in the participants' calendars.

[0057] Step 8:

[0058] The server automatically distributes the completed meeting minutes and the agenda for the next meeting to participants via email or messaging service. This distribution allows all stakeholders to quickly share the meeting content and the date of the next meeting.
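
The eight steps above can be sketched as a single orchestration function. This is an illustrative outline under stated assumptions, not the disclosed implementation: `MeetingRecord`, `run_post_meeting_flow`, and every stage callable are hypothetical names, Step 1 (acquisition) is assumed to have already produced the audio and video bytes, and the stages are stubbed with lambdas where a real deployment would call external services.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MeetingRecord:
    audio: bytes  # acquired in Step 1
    video: bytes  # acquired in Step 1
    transcript: str = ""
    screenshots: List[str] = field(default_factory=list)
    minutes: str = ""
    approved: bool = False
    next_meeting: str = ""
    delivered_to: List[str] = field(default_factory=list)


def run_post_meeting_flow(
    record: MeetingRecord,
    participants: List[str],
    transcribe: Callable[[bytes], str],
    extract_screenshots: Callable[[bytes], List[str]],
    summarize: Callable[[str, List[str]], str],
    review: Callable[[str], str],
    schedule: Callable[[str], str],
    deliver: Callable[[str, str, str], None],
) -> MeetingRecord:
    record.transcript = transcribe(record.audio)              # Step 2
    record.screenshots = extract_screenshots(record.video)    # Step 3
    draft = summarize(record.transcript, record.screenshots)  # Step 4
    # Steps 5-6: `review` stands in for display, editing, and approval.
    record.minutes = review(draft)
    record.approved = True
    record.next_meeting = schedule(record.minutes)            # Step 7
    for address in participants:                              # Step 8
        deliver(address, record.minutes, record.next_meeting)
        record.delivered_to.append(address)
    return record


# Usage with stub stages (all values are made up for the example).
sent = []
result = run_post_meeting_flow(
    MeetingRecord(audio=b"audio", video=b"video"),
    ["a@example.com", "b@example.com"],
    transcribe=lambda audio: "We agreed to ship on Friday.",
    extract_screenshots=lambda video: ["slide_0001.png"],
    summarize=lambda text, shots: f"Summary: {text} ({len(shots)} image)",
    review=lambda draft: draft.replace("Friday", "Friday 17:00"),
    schedule=lambda minutes: "2025-01-06T10:00",
    deliver=lambda to, minutes, when: sent.append((to, when)),
)
```

Passing each stage in as a callable keeps the flow testable without network access and lets any one service be swapped independently.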

[0059] (Example 1)

[0060] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0061] Currently, creating meeting minutes after online meetings is time-consuming and laborious, hindering meeting efficiency. Furthermore, scheduling future meetings and sharing information with participants are often done manually, placing a significant burden on users. Solving these problems and achieving a smoother online meeting process is essential.

[0062] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0063] In this invention, the server includes means for acquiring meeting information via a communication network, means for converting the acquired meeting audio data into text data using speech recognition technology, and means for extracting screenshots from meeting video data using image analysis technology. This makes it possible to streamline the post-processing of online meetings and reduce the burden on users.

[0064] A "communication network" is a system that connects information devices such as computers located in different places to send and receive data.

[0065] "Meeting information" refers to all audio, video, and related data used in online meetings.

[0066] "Speech recognition technology" is a technology for converting speech data into text data.

[0067] "Text data" refers to text-based information that represents audio information converted using speech recognition technology.

[0068] "Image analysis technology" is the technology that analyzes image data to give it meaning.

[0069] A "screenshot" refers to an image saved from the screen used during a meeting.

[0070] "Generative AI" is a technology that uses artificial intelligence to generate new information from data, and in this context, it is used to generate meeting minutes.

[0071] "Summary information" refers to information that simplifies the original information and summarizes only the main points.

[0072] A "terminal" refers to a device that a user uses to view or edit information.

[0073] An "external schedule management service" refers to an external service used for managing schedules, such as a calendar or schedule management tool.

[0074] A "notification function" is a mechanism for conveying specific information to a user.

[0075] This system primarily consists of three elements: servers, terminals, and users. Each element plays a specific role, streamlining the post-processing of online meetings.

[0076] The server acquires meeting information from online meeting platforms via a communication network. In doing so, it obtains audio and video data through the APIs of platforms such as Zoom and Microsoft Teams. The server converts the acquired audio data into text data using speech recognition services such as Google Cloud Speech-to-Text and IBM Watson. Furthermore, it extracts screenshots from the video data using image processing tools such as OpenCV and FFmpeg. This makes it possible to capture important parts of slides presented during the meeting.
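
As one possible way to realize the screenshot extraction described above, the sketch below builds an ffmpeg command line that saves one frame at a fixed interval using ffmpeg's `fps` video filter. The function name, file names, and the 30-second interval are assumptions for illustration; the source does not specify how frames are selected, and the command is only constructed here, not executed.

```python
from pathlib import Path


def build_frame_extraction_command(video_path, output_dir, interval_seconds=60):
    """Return an ffmpeg argument list that saves one frame every N seconds.

    `interval_seconds` is expected to be a positive integer.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # %04d is expanded by ffmpeg into a running frame number.
    pattern = (Path(output_dir) / "frame_%04d.png").as_posix()
    return [
        "ffmpeg",
        "-i", str(video_path),
        "-vf", f"fps=1/{interval_seconds}",  # one output frame per interval
        pattern,
    ]


command = build_frame_extraction_command("meeting.mp4", "frames", 30)
```

The resulting list can be handed to `subprocess.run(command, check=True)` on a machine where ffmpeg is installed and the output directory exists.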

[0077] The server uses a generative AI model to generate summary information based on acquired text data and screenshots. A large-scale language model such as GPT-3 (registered trademark) is suitable for use here. An example of a prompt message could be, "Based on the following text and image information, summarize the three main points of the meeting."
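
The prompt quoted above could be assembled as follows. This is a hedged sketch: the function name, the section labels, the `max_chars` limit, and the use of text captions as a stand-in for image input are all assumptions, since how images are passed to a model depends on the model and is not described in the source.

```python
INSTRUCTION = (
    "Based on the following text and image information, "
    "summarize the three main points of the meeting."
)


def build_summary_prompt(transcript, screenshot_captions, max_chars=8000):
    """Combine the instruction, the transcript, and screenshot captions."""
    text = transcript.strip()
    if len(text) > max_chars:
        # Hypothetical safeguard against overly long inputs.
        text = text[:max_chars] + " [truncated]"
    images = "\n".join(
        f"- Image {i}: {caption}"
        for i, caption in enumerate(screenshot_captions, start=1)
    ) or "- (none)"
    return f"{INSTRUCTION}\n\n[Transcript]\n{text}\n\n[Images]\n{images}"


prompt = build_summary_prompt(
    "  Alice: Shipping moves to Friday.\nBob: Agreed.  ",
    ["Slide titled 'Q3 shipping plan'"],
)
```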

[0078] The terminal presents the user with summary information sent from the server. Specifically, it allows the user to view the summary information through a browser-based interface using HTML / CSS. The user edits the summary information using the provided interface and sends the results back to the server.
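
A browser-based editing view of this kind might be generated on the server as shown below. The sketch assumes one editable text area per summary point and a hypothetical `/minutes` endpoint that receives the edited result; neither detail comes from the source. User-visible text is passed through `html.escape` so that transcript content cannot inject markup.

```python
from html import escape


def render_minutes_form(summary_points, action_url="/minutes"):
    """Render summary points as an editable HTML form, one textarea each."""
    fields = "\n".join(
        f'  <textarea name="point{i}">{escape(point)}</textarea>'
        for i, point in enumerate(summary_points, start=1)
    )
    return (
        f'<form method="post" action="{escape(action_url)}">\n'
        f"{fields}\n"
        '  <button type="submit">Approve</button>\n'
        "</form>"
    )


page = render_minutes_form(["Ship on Friday", "Budget < 10% over plan"])
```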

[0079] The summary information, after being reviewed and edited by the user, is used by the server to schedule the next meeting. The server integrates with external scheduling services (e.g., Google Calendar, Outlook) via API to automatically set the next meeting date. This allows meeting participants to receive schedule-based notifications, enabling efficient time management.
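
The source does not state how the next meeting date is chosen, so the following is only one plausible rule, sketched under assumptions: the next meeting is proposed a fixed number of days after the previous one, at the same time of day, and moved forward past weekends. The function names and the fields of the event dictionary are hypothetical and do not correspond to any particular calendar service's API.

```python
from datetime import datetime, timedelta


def propose_next_meeting(last_start, interval_days=7, duration_minutes=60):
    """Propose the next slot at the same time of day, skipping weekends."""
    start = last_start + timedelta(days=interval_days)
    while start.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        start += timedelta(days=1)
    end = start + timedelta(minutes=duration_minutes)
    return {"start": start.isoformat(), "end": end.isoformat()}


def build_event_payload(title, slot, attendees):
    """Build a generic event dictionary; the field names are illustrative."""
    return {"title": title, "attendees": sorted(set(attendees)), **slot}


slot = propose_next_meeting(datetime(2024, 12, 14, 10, 0))  # a Saturday
event = build_event_payload(
    "Follow-up", slot, ["b@example.com", "a@example.com"]
)
```

A real integration would translate this dictionary into the request format of the chosen calendar service.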

[0080] By implementing this system, users can significantly reduce the time and effort spent on post-meeting processing for online meetings, enabling efficient and accurate information sharing.

[0081] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0082] Step 1:

[0083] The server retrieves audio and video data from the online meeting platform via the communication network. As input, it sends a request through the platform's API for the data of a completed meeting. As output, the meeting's audio and video files are stored on the server.

[0084] Step 2:

[0085] The server converts the acquired audio data into text data using speech recognition technology such as Google Cloud Speech-to-Text. It sends the audio file as input to the speech recognition engine for processing. The output is the meeting content converted into text format.

[0086] Step 3:

[0087] The server extracts screenshots from video data to capture important moments. As input, it processes video files using image analysis techniques such as OpenCV. As output, screenshot images highlighting important slides and visuals are generated.

[0088] Step 4:

[0089] The server uses a generative AI model to generate summary information from text data and screenshots as input. It generates prompts, such as "Based on the following text and image information, summarize the three main points of the meeting." The output is a summarized meeting minutes document.

[0090] Step 5:

[0091] The terminal displays information on the user interface using summary information received from the server. It receives summary information from the server as input and converts it into a format viewable on a browser screen as output.

[0092] Step 6:

[0093] The user reviews the summary information presented on the terminal and edits it if necessary. They use the text editing function on the terminal as input and save the revised meeting minutes as output.

[0094] Step 7:

[0095] The server automatically schedules the next meeting in conjunction with an external scheduling service, based on the meeting minutes information reviewed and corrected by the user. The corrected meeting minutes and the requirements for the next meeting are used as input, and the newly scheduled meeting date is added to the external calendar as output.

[0096] (Application Example 1)

[0097] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0098] In modern organizations, operational meetings, particularly those in logistics centers, are frequent, and their efficiency and accurate information sharing are crucial. However, recording the content of these meetings and organizing the information takes considerable time and effort, which hampers planning and leaves the procedures for subsequent work unclear. Therefore, there is a need for a system that automates post-meeting processing and the rapid and accurate transmission of necessary information, thereby improving the accuracy and efficiency of operations.

[0099] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0100] In this invention, the server includes means for acquiring meeting data via a communication network, means for converting the acquired meeting audio data into text information, means for generating summary information from the text information and meeting video data, and means for displaying the summary information and the next work procedure using an operating device to provide work support. This enables efficient post-meeting processing and clarifies the next work procedure, thereby improving the efficiency and accuracy of operations at the logistics center.

[0101] A "communication network" is an information exchange system for sending and receiving digital data remotely.

[0102] "Meeting data" refers to a collection of information, such as audio and video, generated in connection with meetings and discussions.

[0103] "Means for converting into text information" refers to a function that analyzes audio data and converts the spoken content into text.

[0104] "Summary information" refers to concise information that summarizes the main topics and points covered at a meeting.

[0105] An "operator" refers to a person who uses a system to verify or manipulate information.

[0106] An "operating device" is a device used by users to display information and operate a system.

[0107] "Visual information" refers to digital data that can be visually recognized, such as screenshots and images.

[0108] An "external scheduling service" is an external platform for managing appointments and schedules electronically.

[0109] This invention provides a system that effectively utilizes meeting data collected via a communication network and performs efficient post-processing.

[0110] The server uses a communication network to acquire meeting data generated during conferences and meetings. The acquired audio data is converted into text information using a speech recognition API (e.g., Google Speech-to-Text API). The video data is processed with an image processing library (e.g., OpenCV) and used, together with the text information, to generate summary information. A generative AI (e.g., OpenAI GPT-3) composes this summary information so that it concisely captures the key points.

[0111] The terminal displays the summary information and next-task procedures received from the server to the operator. Through the interface on the terminal, the operator can review and edit the information, which helps in planning future tasks.

[0112] Users can use their devices to review the presented summary information and, if necessary, utilize external scheduling services to coordinate the agenda and date of the next meeting.

[0113] As a concrete example, in daily shipping meetings at a logistics center, a server efficiently processes audio and video to generate summary information in a short time. This helps operators quickly access past meeting information and clarify the next steps in their work.

[0114] An example of an input prompt for the generative AI model is, "Please generate minutes for this online meeting. The full text of the meeting is below." When the full text of the meeting is then supplied, the model automatically generates the meeting's key points and the agenda for the next meeting.

[0115] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0116] Step 1:

[0117] The server acquires meeting data (audio and video) via the communication network. The acquired data is stored in an appropriate format, as it forms the basis for subsequent processing.

[0118] Step 2:

[0119] The server sends the acquired audio data to a speech recognition API (e.g., Google Speech-to-Text API) to convert it into text. This process outputs the audio data as text information, which is then used to record the meeting content.

[0120] Step 3:

[0121] The server uses an image processing library (e.g., OpenCV) on the video data to generate necessary screen captures, if required. This extracts important visual information from the meeting, which is useful for creating summaries later.

[0122] Step 4:

[0123] Based on text data and video information, the server uses a generative AI (e.g., OpenAI GPT-3) to generate summary information. Text and visual information are provided as input, and the AI analyzes and processes them to create a summary and suggestions for future tasks.

[0124] Step 5:

[0125] The terminal displays summary information received from the server to the operator. Through the terminal's interface, the operator can review and edit the meeting summary and proposals for the next meeting.

[0126] Step 6:

[0127] Users can make final confirmations of the information presented on their device and make corrections as needed. This editable interface allows users to ensure the accuracy of the information and improve overall work efficiency.

[0128] Step 7:

[0129] The user sends the modified information from the terminal to the server, which then connects to an external scheduling service. By integrating the final meeting information with the scheduling service, the schedule for the next meeting is automatically updated.

[0130] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0131] This invention is a system for further enhancing the post-processing of online meetings and automating the provision of information that takes participants' emotions into account. This system comprises server, terminal, and user elements, and by combining them with an emotion recognition engine, it identifies the emotions of participants during a meeting and reflects the results in various functions.

[0132] The server collects audio, video, and related data from the online meeting platform. The acquired audio data is transcribed by a speech recognition service, and simultaneously, an emotion recognition engine analyzes participants' voice tones and facial expressions to determine their emotions. This generates not only textual information but also data indicating the emotional state of the participants.

[0133] The terminal presents the user with meeting minutes generated on the server, along with sentiment analysis results. Users can use an interface that allows them to visually see how emotional changes influenced the meeting. For example, they can easily see, using graphs and tags, what emotions participants displayed at important points in the discussion.

[0134] Users can view meeting minutes and sentiment analysis results on their devices and provide feedback or make corrections as needed. After user approval, the system adjusts the agenda for the next meeting to reflect the changes in sentiment. This agenda adjustment is expected to make the next meeting more efficient and effective.

[0135] For example, the server can capture participants' laughter and high-pitched voices, detecting positive emotions at specific points in the meeting. This allows users to further refine the agenda for the next meeting based on the positive feedback. Conversely, if negative reactions are detected, the server can analyze their causes and use them as clues to improve future discussions.

[0136] In this way, the present invention not only automates administrative tasks but also functions as a tool to improve the quality of meeting content. By understanding participants' emotions in real time and reflecting them in subsequent actions, it becomes possible to create new value in online communication.

[0137] The following describes the processing flow.

[0138] Step 1:

[0139] The server retrieves audio and video data from the online meeting platform. The retrieved data is stored internally and prepared to be sent to the speech recognition engine (audio data) and the emotion recognition engine (video data).

[0140] Step 2:

[0141] The server processes the audio data through a speech recognition engine to transcribe it and generate text data. Simultaneously, an emotion recognition engine analyzes the emotions from the audio, generating emotional data of the participants' speech. For example, it analyzes changes in voice tone and pitch to identify emotions such as joy or anger.

[0142] Step 3:

[0143] The server uses video data to allow an emotion recognition engine to analyze the participants' facial expressions. The output here is emotion data extracted from the visual image, where the type of emotion (e.g., tension, relaxation, excitement) is determined from subtle facial movements and expressions.

[0144] Step 4:

[0145] The server aggregates the transcript data and the sentiment data extracted from the audio and video to generate comprehensive meeting minutes. These minutes include the sentiment information detected for each statement and discussion, linked to specific topics.
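
The aggregation in this step can be pictured with a minimal Python sketch. It assumes, purely for illustration, that the transcript is a list of timestamped utterances and that the emotion recognition engine emits (timestamp, label) pairs. The `Utterance` class and the `attach_emotions` and `render_minutes` helpers are hypothetical names that do not appear in this disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class Utterance:
    start: float  # seconds from the start of the meeting
    end: float
    speaker: str
    text: str
    emotions: list = field(default_factory=list)


def attach_emotions(utterances, emotion_events):
    """Attach each (timestamp, label) event to the utterance whose span contains it."""
    for timestamp, label in emotion_events:
        for utt in utterances:
            if utt.start <= timestamp < utt.end:
                utt.emotions.append(label)
                break  # an event belongs to at most one utterance
    return utterances


def render_minutes(utterances):
    """Render one line per utterance, with detected emotions appended as tags."""
    lines = []
    for utt in utterances:
        tags = "".join(f" [{label}]" for label in utt.emotions)
        lines.append(f"{utt.start:06.1f}s {utt.speaker}: {utt.text}{tags}")
    return "\n".join(lines)


# Hypothetical sample data, not taken from the disclosure.
utterances = [
    Utterance(0.0, 5.0, "A", "Let's review the schedule."),
    Utterance(5.0, 9.0, "B", "The release slipped by a week."),
]
events = [(2.0, "joy"), (6.5, "tension")]
minutes = render_minutes(attach_emotions(utterances, events))
print(minutes)
```

Keeping the emotion tags on the utterance itself is one way to keep the sentiment information linked to the statement in which it was detected.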

[0146] Step 5:

[0147] The terminal displays the meeting minutes received from the server to the user. Through the interface, the user can see how emotions changed during the meeting. For example, the user can view a graph showing the shift in emotions during important discussions.

[0148] Step 6:

[0149] Based on the presented meeting minutes and sentiment data, users review the minutes and add corrections or comments as needed. These corrections are then sent back to the server.
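
One possible way to represent the corrections sent back to the server is a line-based diff between the generated and the revised minutes. The sketch below uses Python's standard `difflib` module. The choice of a unified diff, the `minutes_diff` helper, and the sample text are assumptions made for illustration and are not specified in this disclosure.

```python
import difflib


def minutes_diff(generated, revised):
    """Return a unified diff describing the user's corrections to the minutes."""
    return "".join(
        difflib.unified_diff(
            generated.splitlines(keepends=True),
            revised.splitlines(keepends=True),
            fromfile="generated_minutes",
            tofile="revised_minutes",
        )
    )


# Hypothetical sample minutes; each line ends with a newline.
generated = "Topic: release\nDeadline: May 1\n"
revised = "Topic: release\nDeadline: May 8\n"
corrections = minutes_diff(generated, revised)
print(corrections)
```

Sending only the diff keeps the payload small, while sending the full revised text would be simpler. Either choice is compatible with the step described above.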

[0150] Step 7:

[0151] Starting from the finalized and approved meeting minutes, the server uses the sentiment analysis results to adjust the agenda for the next meeting. For example, it allocates more time to topics that received positive feedback in the previous meeting, and sets follow-up actions for topics that received negative feedback.
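
The agenda adjustment rule in this step can be sketched as follows. The sketch assumes that each topic carries a single sentiment score between -1 and 1. The thresholds, the time allotments, and the `adjust_agenda` function are hypothetical values and names chosen for illustration only.

```python
def adjust_agenda(topics, base_minutes=10, bonus_minutes=5, threshold=0.3):
    """Build the next agenda from (topic, sentiment) pairs.

    Topics with clearly positive sentiment get extra time; topics with
    clearly negative sentiment are flagged for a follow-up action.
    """
    agenda = []
    for name, sentiment in topics:
        item = {"topic": name, "minutes": base_minutes, "follow_up": False}
        if sentiment > threshold:
            item["minutes"] += bonus_minutes
        elif sentiment < -threshold:
            item["follow_up"] = True
        agenda.append(item)
    return agenda


# Hypothetical topics and scores from the previous meeting.
topics = [("Pricing", 0.7), ("Hiring", -0.6), ("Roadmap", 0.0)]
agenda = adjust_agenda(topics)
for item in agenda:
    print(item)
```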

[0152] Step 8:

[0153] The server automatically distributes the completed meeting minutes and the adjusted agenda for the next meeting to participants via email or messaging services. This distribution ensures that relevant information is shared quickly and effectively among stakeholders.
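
For the email case, the distribution step might be sketched with Python's standard `email.message.EmailMessage` class. The addresses, the subject line, and the `build_distribution_email` helper are hypothetical, and actual delivery (for example through an SMTP server) is left out because it depends on the deployment.

```python
from email.message import EmailMessage


def build_distribution_email(sender, recipients, minutes_text, agenda_items):
    """Compose one message carrying the approved minutes and the next agenda."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = "Meeting minutes and next agenda"
    agenda_text = "\n".join(f"- {item}" for item in agenda_items)
    msg.set_content(
        f"Meeting minutes:\n{minutes_text}\n\n"
        f"Agenda for the next meeting:\n{agenda_text}\n"
    )
    return msg


# Hypothetical participants and content.
msg = build_distribution_email(
    "server@example.com",
    ["alice@example.com", "bob@example.com"],
    "Decided to ship in May.",
    ["Release checklist", "Budget review"],
)
print(msg["To"])
print(msg.get_content())
# Delivery could use smtplib.SMTP(host).send_message(msg); the host is an assumption.
```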

[0154] (Example 2)

[0155] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0156] Traditional online meeting systems lacked automated processes for creating meeting minutes and sentiment analysis, making it difficult to accurately reflect participants' emotional shifts and the depth of discussions. Furthermore, the inability to efficiently plan and adjust future meetings based on feedback could potentially lead to a decline in meeting quality.

[0157] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0158] In this invention, the server includes means for acquiring meeting information via a communication circuit, means for converting the acquired meeting audio information into text information, and means for generating meeting minutes data from the converted text information and meeting video information. This makes it possible to grasp the emotional changes of participants during a meeting in real time and to automatically adjust the plan for the next meeting based on the results of the emotional analysis.

[0159] A "communication circuit" is a means of transmitting and receiving data to and from a remote location, and is a technology used to exchange information over a network.

[0160] "Meeting information" refers to information including audio, video, and related data collected during an online meeting.

[0161] "Means of converting to textual information" refers to technologies that process audio information to visually represent it as text.

[0162] "Meeting minutes" are records compiled from text and video information of online meetings, and are used to show the progress and content of the meeting.

[0163] "Emotional analysis results" refer to data indicating the emotional state detected from participants' voices and videos, representing emotional changes during the meeting as numerical values and graphs.

[0164] A "means for automatically adjusting the plan for the next meeting" is a system that optimizes the agenda and proceedings of the next meeting based on past meeting information and feedback.

[0165] "Visual representation" is a technique that uses visual elements such as graphs and icons to express emotional changes and information in a way that allows users to intuitively understand them.

[0166] An "external schedule management service" is an external system or application that manages calendars and schedules, and is used to efficiently manage meeting schedules.

[0167] This invention is a system that improves the efficiency of online meetings and automates the delivery of information while taking participants' emotions into consideration. This system consists of server, terminal, and user elements.

[0168] The server acquires audio, video, and related data from the online meeting platform in real time. Audio from the meeting information is converted into text data using a speech recognition service (e.g., an online speech recognition API). Simultaneously, an emotion recognition engine (e.g., an emotion analysis API) is used to analyze participants' emotions in real time based on their voice tone and facial expressions.

[0169] The terminal integrates the sentiment analysis results with the meeting minutes data processed on the server and presents them visually to the user. This allows users to easily see changes in emotions during the meeting through graphs, tags, and icons. For example, positive reactions from participants during heated discussions are displayed visually.

[0170] The user reviews the meeting minutes and sentiment analysis results presented on their device and can provide feedback based on them. After user approval, the system readjusts the agenda for the next meeting. This adjustment uses a generative AI model, which is given a suitable prompt and presents its suggestions to the user.

[0171] For example, the server can capture laughter and tone of voice during a meeting, detect positive emotions at specific points in the discussion, and use them to reinforce the well-received topics in the next meeting. Conversely, where negative reactions are detected, the server can analyze the causes and use that information to improve the next meeting.

[0172] Example of a prompt:

[0173] "How can I understand participants' emotions in real time during an online meeting and adjust the agenda for the next meeting based on those emotional changes?"

[0174] This system functions as a tool to improve the quality of meetings and provides new value to online communication by effectively incorporating real-time emotional data into future actions.

[0175] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0176] Step 1:

[0177] The server retrieves meeting information from the online meeting platform via a communication circuit. This information includes audio, video, and associated metadata. The server receives this data as a stream and prepares it for the next processing step.

[0178] Step 2:

[0179] The server converts the acquired meeting audio data into text information. Specifically, it uses a speech recognition API to convert the audio stream into transcribed data. The input is audio data, and the output is text data. This process saves the meeting content as text, making it easy to search and analyze.

[0180] Step 3:

[0181] The server uses an emotion recognition engine, taking the acquired video data and converted text data as input. It analyzes voice tone and participants' facial expressions to identify their emotional state. The output of this analysis is metadata indicating the participants' emotional changes. Specifically, emotional ups and downs are tagged.

[0182] Step 4:

[0183] The server integrates text data and sentiment analysis results to generate meeting minutes. This generated data combines the main points of the discussion with changes in sentiment. It takes text data and sentiment metadata as input and produces complete meeting minutes as output.

[0184] Step 5:

[0185] The terminal receives meeting minutes data sent from the server and presents it to the user. It displays changes in emotions and the content of the discussion in a visual format. For example, it may include graphs or icons showing emotional peaks during specific discussions. The input is integrated meeting minutes data, and the output is visual information provided to the user.
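
As an illustration of how emotional peaks could be located before they are drawn as graphs or icons, the following Python sketch finds local maxima in a series of emotion intensity scores. The score scale, the threshold, and the `find_emotion_peaks` function are assumptions introduced here and are not part of this disclosure.

```python
def find_emotion_peaks(scores, threshold=0.6):
    """Return the indices where the score is a local maximum at or above threshold."""
    peaks = []
    for i in range(1, len(scores) - 1):
        is_local_max = scores[i] > scores[i - 1] and scores[i] >= scores[i + 1]
        if scores[i] >= threshold and is_local_max:
            peaks.append(i)
    return peaks


# Hypothetical intensity values, one per fixed-length time window.
scores = [0.1, 0.4, 0.8, 0.5, 0.2, 0.7, 0.9, 0.3]
peaks = find_emotion_peaks(scores)
print(peaks)  # indices of the windows to highlight
```

The terminal could then place an icon or marker at each returned index on the timeline of the minutes.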

[0186] Step 6:

[0187] Users can review meeting minutes and provide feedback through an interface on their devices. By commenting on the content of the minutes and the results of sentiment analysis, users can incorporate their feedback into the agenda for the next meeting. The input is user feedback, and the output is revised minutes and adjusted meeting plans.

[0188] Step 7:

[0189] The server takes user feedback into consideration and automatically adjusts the plan for the next meeting using a generative AI model. This process takes user feedback and past meeting data as input and generates the agenda for the next meeting as output. A generated prompt may also be presented as a suggestion.

[0190] (Application Example 2)

[0191] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0192] In online meetings, there is a need to analyze meeting content while considering participants' emotions and to improve the accuracy of future meeting planning. However, existing systems make it difficult to provide feedback and adjust agendas that appropriately reflect participants' emotional states. In particular, improving the quality of online learning and business meetings requires understanding changes in participants' emotions in real time and reflecting them in the meeting proceedings, but there is a lack of effective means to do so.

[0193] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0194] In this invention, the server includes means for acquiring meeting data via a communication network, means for converting the acquired meeting audio data into text, and means for analyzing participants' facial expressions and voice tones to generate emotion data. This makes it possible to generate and present meeting minutes that reflect the emotions of the participants.

[0195] A "communication network" is a network infrastructure used to share data among participants in a meeting or conference.

[0196] "Meeting data" refers to digital data including audio, video, and related information generated during online meetings and discussions.

[0197] "Text conversion" is the process of mechanically converting audio data into text data.

[0198] "Meeting information" refers to a document summarizing the progress and content of a meeting, and includes both text and image data.

[0199] "User" refers to anyone who uses this system to receive meeting information and sentiment data.

[0200] "Emotional data" refers to data that indicates the emotional state of participants, inferred from changes in their facial expressions and voice tone.

[0201] A "recording screen" refers to an image or screenshot taken during a meeting or gathering, and is used as part of the meeting information.

[0202] An "external calendar service" is a third-party calendar service used to manage meeting dates and appointments.

[0203] This invention is a system for generating and presenting information that takes participants' emotions into account during online meetings. The server acquires meeting data via a communication network and converts audio data into text. Furthermore, it uses an emotion recognition engine to carefully analyze participants' facial expressions and voice tone to generate emotion data. This engine integrates existing facial recognition and voice analysis technologies, enabling high-precision measurement of emotional states in real time. The emotion data clearly indicates states such as positive, negative, and neutral.

[0204] The generated sentiment data and meeting information are presented to the user via a terminal. Users can visually confirm changes in participants' sentiments through graphs and charts, and make adjustments as needed. This information is used when creating the plan for the next meeting, promoting more effective meeting management.

[0205] As a concrete example, the server performs real-time sentiment analysis of students and visually presents the results to the instructor, allowing for the measurement of students' interest and understanding during class. If the instructor determines that a student has questions, they can flexibly adjust the pace of the lesson and explain the content again.

[0206] An example of a prompt for a generative AI model is: "Please provide a model that analyzes the student's emotions during this lesson using three different parameters (interest, question, anxiety) and provides feedback based on that analysis." Such prompts can improve the accuracy of emotion recognition technology.

[0207] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0208] Step 1:

[0209] The server acquires meeting data via the communication network. The input data consists of meeting audio, video, and related metadata. This data is collected on the server, laying the foundation for subsequent analysis.

[0210] Step 2:

[0211] The server inputs the acquired audio data into a text conversion engine and converts it into text data. This process analyzes the audio and generates corresponding strings, which can then be used as meeting minutes. The output is a complete text-based transcript of the meeting.

[0212] Step 3:

[0213] The server uses an emotion recognition engine to analyze participants' facial expressions and voice tone, generating emotion data. Video data and voice tone are used as input. The system analyzes this data and classifies each participant's emotion as positive, negative, or neutral. The output is a dataset showing each participant's emotional state.
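
The three-way classification in this step can be sketched as follows. The sketch assumes that the facial expression and voice tone analyses each yield scores between -1 and 1, and that averaging them is acceptable. The averaging rule, the margin, and the `classify_emotion` function are hypothetical and are shown only to make the input and output concrete.

```python
from statistics import mean


def classify_emotion(face_scores, voice_scores, margin=0.2):
    """Classify a participant as positive, negative, or neutral.

    Scores are assumed to lie in [-1, 1]; values within +/- margin of zero
    are treated as neutral. Raises statistics.StatisticsError if both
    score lists are empty.
    """
    combined = mean(list(face_scores) + list(voice_scores))
    if combined > margin:
        return "positive"
    if combined < -margin:
        return "negative"
    return "neutral"


# Hypothetical per-participant scores.
participants = {
    "A": ([0.6, 0.4], [0.5, 0.3]),
    "B": ([-0.5, -0.3], [-0.4]),
    "C": ([0.1, -0.1], [0.0]),
}
emotion_data = {
    name: classify_emotion(face, voice)
    for name, (face, voice) in participants.items()
}
print(emotion_data)
```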

[0214] Step 4:

[0215] The terminal presents meeting information, combining the generated text minutes and the sentiment data, to the user via a graphical interface. This information visually represents changes in emotions over time, allowing users to easily understand the meeting content and the participants' emotional responses. The input consists of the generated text data and sentiment data, and the output is a display integrating them.

[0216] Step 5:

[0217] Users can review the presented meeting information and add corrections or comments as needed. By providing feedback, users can improve the quality of meetings and incorporate it into future proceedings. The input is user feedback, and the output is the adjusted meeting information.

[0218] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0219] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0220] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0221] [Second Embodiment]

[0222] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0223] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0224] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0225] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0226] The microphone 238 accepts instructions from the user 20 by picking up the voice uttered by the user 20. The microphone 238 captures the voice, converts it into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0227] The camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and with an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor. The camera 42 captures images of the area around the user 20 (for example, an imaging range defined by an angle of view equivalent to the field of vision of a typical healthy person).

[0228] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0229] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0230] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0231] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0232] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0233] Next, the identification processing performed by the identification processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0234] The system of this invention is designed to automate post-meeting processing for online meetings and to efficiently provide information to meeting participants. This system mainly consists of three elements: a server, a terminal, and a user, each of which plays a specific role.

[0235] The server acquires audio and video data of meetings through an interface with the online meeting platform. The audio data is transcribed using a speech recognition service, and screenshots of the meeting are extracted from the video data using image processing. The server inputs this information into a generative AI to create meeting minutes summarizing the key points of the meeting. The server also automatically generates draft materials and an agenda for the next meeting.

[0236] The terminal displays meeting minutes and the next agenda generated on the server to the user. The user can use an interface on the terminal to view and edit this information. Once editing is complete, the changes are sent back to the server.

[0237] Users can review the meeting minutes generated via their devices and make corrections as needed. After the user has finalized the information, the system integrates with an external calendar service to automatically schedule the next meeting. This updates the meeting participants' calendars and provides necessary notifications.

[0238] As a concrete example, the server uses a specific API to send audio data to a transcription service, which then generates meeting minutes from the resulting text. These minutes are displayed in the user's browser, allowing them to review and make any necessary corrections. The approved meeting minutes are then automatically distributed to all participants via the server.

[0239] As described above, the system of the present invention efficiently handles post-processing of online meetings, reduces the burden on users, and enables the rapid and accurate provision of information.

[0240] The following describes the processing flow.

[0241] Step 1:

[0242] The server automatically collects audio and video data from online meeting platforms via their APIs. The collected data is stored internally and subjected to necessary preprocessing.

[0243] Step 2:

[0244] The server sends the audio data to a speech recognition service (e.g., a speech recognition API), which transcribes the audio and generates text data. This text data is then stored as meeting transcript information.

[0245] Step 3:

[0246] The server extracts screenshots from the meeting video data at specific time intervals and saves them as image data. These images are used as visual elements of the meeting content.
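
Extraction at fixed intervals might be sketched as follows, assuming that the ffmpeg command-line tool is used and that one frame per interval is sufficient. The file paths, the output naming pattern, and the `build_screenshot_command` helper are hypothetical. The sketch only builds the command so that it can be inspected before it is run.

```python
import shlex


def build_screenshot_command(video_path, output_dir, interval_seconds=60):
    """Build an ffmpeg command that saves one frame every interval_seconds."""
    return [
        "ffmpeg",
        "-i", video_path,
        # The fps filter with a rate of 1/N keeps one frame every N seconds.
        "-vf", f"fps=1/{interval_seconds}",
        f"{output_dir}/shot_%04d.png",
    ]


# Hypothetical paths; nothing is executed here.
cmd = build_screenshot_command("meeting.mp4", "screenshots", interval_seconds=30)
print(shlex.join(cmd))
# To run it, subprocess.run(cmd, check=True) could be used on a host with ffmpeg installed.
```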

[0247] Step 4:

[0248] The server inputs the transcript data and image data into a generative AI to generate meeting minutes that summarize the key points of the meeting. These minutes may include the extracted screenshots.

[0249] Step 5:

[0250] The terminal displays the generated meeting minutes and the agenda for the next meeting to the user. The user can use the interface on the terminal to review the content and edit or modify it as needed.

[0251] Step 6:

[0252] The user reviews the content and gives final approval via their device. The approved information is returned to the server, where it is made ready for distribution.

[0253] Step 7:

[0254] The server integrates with an external calendar service to automatically schedule the next meeting. The scheduled date is then reflected in the participants' calendars.

[0255] Step 8:

[0256] The server automatically distributes the completed meeting minutes and the agenda for the next meeting to participants via email or messaging service. This distribution allows all stakeholders to quickly share the meeting content and the date of the next meeting.

[0257] (Example 1)

[0258] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0259] Currently, creating meeting minutes after online meetings is time-consuming and laborious, hindering meeting efficiency. Furthermore, scheduling future meetings and sharing information with participants are often done manually, placing a significant burden on users. Solving these problems and achieving a smoother online meeting process is essential.

[0260] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0261] In this invention, the server includes means for acquiring meeting information via a communication network, means for converting the acquired meeting audio data into text data using speech recognition technology, and means for extracting screenshots from meeting video data using image analysis technology. This makes it possible to streamline the post-processing of online meetings and reduce the burden on users.

[0262] A "communication network" is a system that connects information devices such as computers located in different places to send and receive data.

[0263] "Meeting information" refers to all audio, video, and related data used in online meetings.

[0264] "Speech recognition technology" is a technology for converting speech data into text data.

[0265] "Character data" refers to text-based information that represents audio information converted using speech recognition technology.

[0266] "Image analysis technology" is a technology that analyzes image data to extract meaning from it.

[0267] A "screenshot" refers to an image saved from the screen used during a meeting.

[0268] "Generative AI" is a technology that uses artificial intelligence to generate new information from data, and in this context, it is used to generate meeting minutes.

[0269] "Summary information" refers to information that simplifies the original information and summarizes only the main points.

[0270] A "terminal" refers to a device that a user uses to view or edit information.

[0271] An "external schedule management service" refers to an external service used for managing schedules, such as a calendar or schedule management tool.

[0272] A "notification function" is a mechanism for conveying specific information to a user.

[0273] This system primarily consists of three elements: servers, terminals, and users. Each element plays a specific role, streamlining the post-processing of online meetings.

[0274] The server retrieves meeting information from online meeting platforms via a communication network. In doing so, it uses the APIs of platforms such as Zoom and Microsoft Teams to acquire audio and video data. The server converts the acquired audio data into text data using speech recognition technologies such as Google Cloud Speech-to-Text and IBM Watson. Furthermore, it extracts screenshots from the video data using image analysis tools such as OpenCV and ffmpeg. This makes it possible to capture important parts of the slides presented during the meeting.

[0275] The server uses a generative AI model to generate summary information based on the acquired text data and screenshots. A large language model such as GPT-3 is suitable for this purpose. An example of a prompt would be, "Based on the following text and image information, summarize the three main points of the meeting."
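
How the example prompt might be combined with the meeting data can be sketched as follows. The sketch assumes that screenshots are passed to the model as short text notes. The `build_summary_prompt` helper and the layout of the prompt are hypothetical, and the call to the generative AI model itself is omitted because it depends on the model and service used.

```python
INSTRUCTION = (
    "Based on the following text and image information, "
    "summarize the three main points of the meeting."
)


def build_summary_prompt(transcript, screenshot_notes):
    """Combine the instruction, the transcript, and notes on each screenshot."""
    image_lines = "\n".join(f"- {note}" for note in screenshot_notes)
    return f"{INSTRUCTION}\n\nText:\n{transcript}\n\nImages:\n{image_lines}"


# Hypothetical transcript excerpt and screenshot notes.
prompt = build_summary_prompt(
    "A: The release is planned for May.\nB: Testing starts next week.",
    ["Slide 3: release timeline", "Slide 5: test plan"],
)
print(prompt)
```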

[0276] The terminal presents the user with summary information sent from the server. Specifically, it allows the user to view the summary information through a browser-based interface using HTML / CSS. The user edits the summary information using the provided interface and sends the results back to the server.

[0277] The summary information, after being reviewed and edited by the user, is used by the server to schedule the next meeting. The server integrates with external scheduling services (e.g., Google Calendar, Outlook) via API to automatically set the next meeting date. This allows meeting participants to receive schedule-based notifications, enabling efficient time management.

[0278] By implementing this system, users can significantly reduce the time and effort spent on post-meeting processing for online meetings, enabling efficient and accurate information sharing.

[0279] The flow of the specific process in Example 1 will be described using FIG. 11.

[0280] Step 1:

[0281] The server acquires audio and video data from an online meeting platform via a communication network. As input, it uses the specified API to request the data for the completed meeting. As output, the audio file and video file of the meeting are saved on the server.

[0282] Step 2:

[0283] The server converts the acquired audio data into character data using speech recognition technology such as Google Cloud Speech-to-Text. As input, it sends the audio file to the speech recognition engine for processing. As output, the content of the meeting converted into text format is obtained.

[0284] Step 3:

[0285] The server extracts screenshots from the video data to capture important moments. As input, it processes the video file using image analysis technology such as OpenCV. As output, it generates screenshot images centered on important slides or visuals.

[0286] Step 4:

[0287] The server uses the generative AI model to generate summary information, with the character data and screenshots as input. It composes a prompt containing an instruction such as "Please summarize the three main points of the meeting based on the following text and image information." As output, the summarized minutes information is created.

[0288] Step 5:

[0289] The terminal displays information on the user interface using summary information received from the server. It receives summary information from the server as input and converts it into a format viewable on a browser screen as output.

[0290] Step 6:

[0291] The user reviews the summary information presented on the terminal and edits it if necessary. They use the text editing function on the terminal as input and save the revised meeting minutes as output.

[0292] Step 7:

[0293] The server automatically schedules the next meeting in conjunction with an external scheduling service, based on the meeting minutes information reviewed and corrected by the user. The corrected meeting minutes and the requirements for the next meeting are used as input, and the newly scheduled meeting date is added to the external calendar as output.
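
As one service-neutral illustration of the scheduling output, the next meeting could be expressed as an iCalendar (RFC 5545) event, a format that many calendar applications can import. This is an assumption made for the sketch. Services such as Google Calendar or Outlook provide their own APIs, which are not shown here, and the `build_ics_event` helper, the UID, and the PRODID value are hypothetical.

```python
from datetime import datetime, timedelta, timezone

ICS_TIME_FORMAT = "%Y%m%dT%H%M%SZ"


def escape_ics_text(text):
    """Escape backslashes, semicolons, and commas as required for iCalendar TEXT values."""
    return text.replace("\\", "\\\\").replace(";", "\\;").replace(",", "\\,")


def build_ics_event(uid, summary, start, duration_minutes, now=None):
    """Return an iCalendar document with one event; start must be timezone-aware."""
    now = now or datetime.now(timezone.utc)
    end = start + timedelta(minutes=duration_minutes)

    def to_utc(dt):
        return dt.astimezone(timezone.utc).strftime(ICS_TIME_FORMAT)

    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//meeting-scheduler//EN",
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{to_utc(now)}",
        f"DTSTART:{to_utc(start)}",
        f"DTEND:{to_utc(end)}",
        f"SUMMARY:{escape_ics_text(summary)}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    return "\r\n".join(lines) + "\r\n"  # iCalendar lines end with CRLF


# Hypothetical next meeting derived from the corrected minutes.
ics = build_ics_event(
    "next-meeting-001@example.com",
    "Weekly sync, part 2",
    datetime(2025, 1, 15, 9, 0, tzinfo=timezone.utc),
    30,
    now=datetime(2025, 1, 8, 12, 0, tzinfo=timezone.utc),
)
print(ics)
```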

[0294] (Application Example 1)

[0295] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0296] Modern organizational activities, particularly operational meetings in logistics centers, are frequent, and their efficiency and accurate information sharing are crucial. However, these meetings often require significant time and effort for recording content and organizing information, resulting in ineffective planning and clarification of procedures for future work. Therefore, there is a need for a system that automates efficient post-meeting processing and the rapid and accurate transmission of necessary information, thereby improving the accuracy and efficiency of operations.

[0297] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0298] In this invention, the server includes means for acquiring meeting data via a communication network, means for converting the acquired meeting audio data into text information, means for generating summary information from the text information and meeting video data, and means for displaying the summary information and the next work procedure using an operating device to provide work support. This enables efficient post-meeting processing and clarifies the next work procedure, thereby improving the efficiency and accuracy of operations at the logistics center.

[0299] A "communication network" is an information exchange system for sending and receiving digital data remotely.

[0300] "Meeting data" refers to a collection of information, such as audio and video, generated in connection with meetings and discussions.

[0301] "Means of converting into text information" refers to a function that analyzes audio data and converts the spoken content into text.

[0302] "Summary information" refers to concise information that summarizes the main topics and points covered at a meeting.

[0303] An "operator" refers to a person who uses a system to verify or manipulate information.

[0304] An "operating device" is a device used by users to display information and operate a system.

[0305] "Visual information" refers to digital data that can be visually recognized, such as screenshots and images.

[0306] An "external scheduling service" is an external platform for managing appointments and schedules electronically.

[0307] This invention provides a system that effectively utilizes meeting data collected via a communication network and performs efficient post-processing.

[0308] The server uses a communication network to obtain the meeting data generated in meetings and consultations. The obtained voice data is converted into character information by a voice recognition API (e.g., Google Speech-to-Text API). Furthermore, the video data is processed with an image processing library (e.g., OpenCV) to obtain visual information used in the summary. The summary information is then generated by a generative AI (e.g., OpenAI GPT-3) so that it concisely covers the important points.
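The video-processing part of this paragraph could be sketched as a simple change detector that keeps only frames differing strongly from the last kept frame, for example when a slide changes. This is a minimal sketch under stated assumptions: frames are represented as flat lists of grayscale values that have already been decoded (in practice a library such as OpenCV would do the decoding), and the names `mean_abs_diff` and `select_keyframes` and the threshold value are hypothetical.

```python
from typing import List, Sequence


def mean_abs_diff(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean absolute difference between two equally sized grayscale frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_keyframes(frames: List[Sequence[int]], threshold: float = 20.0) -> List[int]:
    """Return indices of frames that differ strongly from the last kept frame."""
    if not frames:
        return []
    kept = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        # Compare against the most recently kept frame, not the previous frame,
        # so that slow gradual changes are eventually captured as well.
        if mean_abs_diff(frames[kept[-1]], frames[i]) > threshold:
            kept.append(i)
    return kept
```

The selected indices identify which frames to save as screenshots for the summary step.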

[0309] The terminal displays to the operator the summary information and the next operation procedure received from the server. Through the interface on the terminal, the operator can check and edit the information, which supports the planning of the next task.

[0310] The user can check the summary information presented using the terminal and, if necessary, utilize an external schedule management service to adjust the topics and schedule of the next meeting.

[0311] As a specific example, in the daily shipping meetings at a logistics center, the server efficiently processes the voice and video, generating summary information in a short time. This enables the operator to quickly access past meeting information and clarify the next operation procedure.

[0312] As an example of the input prompt to the generative AI model, it can be "Please generate the minutes of an online meeting. The following is the full text of the meeting.", and then by inputting the specific text of the meeting, the key points of the meeting and the next agenda are automatically generated.
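The prompt described here can be assembled as in the sketch below, which prepends the quoted instruction to the meeting transcript. The function name `build_minutes_prompt` and the `max_chars` truncation are assumptions for illustration; the source does not specify a length limit or how the prompt is sent to the model.

```python
INSTRUCTION = (
    "Please generate the minutes of an online meeting. "
    "The following is the full text of the meeting."
)


def build_minutes_prompt(transcript: str, max_chars: int = 8000) -> str:
    """Combine the fixed instruction with the (possibly truncated) transcript."""
    body = transcript.strip()
    if len(body) > max_chars:
        # Assumed safeguard: keep the prompt within a size budget.
        body = body[:max_chars]
    return f"{INSTRUCTION}\n\n{body}"
```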

[0313] The flow of the specific process in Application Example 1 will be described using Figure 12.

[0314] Step 1:

[0315] The server acquires meeting data (audio and video) via the communication network. The acquired data is stored in an appropriate format, as it forms the basis for subsequent processing.

[0316] Step 2:

[0317] The server sends the acquired audio data to a speech recognition API (e.g., Google Speech-to-Text API) to convert it into text. This process outputs the audio data as text information, which is then used to record the meeting content.

[0318] Step 3:

[0319] The server uses an image processing library (e.g., OpenCV) on the video data to generate necessary screen captures, if required. This extracts important visual information from the meeting, which is useful for creating summaries later.

[0320] Step 4:

[0321] Based on text data and video information, the server uses a generative AI (e.g., OpenAI GPT-3) to generate summary information. Text and visual information are provided as input, and the AI analyzes and processes them to create a summary and suggestions for future tasks.

[0322] Step 5:

[0323] The terminal displays summary information received from the server to the operator. Through the terminal's interface, the operator can review and edit the meeting summary and proposals for the next meeting.

[0324] Step 6:

[0325] Users can make final confirmations of the information presented on their device and make corrections as needed. This editable interface allows users to ensure the accuracy of the information and improve overall work efficiency.

[0326] Step 7:

[0327] The user sends the modified information from the terminal to the server, which then connects to an external scheduling service. By integrating the final meeting information with the scheduling service, the schedule for the next meeting is automatically updated.
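The server-side portion of the flow above (Steps 2 to 4) can be sketched as a small pipeline whose stages are supplied by the caller. This is only an illustrative structure: `MeetingSummary`, `run_pipeline`, and the three stage parameters are hypothetical names, and the real speech recognition, image processing, and generative AI calls are deliberately left to the injected functions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MeetingSummary:
    transcript: str
    captures: List[str]
    summary: str


def run_pipeline(
    audio: bytes,
    video: bytes,
    transcribe: Callable[[bytes], str],
    capture: Callable[[bytes], List[str]],
    summarize: Callable[[str, List[str]], str],
) -> MeetingSummary:
    """Run Steps 2 to 4 in order using caller-supplied stage functions."""
    transcript = transcribe(audio)              # Step 2: speech to text
    captures = capture(video)                   # Step 3: screen captures
    summary = summarize(transcript, captures)   # Step 4: generative summary
    return MeetingSummary(transcript, captures, summary)
```

Passing the stages as functions keeps the sketch independent of any particular vendor API and makes each stage easy to replace with a stub when testing.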

[0328] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0329] This invention is a system for further enhancing the post-processing of online meetings and automating the provision of information that takes participants' emotions into account. This system comprises server, terminal, and user elements, and by combining them with an emotion recognition engine, it identifies the emotions of participants during a meeting and reflects the results in various functions.

[0330] The server collects audio, video, and related data from the online meeting platform. The acquired audio data is transcribed by a speech recognition service, and simultaneously, an emotion recognition engine analyzes participants' voice tones and facial expressions to determine their emotions. This generates not only textual information but also data indicating the emotional state of the participants.

[0331] The terminal presents the user with meeting minutes generated on the server, along with sentiment analysis results. Users can use an interface that allows them to visually see how emotional changes influenced the meeting. For example, they can easily see, using graphs and tags, what emotions participants displayed at important points in the discussion.

[0332] Users can view meeting minutes and sentiment analysis results on their devices and provide feedback or make corrections as needed. After user approval, the system adjusts the agenda for the next meeting to reflect the changes in sentiment. This agenda adjustment is expected to make the next meeting more efficient and effective.

[0333] For example, the server can capture participants' laughter and high-pitched voices, detecting positive emotions at specific points in the meeting. This allows users to further refine the agenda for the next meeting based on the positive feedback. Conversely, if negative reactions are detected, the server can analyze their causes and use them as clues to improve future discussions.

[0334] In this way, the present invention not only automates administrative tasks but also functions as a tool to improve the quality of meeting content. By understanding participants' emotions in real time and reflecting them in subsequent actions, it becomes possible to create new value in online communication.

[0335] The following describes the processing flow.

[0336] Step 1:

[0337] The server retrieves audio and video data from the online meeting platform. The retrieved data is stored internally and prepared to be sent to the speech recognition engine (audio data) and the emotion recognition engine (video data).

[0338] Step 2:

[0339] The server processes the audio data through a speech recognition engine to transcribe it and generate text data. Simultaneously, an emotion recognition engine analyzes the emotions from the audio, generating emotional data of the participants' speech. For example, it analyzes changes in voice tone and pitch to identify emotions such as joy or anger.

[0340] Step 3:

[0341] The server uses video data to allow an emotion recognition engine to analyze the participants' facial expressions. The output here is emotion data extracted from the visual image, where the type of emotion (e.g., tension, relaxation, excitement) is determined from subtle facial movements and expressions.

[0342] Step 4:

[0343] The server aggregates the transcript data and the sentiment data extracted from audio and video, and generates comprehensive meeting minutes. These minutes include the sentiment information detected within each statement and discussion, linked to specific topics.

[0344] Step 5:

[0345] The terminal displays meeting minutes received from the server to the user. Through the interface, the user can visualize and see how emotions changed during the meeting. For example, they can view a graph showing the shift in emotions during important discussions.

[0346] Step 6:

[0347] Based on the presented meeting minutes and sentiment data, users review the minutes and add corrections or comments as needed. These corrections are then sent back to the server.

[0348] Step 7:

[0349] Using the sentiment analysis results in the finalized and approved meeting minutes, the server adjusts the agenda for the next meeting. For example, it allocates more time to topics that received positive feedback in the previous meeting, and sets follow-up actions for topics that received negative feedback.

[0350] Step 8:

[0351] The server automatically distributes the completed meeting minutes and the adjusted agenda for the next meeting to participants via email or messaging services. This distribution ensures that relevant information is shared quickly and effectively among stakeholders.
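The agenda adjustment in Step 7 of the flow above might look like the following sketch, which gives positively received topics a larger share of the meeting time and flags negatively received topics for follow-up. The sentiment scale of -1 to 1, the weighting formula, and the name `adjust_agenda` are assumptions for this example; the source does not define how the time allocation is computed.

```python
from typing import Dict, List


def adjust_agenda(
    topics: Dict[str, float],
    total_minutes: int = 60,
    base_weight: float = 1.0,
) -> List[dict]:
    """topics maps a topic name to a sentiment score in [-1, 1] (assumed scale)."""
    # Positive topics get extra weight; negative ones keep the base weight.
    weights = {t: base_weight + max(s, 0.0) for t, s in topics.items()}
    total = sum(weights.values())
    agenda = []
    for topic, score in topics.items():
        agenda.append({
            "topic": topic,
            "minutes": round(total_minutes * weights[topic] / total),
            "follow_up": score < 0,  # flag negative topics for a follow-up action
        })
    return agenda
```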

[0352] (Example 2)

[0353] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0354] Traditional online meeting systems lacked automated processes for creating meeting minutes and sentiment analysis, making it difficult to accurately reflect participants' emotional shifts and the depth of discussions. Furthermore, the inability to efficiently plan and adjust future meetings based on feedback could potentially lead to a decline in meeting quality.

[0355] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0356] In this invention, the server includes means for acquiring meeting information via a communication circuit, means for converting the acquired meeting audio information into text information, and means for generating meeting minutes data from the converted text information and meeting video information. This makes it possible to grasp the emotional changes of participants during a meeting in real time and to automatically adjust the plan for the next meeting based on the results of the emotional analysis.

[0357] A "communication circuit" is a means of transmitting and receiving data to and from a remote location, and is a technology used to exchange information over a network.

[0358] "Meeting information" refers to information including audio, video, and related data collected during an online meeting.

[0359] "Means of converting to textual information" refers to technologies that process audio information to visually represent it as text.

[0360] "Meeting minutes" are records compiled from text and video information of online meetings, and are used to show the progress and content of the meeting.

[0361] "Emotional analysis results" refer to data indicating the emotional state detected from participants' voices and videos, representing emotional changes during the meeting as numerical values and graphs.

[0362] A "means for automatically adjusting the plan for the next meeting" is a system that optimizes the agenda and proceedings of the next meeting based on past meeting information and feedback.

[0363] "Visual representation" is a technique that uses visual elements such as graphs and icons to express emotional changes and information in a way that allows users to intuitively understand them.

[0364] An "external schedule management service" is an external system or application that manages calendars and schedules, and is used to efficiently manage meeting schedules.

[0365] This invention is a system that improves the efficiency of online meetings and automates the delivery of information while taking participants' emotions into consideration. This system consists of server, terminal, and user elements.

[0366] The server acquires audio, video, and related data from the online meeting platform in real time. Audio from the meeting information is converted into text data using a speech recognition service (e.g., an online speech recognition API). Simultaneously, an emotion recognition engine (e.g., an emotion analysis API) is used to analyze participants' emotions in real time based on their voice tone and facial expressions.

[0367] The terminal integrates sentiment analysis results with meeting minutes data processed on the server and presents them visually to the user. This allows users to easily see changes in emotions during the meeting through tagged graphs and icons. For example, positive reactions from participants during heated discussions are visually displayed.

[0368] The user reviews the meeting minutes and sentiment analysis results presented on their device. Based on these results, the user can provide feedback. After user approval, the system readjusts the agenda for the next meeting. This adjustment process utilizes a generative AI model, which presents the user with suggestions using optimal prompts.

[0369] For example, the server can capture laughter and tone of voice during a meeting, detect positive emotions at specific points in the discussion, and give the positively received content more emphasis in the next meeting. Conversely, in instances of negative reactions, the server can analyze the causes and use that information to improve the next meeting.

[0370] Example of a prompt:

[0371] "How can I understand participants' emotions in real time during an online meeting and adjust the agenda for the next meeting based on those emotional changes?"

[0372] This system functions as a tool to improve the quality of meetings and provides new value to online communication by effectively incorporating real-time emotional data into future actions.

[0373] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0374] Step 1:

[0375] The server retrieves meeting information from the online meeting platform via a communication circuit. This information includes audio, video, and associated metadata. The server receives this data as a stream and prepares it for the next processing step.

[0376] Step 2:

[0377] The server converts the acquired meeting audio data into text information. Specifically, it uses a speech recognition API to convert the audio stream into transcribed data. The input is audio data, and the output is text data. This process saves the meeting content as text, making it easy to search and analyze.

[0378] Step 3:

[0379] The server uses an emotion recognition engine, taking the acquired video data and converted text data as input. It analyzes voice tone and participants' facial expressions to identify their emotional state. The output of this analysis is metadata indicating the participants' emotional changes. Specifically, emotional ups and downs are tagged.

[0380] Step 4:

[0381] The server integrates text data and sentiment analysis results to generate meeting minutes. This generated data combines the main points of the discussion with changes in sentiment. It takes text data and sentiment metadata as input and produces complete meeting minutes as output.

[0382] Step 5:

[0383] The terminal receives meeting minutes data sent from the server and presents it to the user. It displays changes in emotions and the content of the discussion in a visual format. For example, it may include graphs or icons showing emotional peaks during specific discussions. The input is integrated meeting minutes data, and the output is visual information provided to the user.

[0384] Step 6:

[0385] Users can review meeting minutes and provide feedback through an interface on their devices. By commenting on the content of the minutes and the results of sentiment analysis, users can incorporate their feedback into the agenda for the next meeting. The input is user feedback, and the output is revised minutes and adjusted meeting plans.

[0386] Step 7:

[0387] The server takes user feedback into consideration and automatically adjusts the plan for the next meeting using a generative AI model. This process takes user feedback and past meeting data as input and generates the agenda for the next meeting as output. A generated prompt may also be presented as a suggestion.
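The integration in Step 4 of the flow above, where text data and sentiment metadata are combined into minutes, can be sketched as a time-based join. The data shapes here are assumptions: `Segment` and `EmotionTag` are hypothetical records with timestamps in seconds, and the source does not specify how emotion metadata is keyed to the transcript.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float  # seconds from the start of the meeting
    end: float
    text: str


@dataclass
class EmotionTag:
    time: float  # seconds from the start of the meeting
    label: str


def attach_emotions(segments: List[Segment], tags: List[EmotionTag]) -> List[dict]:
    """Attach to each transcript segment the emotion labels that fall within it."""
    merged = []
    for seg in segments:
        # A tag belongs to a segment when its timestamp lies in [start, end).
        labels = [t.label for t in tags if seg.start <= t.time < seg.end]
        merged.append({"text": seg.text, "emotions": labels})
    return merged
```

The merged records give the terminal enough information to draw emotion markers next to the corresponding part of the discussion.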

[0388] (Application Example 2)

[0389] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0390] In online meetings, there is a need to analyze meeting content while considering participants' emotions and to improve the accuracy of future meeting planning. However, existing systems make it difficult to provide feedback and adjust agendas that appropriately reflect participants' emotional states. In particular, improving the quality of online learning and business meetings requires understanding changes in participants' emotions in real time and reflecting them in the meeting proceedings, but there is a lack of effective means to do so.

[0391] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0392] In this invention, the server includes means for acquiring meeting data via a communication network, means for converting the acquired meeting audio data into text, and means for analyzing participants' facial expressions and voice tones to generate emotion data. This makes it possible to generate and present meeting minutes that reflect the emotions of the participants.

[0393] A "communication network" is a network infrastructure used to share data among participants in a meeting or conference.

[0394] "Meeting data" refers to digital data including audio, video, and related information generated during online meetings and discussions.

[0395] "Text conversion" is the process of mechanically converting audio data into text data.

[0396] "Meeting information" refers to a document summarizing the progress and content of a meeting, and includes both text and image data.

[0397] "User" refers to anyone who uses this system to receive meeting information and sentiment data.

[0398] "Emotional data" refers to data that indicates the emotional state of participants, inferred from changes in their facial expressions and voice tone.

[0399] A "recording screen" refers to an image or screenshot taken during a meeting or gathering, and is used as part of the meeting information.

[0400] An "external calendar service" is a third-party calendar service used to manage meeting dates and appointments.

[0401] This invention is a system for generating and presenting information that takes participants' emotions into account during online meetings. The server acquires meeting data via a communication network and converts audio data into text. Furthermore, it uses an emotion recognition engine to carefully analyze participants' facial expressions and voice tone to generate emotion data. This engine integrates existing facial recognition and voice analysis technologies, enabling high-precision measurement of emotional states in real time. The emotion data clearly indicates states such as positive, negative, and neutral.

[0402] The generated sentiment data and meeting information are presented to the user via a terminal. Users can visually confirm changes in participants' sentiments through graphs and charts, and make adjustments as needed. This information is used when creating the plan for the next meeting, promoting more effective meeting management.

[0403] As a concrete example, the server performs real-time sentiment analysis of students and visually presents the results to the instructor, allowing for the measurement of students' interest and understanding during class. If the instructor determines that a student has questions, they can flexibly adjust the pace of the lesson and explain the content again.

[0404] An example of a prompt for a generative AI model is: "Please provide a model that analyzes the student's emotions during this lesson using three different parameters (interest, question, anxiety) and provides feedback based on that analysis." Such prompts can improve the accuracy of emotion recognition technology.

[0405] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0406] Step 1:

[0407] The server acquires meeting data via the communication network. The input data consists of meeting audio, video, and related metadata. This data is collected on the server, laying the foundation for subsequent analysis.

[0408] Step 2:

[0409] The server inputs the acquired audio data into a text conversion engine and converts it into text data. This process analyzes the audio and generates corresponding strings, which can then be used as meeting minutes. The output is a complete text-based transcript of the meeting.

[0410] Step 3:

[0411] The server uses an emotion recognition engine to analyze participants' facial expressions and voice tone, generating emotion data. Video data and voice tone are used as input. The system analyzes this data and classifies each participant's emotion as positive, negative, or neutral. The output is a dataset showing each participant's emotional state.

[0412] Step 4:

[0413] The terminal presents meeting information, combining the generated text minutes and sentiment data, to the user via a graphical interface. This information visually represents changes in emotions over time, allowing users to easily understand the meeting content and the participants' emotional responses. The input consists of the generated text data and sentiment data, and the output is a display integrating these.

[0414] Step 5:

[0415] Users can review the presented meeting information and add corrections or comments as needed. By providing feedback, users can improve the quality of meetings and incorporate it into future proceedings. The input is user feedback, and the output is the adjusted meeting information.
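The classification in Step 3 of the flow above, which labels each participant as positive, negative, or neutral from facial expressions and voice tone, could be approximated by fusing two scores and applying a threshold. This is a sketch under stated assumptions: both scores are taken to lie in the range -1 to 1, and the name `classify_emotion`, the equal weighting, and the `margin` value are hypothetical; the source does not describe how the two modalities are combined.

```python
def classify_emotion(face_score: float, voice_score: float,
                     face_weight: float = 0.5, margin: float = 0.2) -> str:
    """Scores are assumed to lie in [-1, 1]; higher means more positive."""
    # Weighted average of the two modalities (assumed fusion rule).
    fused = face_weight * face_score + (1.0 - face_weight) * voice_score
    if fused > margin:
        return "positive"
    if fused < -margin:
        return "negative"
    return "neutral"
```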

[0416] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0417] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0418] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0419] [Third Embodiment]

[0420] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0421] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0422] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0423] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0424] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0425] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0426] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0427] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0428] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0429] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0430] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0431] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0432] The system of this invention is designed to automate post-meeting processing for online meetings and to efficiently provide information to meeting participants. This system mainly consists of three elements: a server, a terminal, and a user, each of which plays a specific role.

[0433] The server acquires audio and video data of meetings through an interface with the online meeting platform. This data is transcribed using a speech recognition service, and screenshots of the meeting are extracted using image processing. The server inputs this information into a generative AI to create meeting minutes summarizing the key points of the meeting. The system also automatically generates draft materials and agendas for the next meeting.

[0434] The terminal displays meeting minutes and the next agenda generated on the server to the user. The user can use an interface on the terminal to view and edit this information. Once editing is complete, the changes are sent back to the server.

[0435] Users can review the meeting minutes generated via their devices and make corrections as needed. After the user has finalized the information, the system integrates with an external calendar service to automatically schedule the next meeting. This updates the meeting participants' calendars and provides necessary notifications.

[0436] As a concrete example, the server uses a specific API to send audio data to a transcription service, which then generates meeting minutes from the resulting text. These minutes are displayed in the user's browser, allowing them to review and make any necessary corrections. The approved meeting minutes are then automatically distributed to all participants via the server.

[0437] As described above, the system of the present invention efficiently handles post-processing of online meetings, reduces the burden on users, and enables the rapid and accurate provision of information.

[0438] The following describes the processing flow.

[0439] Step 1:

[0440] The server automatically collects audio and video data from online meeting platforms via their APIs. The collected data is stored internally and subjected to necessary preprocessing.

[0441] Step 2:

[0442] The server sends the audio data to a speech recognition service (e.g., a speech recognition API), which transcribes the audio and generates text data. This text data is then stored as meeting transcript information.

[0443] Step 3:

[0444] The server extracts screenshots from the meeting video data at specific time intervals and saves them as image data. These images are used as visual elements of the meeting content.

[0445] Step 4:

[0446] The server inputs the transcript data and image data into a generative AI to generate meeting minutes that summarize the key points of the meeting. These minutes may include extracted screenshots.

[0447] Step 5:

[0448] The terminal displays the generated meeting minutes and the agenda for the next meeting to the user. The user can use the interface on the terminal to review the content and edit or modify it as needed.

[0449] Step 6:

[0450] The user reviews the content and gives final approval via their device. The approved information is returned to the server and ready for distribution.

[0451] Step 7:

[0452] The server integrates with an external calendar service to automatically schedule the next meeting. The scheduled date is then reflected in the participants' calendars.

[0453] Step 8:

[0454] The server automatically distributes the completed meeting minutes and the agenda for the next meeting to participants via email or messaging service. This distribution allows all stakeholders to quickly share the meeting content and the date of the next meeting.
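As an illustration of the distribution in Step 8, the following is a minimal sketch that only composes the message, using the Python standard library. The function name, the addresses, and the message layout are assumptions made for illustration and are not specified in this disclosure.

```python
from email.message import EmailMessage


def build_minutes_email(sender, recipients, minutes, next_agenda, next_date):
    """Compose (but do not send) a message carrying the minutes and next agenda."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = f"Meeting minutes and next meeting on {next_date}"
    # One bullet per agenda item so the body stays readable as plain text.
    agenda_lines = "\n".join(f"- {item}" for item in next_agenda)
    msg.set_content(
        f"Minutes:\n{minutes}\n\nNext meeting: {next_date}\nAgenda:\n{agenda_lines}\n"
    )
    return msg


# Hypothetical example values.
message = build_minutes_email(
    sender="minutes@example.com",
    recipients=["a@example.com", "b@example.com"],
    minutes="Decided to ship on Friday.",
    next_agenda=["Review shipment", "Plan next sprint"],
    next_date="2025-01-10",
)
```

The composed message could then be handed to a mail transport, for example `smtplib.SMTP.send_message`, or its body could be posted to a messaging service instead.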

[0455] (Example 1)

[0456] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0457] Currently, creating meeting minutes after online meetings is time-consuming and laborious, hindering meeting efficiency. Furthermore, scheduling future meetings and sharing information with participants are often done manually, placing a significant burden on users. Solving these problems and achieving a smoother online meeting process is essential.

[0458] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0459] In this invention, the server includes means for acquiring meeting information via a communication network, means for converting the acquired meeting audio data into text data using speech recognition technology, and means for extracting screenshots from meeting video data using image analysis technology. This makes it possible to streamline the post-processing of online meetings and reduce the burden on users.

[0460] A "communication network" is a system that connects information devices such as computers located in different places to send and receive data.

[0461] "Meeting information" refers to all audio, video, and related data used in online meetings.

[0462] "Speech recognition technology" is a technology for converting speech data into text data.

[0463] "Text data" refers to text-based information that represents audio information converted using speech recognition technology.

[0464] "Image analysis technology" is the technology that analyzes image data to give it meaning.

[0465] A "screenshot" refers to an image saved from the screen used during a meeting.

[0466] "Generative AI" is a technology that uses artificial intelligence to generate new information from data, and in this context, it is used to generate meeting minutes.

[0467] "Summary information" refers to information that simplifies the original information and summarizes only the main points.

[0468] A "terminal" refers to a device that a user uses to view or edit information.

[0469] An "external schedule management service" refers to an external service used for managing schedules, such as a calendar or schedule management tool.

[0470] A "notification function" is a mechanism for conveying specific information to a user.

[0471] This system primarily consists of three elements: servers, terminals, and users. Each element plays a specific role, streamlining the post-processing of online meetings.

[0472] The server retrieves meeting information from online meeting platforms via a communication network. During this process, it uses the APIs of platforms such as Zoom and Microsoft Teams to acquire audio and video data. The server converts the acquired audio data into text data using speech recognition technologies such as Google Cloud Speech-to-Text and IBM Watson. Furthermore, it extracts screenshots from the video data using image analysis tools such as OpenCV and ffmpeg. This makes it possible to capture important parts of slides presented during the meeting.
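As one possible way to extract screenshots with ffmpeg, the sketch below builds a command line that saves one frame at a fixed interval. The function name, the output file pattern, and the 60-second default are assumptions for illustration; selecting "important" frames would require additional analysis that is not shown here.

```python
from pathlib import Path


def screenshot_command(video_path, out_dir, interval_seconds=60):
    """Build an ffmpeg argv that saves one frame every `interval_seconds`."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # ffmpeg expands %04d into a running frame counter (0001, 0002, ...).
    pattern = str(Path(out_dir) / "shot_%04d.png")
    return [
        "ffmpeg",
        "-i", str(video_path),
        # The fps filter with a rate of 1/N emits one frame every N seconds.
        "-vf", f"fps=1/{interval_seconds}",
        pattern,
    ]


cmd = screenshot_command("meeting.mp4", "shots", interval_seconds=30)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed.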

[0473] The server utilizes a generative AI model to generate summary information based on acquired text data and screenshots. A large-scale language model such as GPT-3 is suitable for this purpose. An example of a prompt would be, "Based on the following text and image information, summarize the three main points of the meeting."
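The way the prompt, the text data, and the screenshots might be combined into a single model input can be sketched as follows. The part-based request layout and the function name are hypothetical and do not correspond to any particular provider's API; only the instruction text is taken from the example prompt above.

```python
INSTRUCTION = (
    "Based on the following text and image information, "
    "summarize the three main points of the meeting."
)


def build_summary_request(transcript, screenshot_paths):
    """Return a provider-neutral request: one text part plus image references."""
    parts = [{"type": "text", "text": f"{INSTRUCTION}\n\n{transcript.strip()}"}]
    for path in screenshot_paths:
        # Each screenshot is referenced by path; encoding is left to the caller.
        parts.append({"type": "image", "path": path})
    return parts


request = build_summary_request(
    "  Alice: The release moves to Friday.\nBob: QA needs two more days.  ",
    ["shot_0001.png", "shot_0002.png"],
)
```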

[0474] The terminal presents the user with summary information sent from the server. Specifically, it allows the user to view the summary information through a browser-based interface using HTML/CSS. The user edits the summary information using the provided interface and sends the results back to the server.

[0475] The summary information, after being reviewed and edited by the user, is used by the server to schedule the next meeting. The server integrates with external scheduling services (e.g., Google Calendar, Outlook) via API to automatically set the next meeting date. This allows meeting participants to receive schedule-based notifications, enabling efficient time management.
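The scheduling step can be illustrated with a small sketch that derives the next meeting slot from the previous one. The rule used here (same time, a fixed number of days later) and the dictionary layout are assumptions for illustration; the actual request format depends on the external scheduling service being used.

```python
from datetime import datetime, timedelta


def next_meeting_event(last_start, duration_minutes, attendees, title, days_ahead=7):
    """Build a calendar-style event dict for the next meeting."""
    start = last_start + timedelta(days=days_ahead)
    end = start + timedelta(minutes=duration_minutes)
    return {
        "summary": title,
        "start": start.isoformat(),
        "end": end.isoformat(),
        "attendees": list(attendees),
    }


event = next_meeting_event(
    last_start=datetime(2025, 1, 6, 10, 0),
    duration_minutes=30,
    attendees=["a@example.com", "b@example.com"],
    title="Weekly sync",
)
```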

[0476] By implementing this system, users can significantly reduce the time and effort spent on post-meeting processing for online meetings, enabling efficient and accurate information sharing.

[0477] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0478] Step 1:

[0479] The server retrieves audio and video data from the online meeting platform via the communication network. It uses the specified API as input to request data about completed meetings. As output, the meeting's audio and video files are stored on the server.

[0480] Step 2:

[0481] The server converts the acquired audio data into text data using speech recognition technology such as Google Cloud Speech-to-Text. It sends the audio file as input to the speech recognition engine for processing. The output is the meeting content converted into text format.

[0482] Step 3:

[0483] The server extracts screenshots from video data to capture important moments. As input, it processes video files using image analysis techniques such as OpenCV. As output, screenshot images highlighting important slides and visuals are generated.

[0484] Step 4:

[0485] The server uses a generative AI model to generate summary information from text data and screenshots as input. It generates prompts, such as "Based on the following text and image information, summarize the three main points of the meeting." The output is a summarized meeting minutes document.

[0486] Step 5:

[0487] The terminal displays information on the user interface using summary information received from the server. It receives summary information from the server as input and converts it into a format viewable on a browser screen as output.

[0488] Step 6:

[0489] The user reviews the summary information presented on the terminal and edits it if necessary. They use the text editing function on the terminal as input and save the revised meeting minutes as output.

[0490] Step 7:

[0491] The server automatically schedules the next meeting in conjunction with an external scheduling service, based on the meeting minutes information reviewed and corrected by the user. The corrected meeting minutes and the requirements for the next meeting are used as input, and the newly scheduled meeting date is added to the external calendar as output.
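The input and output relationships of Steps 2 to 7 above can be sketched as a simple chain of stages. Every name below is hypothetical; each stage is passed in as a function so that the actual speech recognition, image analysis, generative AI, terminal, and scheduling services can be substituted, and the lambdas shown are stand-ins rather than real services.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MeetingRecord:
    audio: bytes          # Step 1 output: stored audio
    video: bytes          # Step 1 output: stored video
    transcript: str = ""
    screenshots: List[str] = field(default_factory=list)
    minutes: str = ""
    next_meeting: str = ""


def run_pipeline(record, transcribe, extract_shots, summarize, review, schedule):
    """Run the stages in order, feeding each output into the next stage."""
    record.transcript = transcribe(record.audio)               # Step 2
    record.screenshots = extract_shots(record.video)           # Step 3
    draft = summarize(record.transcript, record.screenshots)   # Step 4
    record.minutes = review(draft)                             # Steps 5 and 6
    record.next_meeting = schedule(record.minutes)             # Step 7
    return record


# Stand-in stages for illustration only.
result = run_pipeline(
    MeetingRecord(audio=b"...", video=b"..."),
    transcribe=lambda audio: "We agreed to ship on Friday.",
    extract_shots=lambda video: ["slide_0001.png"],
    summarize=lambda text, shots: f"Summary: {text} ({len(shots)} image)",
    review=lambda draft: draft.replace("Friday", "Thursday"),
    schedule=lambda minutes: "2025-01-13T10:00:00",
)
```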

[0492] (Application Example 1)

[0493] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0494] Modern organizational activities, particularly operational meetings in logistics centers, are frequent, and their efficiency and accurate information sharing are crucial. However, these meetings often require significant time and effort for recording content and organizing information, resulting in ineffective planning and clarification of procedures for future work. Therefore, there is a need for a system that automates efficient post-meeting processing and the rapid and accurate transmission of necessary information, thereby improving the accuracy and efficiency of operations.

[0495] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0496] In this invention, the server includes means for acquiring meeting data via a communication network, means for converting the acquired meeting audio data into text information, means for generating summary information from the text information and meeting video data, and means for displaying the summary information and the next work procedure using an operating device to provide work support. This enables efficient post-meeting processing and clarifies the next work procedure, thereby improving the efficiency and accuracy of operations at the logistics center.

[0497] A "communication network" is an information exchange system for sending and receiving digital data remotely.

[0498] "Meeting data" refers to a collection of information, such as audio and video, generated in connection with meetings and discussions.

[0499] "Means for converting into text information" refers to a function that analyzes audio data and converts the spoken content into text.

[0500] "Summary information" refers to concise information that summarizes the main topics and points covered at a meeting.

[0501] An "operator" refers to a person who uses a system to verify or manipulate information.

[0502] An "operating device" is a device used by users to display information and operate a system.

[0503] "Visual information" refers to digital data that can be visually recognized, such as screenshots and images.

[0504] An "external scheduling service" is an external platform for managing appointments and schedules electronically.

[0505] This invention provides a system that effectively utilizes meeting data collected via a communication network and performs efficient post-processing.

[0506] The server uses a communication network to acquire meeting data generated during conferences and discussions. The acquired audio data is converted into text information using a speech recognition API (e.g., Google Speech-to-Text API). Furthermore, video data is used to generate summary information using an image processing library (e.g., OpenCV). This summary information is meticulously constructed using a generative AI (e.g., OpenAI GPT-3) to concisely summarize the key points.

[0507] The terminal displays the summary information and next-task procedures received from the server to the operator. Through the interface on the terminal, the operator can review and edit the information, which helps in planning future tasks.

[0508] Users can use their devices to review the presented summary information and, if necessary, utilize external scheduling services to coordinate the agenda and date of the next meeting.

[0509] As a concrete example, in daily shipping meetings at a logistics center, a server efficiently processes audio and video to generate summary information in a short time. This helps operators quickly access past meeting information and clarify the next steps in their work.

[0510] An example of an input prompt for the generative AI model is, "Please generate minutes for this online meeting. The full text of the meeting is below." When the specific text of the meeting is then entered, the model automatically generates the meeting's key points and the agenda for the next meeting.

[0511] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0512] Step 1:

[0513] The server acquires meeting data (audio and video) via the communication network. The acquired data is stored in an appropriate format, as it forms the basis for subsequent processing.

[0514] Step 2:

[0515] The server sends the acquired audio data to a speech recognition API (e.g., Google Speech-to-Text API) to convert it into text. This process outputs the audio data as text information, which is then used to record the meeting content.

[0516] Step 3:

[0517] The server uses an image processing library (e.g., OpenCV) on the video data to generate necessary screen captures, if required. This extracts important visual information from the meeting, which is useful for creating summaries later.

[0518] Step 4:

[0519] Based on text data and video information, the server uses a generative AI (e.g., OpenAI GPT-3) to generate summary information. Text and visual information are provided as input, and the AI analyzes and processes them to create a summary and suggestions for future tasks.

[0520] Step 5:

[0521] The terminal displays summary information received from the server to the operator. Through the terminal's interface, the operator can review and edit the meeting summary and proposals for the next meeting.

[0522] Step 6:

[0523] Users can make final confirmations of the information presented on their device and make corrections as needed. This editable interface allows users to ensure the accuracy of the information and improve overall work efficiency.

[0524] Step 7:

[0525] The user sends the modified information from the terminal to the server, which then connects to an external scheduling service. By integrating the final meeting information with the scheduling service, the schedule for the next meeting is automatically updated.

[0526] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform specific processing using the user's emotions.

[0527] This invention is a system for further enhancing the post-processing of online meetings and automating the provision of information that takes participants' emotions into account. This system comprises server, terminal, and user elements, and by combining them with an emotion recognition engine, it identifies the emotions of participants during a meeting and reflects the results in various functions.

[0528] The server collects audio, video, and related data from the online meeting platform. The acquired audio data is transcribed by a speech recognition service, and simultaneously, an emotion recognition engine analyzes participants' voice tones and facial expressions to determine their emotions. This generates not only textual information but also data indicating the emotional state of the participants.

[0529] The terminal presents the user with meeting minutes generated on the server, along with sentiment analysis results. Users can use an interface that allows them to visually see how emotional changes influenced the meeting. For example, they can easily see, using graphs and tags, what emotions participants displayed at important points in the discussion.
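To prepare emotion data for such a graph, per-sample scores could be averaged into fixed time buckets before display. The following is a minimal sketch under assumptions: the score range (for example, -1 for negative to +1 for positive), the sample format, and the 60-second bucket size are illustrative choices, not part of this disclosure.

```python
from collections import defaultdict


def bucket_emotion_scores(samples, bucket_seconds=60):
    """Average (timestamp_seconds, score) samples per time bucket."""
    totals = defaultdict(lambda: [0.0, 0])  # bucket start -> [sum, count]
    for timestamp, score in samples:
        bucket = int(timestamp // bucket_seconds) * bucket_seconds
        totals[bucket][0] += score
        totals[bucket][1] += 1
    # Sorted by bucket start so the result can be plotted directly as a series.
    return [(start, total / count) for start, (total, count) in sorted(totals.items())]


series = bucket_emotion_scores([(5, 0.25), (30, 0.75), (70, -0.5)])
```

Each pair in `series` is a bucket start time in seconds and the mean score for that bucket, which a terminal-side chart could render as one point.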

[0530] Users can view meeting minutes and sentiment analysis results on their devices and provide feedback or make corrections as needed. After user approval, the system adjusts the agenda for the next meeting to reflect the changes in sentiment. This agenda adjustment is expected to make the next meeting more efficient and effective.

[0531] For example, the server can capture participants' laughter and high-pitched voices, detecting positive emotions at specific points in the meeting. This allows users to further refine the agenda for the next meeting based on the positive feedback. Conversely, if negative reactions are detected, the server can analyze their causes and use them as clues to improve future discussions.

[0532] In this way, the present invention not only automates administrative tasks but also functions as a tool to improve the quality of meeting content. By understanding participants' emotions in real time and reflecting them in subsequent actions, it becomes possible to create new value in online communication.

[0533] The following describes the processing flow.

[0534] Step 1:

[0535] The server retrieves audio and video data from the online meeting platform. The retrieved data is stored internally and prepared to be sent to the speech recognition engine (audio data) and the emotion recognition engine (video data).

[0536] Step 2:

[0537] The server processes the audio data through a speech recognition engine to transcribe it and generate text data. Simultaneously, an emotion recognition engine analyzes the emotions from the audio, generating emotional data of the participants' speech. For example, it analyzes changes in voice tone and pitch to identify emotions such as joy or anger.

[0538] Step 3:

[0539] The server uses video data to allow an emotion recognition engine to analyze the participants' facial expressions. The output here is emotion data extracted from the visual image, where the type of emotion (e.g., tension, relaxation, excitement) is determined from subtle facial movements and expressions.

[0540] Step 4:

[0541] The server aggregates the transcript data and the sentiment data extracted from the audio and video to generate comprehensive meeting minutes. These minutes include the sentiment information detected within each statement and discussion, linked to specific topics.

[0542] Step 5:

[0543] The terminal displays meeting minutes received from the server to the user. Through the interface, the user can see how emotions changed during the meeting. For example, they can view a graph showing the shift in emotions during important discussions.

[0544] Step 6:

[0545] Based on the presented meeting minutes and sentiment data, users review the minutes and add corrections or comments as needed. These corrections are then sent back to the server.

[0546] Step 7:

[0547] Using the finalized and approved meeting minutes, the server adjusts the agenda for the next meeting according to the sentiment analysis results. For example, it allocates more time to topics that received positive feedback in the previous meeting, and sets follow-up actions for topics that received negative feedback.

[0548] Step 8:

[0549] The server automatically distributes the completed meeting minutes and the adjusted agenda for the next meeting to participants via email or messaging services. This distribution ensures that relevant information is shared quickly and effectively among stakeholders.
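The agenda adjustment described in Step 7 can be sketched as a simple rule. The topic fields, the sentiment scale from -1 to 1, the threshold, and the amount of extra time are all assumptions chosen for illustration.

```python
def adjust_agenda(topics, extra_minutes=10, threshold=0.3):
    """Adjust topics given as dicts with 'title', 'minutes', and 'sentiment' in [-1, 1]."""
    agenda = []
    for topic in topics:
        item = {"title": topic["title"], "minutes": topic["minutes"], "follow_up": False}
        if topic["sentiment"] >= threshold:
            # Positively received topic: allocate more discussion time.
            item["minutes"] += extra_minutes
        elif topic["sentiment"] <= -threshold:
            # Negatively received topic: flag it for a follow-up action.
            item["follow_up"] = True
        agenda.append(item)
    return agenda


agenda = adjust_agenda([
    {"title": "New packing layout", "minutes": 15, "sentiment": 0.6},
    {"title": "Overtime schedule", "minutes": 10, "sentiment": -0.7},
    {"title": "Status updates", "minutes": 5, "sentiment": 0.0},
])
```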

[0550] (Example 2)

[0551] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0552] Traditional online meeting systems lacked automated processes for creating meeting minutes and sentiment analysis, making it difficult to accurately reflect participants' emotional shifts and the depth of discussions. Furthermore, the inability to efficiently plan and adjust future meetings based on feedback could potentially lead to a decline in meeting quality.

[0553] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0554] In this invention, the server includes means for acquiring meeting information via a communication circuit, means for converting the acquired meeting audio information into text information, and means for generating meeting minutes data from the converted text information and meeting video information. This makes it possible to grasp the emotional changes of participants during a meeting in real time and to automatically adjust the plan for the next meeting based on the results of the emotional analysis.

[0555] A "communication circuit" is a means of transmitting and receiving data to and from a remote location, and is a technology used to exchange information over a network.

[0556] "Meeting information" refers to information including audio, video, and related data collected during an online meeting.

[0557] "Means of converting to textual information" refers to technologies that process audio information to visually represent it as text.

[0558] "Meeting minutes" are records compiled from text and video information of online meetings, and are used to show the progress and content of the meeting.

[0559] "Emotional analysis results" refer to data indicating the emotional state detected from participants' voices and videos, representing emotional changes during the meeting as numerical values and graphs.

[0560] A "means for automatically adjusting the plan for the next meeting" is a system that optimizes the agenda and proceedings of the next meeting based on past meeting information and feedback.

[0561] "Visual representation" is a technique that uses visual elements such as graphs and icons to express emotional changes and information in a way that allows users to intuitively understand them.

[0562] An "external schedule management service" is an external system or application that manages calendars and schedules, and is used to efficiently manage meeting schedules.

[0563] This invention is a system that improves the efficiency of online meetings and automates the delivery of information while taking participants' emotions into consideration. This system consists of server, terminal, and user elements.

[0564] The server acquires audio, video, and related data from the online meeting platform in real time. Audio from the meeting information is converted into text data using a speech recognition service (e.g., an online speech recognition API). Simultaneously, an emotion recognition engine (e.g., an emotion analysis API) is used to analyze participants' emotions in real time based on their voice tone and facial expressions.

[0565] The terminal integrates sentiment analysis results with meeting minutes data processed on the server and presents them visually to the user. This allows users to easily see changes in emotions during the meeting through tagged graphs and icons. For example, positive reactions from participants during heated discussions are visually displayed.

[0566] The user reviews the meeting minutes and sentiment analysis results presented on their device. Based on these results, the user can provide feedback. After user approval, the system readjusts the agenda for the next meeting. This adjustment process utilizes a generative AI model, which presents the user with suggestions using optimal prompts.

[0567] For example, the server can capture laughter and tone of voice during a meeting, detect positive emotions at specific points in the discussion, and carry those positively received points forward into the next meeting. Conversely, in instances of negative reactions, the server can analyze the causes and use that information to improve the next meeting.

[0568] Example of a prompt:

[0569] "How can I understand participants' emotions in real time during an online meeting and adjust the agenda for the next meeting based on those emotional changes?"

[0570] This system functions as a tool to improve the quality of meetings and provides new value to online communication by effectively incorporating real-time emotional data into future actions.

[0571] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0572] Step 1:

[0573] The server retrieves meeting information from the online meeting platform via a communication circuit. This information includes audio, video, and associated metadata. The server receives this data as a stream and prepares it for the next processing step.

[0574] Step 2:

[0575] The server converts the acquired meeting audio data into text information. Specifically, it uses a speech recognition API to convert the audio stream into transcribed data. The input is audio data, and the output is text data. This process saves the meeting content as text, making it easy to search and analyze.

[0576] Step 3:

[0577] The server uses an emotion recognition engine, taking the acquired video data and converted text data as input. It analyzes voice tone and participants' facial expressions to identify their emotional state. The output of this analysis is metadata indicating the participants' emotional changes. Specifically, emotional ups and downs are tagged.

[0578] Step 4:

[0579] The server integrates text data and sentiment analysis results to generate meeting minutes. This generated data combines the main points of the discussion with changes in sentiment. It takes text data and sentiment metadata as input and produces complete meeting minutes as output.

[0580] Step 5:

[0581] The terminal receives meeting minutes data sent from the server and presents it to the user. It displays changes in emotions and the content of the discussion in a visual format. For example, it may include graphs or icons showing emotional peaks during specific discussions. The input is integrated meeting minutes data, and the output is visual information provided to the user.

[0582] Step 6:

[0583] Users can review meeting minutes and provide feedback through an interface on their devices. By commenting on the content of the minutes and the results of sentiment analysis, users can incorporate their feedback into the agenda for the next meeting. The input is user feedback, and the output is revised minutes and adjusted meeting plans.

[0584] Step 7:

[0585] The server takes user feedback into consideration and automatically adjusts the plan for the next meeting using a generative AI model. This process takes user feedback and past meeting data as input and generates the agenda for the next meeting as output. A generated prompt may also be presented as a suggestion.
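The integration of text data with emotion metadata in Steps 3 and 4 above can be illustrated by attaching emotion tags to transcript segments by timestamp. The segment and event formats used here are hypothetical, and a real emotion recognition engine may report its results differently.

```python
def tag_segments(segments, emotion_events):
    """Attach emotion labels to the transcript segment they occurred in.

    segments: list of (start_seconds, end_seconds, text)
    emotion_events: list of (timestamp_seconds, label)
    """
    tagged = []
    for start, end, text in segments:
        # Collect each distinct label whose timestamp falls inside the segment.
        labels = sorted({label for ts, label in emotion_events if start <= ts < end})
        tagged.append({"start": start, "end": end, "text": text, "emotions": labels})
    return tagged


minutes_entries = tag_segments(
    segments=[(0, 60, "Budget review"), (60, 120, "Launch date")],
    emotion_events=[(10, "tension"), (75, "excitement"), (90, "excitement")],
)
```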

[0586] (Application Example 2)

[0587] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0588] In online meetings, there is a need to analyze meeting content while considering participants' emotions and to improve the accuracy of future meeting planning. However, existing systems make it difficult to provide feedback and adjust agendas that appropriately reflect participants' emotional states. In particular, improving the quality of online learning and business meetings requires understanding changes in participants' emotions in real time and reflecting them in the meeting proceedings, but there is a lack of effective means to do so.

[0589] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0590] In this invention, the server includes means for acquiring meeting data via a communication network, means for converting the acquired meeting audio data into text, and means for analyzing participants' facial expressions and voice tones to generate emotion data. This makes it possible to generate and present meeting minutes that reflect the emotions of the participants.

[0591] A "communication network" is a network infrastructure used to share data among participants in a meeting or conference.

[0592] "Meeting data" refers to digital data including audio, video, and related information generated during online meetings and discussions.

[0593] "Text conversion" is the process of mechanically converting audio data into text data.

[0594] "Meeting information" refers to a document summarizing the progress and content of a meeting, and includes both text and image data.

[0595] "User" refers to anyone who uses this system to receive meeting information and sentiment data.

[0596] "Emotional data" refers to data that indicates the emotional state of participants, inferred from changes in their facial expressions and voice tone.

[0597] A "recording screen" refers to an image or screenshot taken during a meeting or gathering, and is used as part of the meeting information.

[0598] An "external calendar service" is a third-party calendar service used to manage meeting dates and appointments.

[0599] This invention is a system for generating and presenting information that takes participants' emotions into account during online meetings. The server acquires meeting data via a communication network and converts audio data into text. Furthermore, it uses an emotion recognition engine to carefully analyze participants' facial expressions and voice tone to generate emotion data. This engine integrates existing facial recognition and voice analysis technologies, enabling high-precision measurement of emotional states in real time. The emotion data clearly indicates states such as positive, negative, and neutral.
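One simple way to map combined facial and voice scores onto the positive, negative, and neutral states is a weighted average with a neutral band. The sketch below is an assumption-laden illustration: the score range of -1 to 1, the equal weighting, and the band width are not specified in this disclosure.

```python
def classify_emotion(face_score, voice_score, face_weight=0.5, neutral_band=0.2):
    """Combine two scores in [-1, 1] and map the result to a three-way label."""
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be between 0 and 1")
    combined = face_weight * face_score + (1 - face_weight) * voice_score
    if combined > neutral_band:
        return "positive"
    if combined < -neutral_band:
        return "negative"
    # Scores close to zero are treated as neutral.
    return "neutral"


labels = [
    classify_emotion(0.8, 0.6),
    classify_emotion(-0.9, -0.5),
    classify_emotion(0.3, -0.3),
]
```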

[0600] The generated sentiment data and meeting information are presented to the user via a terminal. Users can visually confirm changes in participants' sentiments through graphs and charts, and make adjustments as needed. This information is used when creating the plan for the next meeting, promoting more effective meeting management.

[0601] As a concrete example, the server performs real-time sentiment analysis of students and visually presents the results to the instructor, allowing for the measurement of students' interest and understanding during class. If the instructor determines that a student has questions, they can flexibly adjust the pace of the lesson and explain the content again.

[0602] An example of a prompt for a generative AI model is: "Please provide a model that analyzes the student's emotions during this lesson using three different parameters (interest, question, anxiety) and provides feedback based on that analysis." Such prompts can improve the accuracy of emotion recognition technology.

[0603] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0604] Step 1:

[0605] The server acquires meeting data via the communication network. The input data consists of meeting audio, video, and related metadata. This data is collected on the server, laying the foundation for subsequent analysis.

[0606] Step 2:

[0607] The server inputs the acquired audio data into a text conversion engine and converts it into text data. This process analyzes the audio and generates corresponding strings, which can then be used as meeting minutes. The output is a complete text-based transcript of the meeting.

[0608] Step 3:

[0609] The server uses an emotion recognition engine to analyze participants' facial expressions and voice tone, generating emotion data. Video data and voice tone are used as input. The system analyzes this data and classifies each participant's emotion as positive, negative, or neutral. The output is a dataset showing each participant's emotional state.

[0610] Step 4:

[0611] The terminal presents meeting information, combining generated text minutes and sentiment data, to the user via a graphical interface. This information visually represents changes in emotions over time, allowing users to easily understand the meeting content and their emotional responses. Input consists of generated text data and sentiment data, while output is a display integrating these.

[0612] Step 5:

[0613] Users can review the presented meeting information and add corrections or comments as needed. By providing feedback, users can improve the quality of meetings and incorporate it into future proceedings. The input is user feedback, and the output is the adjusted meeting information.
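As a non-limiting illustration, Steps 3 and 4 above could be sketched as follows. The signed score range, the neutral band of 0.2, and the names `classify_emotion`, `MinutesEntry`, and `build_minutes` are assumptions made for this sketch and are not specified in this disclosure; the emotion recognition engine itself is not modeled.

```python
from dataclasses import dataclass

# Assumption: the emotion engine returns a signed score in [-1, 1];
# scores within +/-0.2 of zero are treated as neutral.
NEUTRAL_BAND = 0.2


def classify_emotion(score: float) -> str:
    """Map a signed score to one of the three labels named in Step 3."""
    if score > NEUTRAL_BAND:
        return "positive"
    if score < -NEUTRAL_BAND:
        return "negative"
    return "neutral"


@dataclass
class MinutesEntry:
    start_sec: float
    speaker: str
    text: str
    emotion: str


def build_minutes(transcript, scores):
    """Combine transcript segments with emotion labels for display (Step 4).

    transcript: list of (start_sec, speaker, text) tuples.
    scores: dict mapping start_sec to a signed score; a missing entry
    is treated as 0.0, which classifies as neutral.
    """
    return [
        MinutesEntry(start, speaker, text, classify_emotion(scores.get(start, 0.0)))
        for start, speaker, text in transcript
    ]


transcript = [(0.0, "A", "Let's begin."), (12.5, "B", "I am not sure this works.")]
scores = {0.0: 0.6, 12.5: -0.4}
for entry in build_minutes(transcript, scores):
    print(f"[{entry.start_sec:>6.1f}s] {entry.speaker}: {entry.text} ({entry.emotion})")
```

Keeping the label next to each transcript segment lets the terminal render the minutes and the emotion changes over time from a single list.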

[0614] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0615] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0616] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0617] [Fourth Embodiment]

[0618] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0619] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0620] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0621] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0622] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0623] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0624] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0625] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0626] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0627] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0628] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0629] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0630] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0631] The system of this invention is designed to automate post-meeting processing for online meetings and to efficiently provide information to meeting participants. This system mainly consists of three elements: a server, a terminal, and a user, each of which plays a specific role.

[0632] The server acquires audio and video data of meetings through an interface with the online meeting platform. The audio data is transcribed using a speech recognition service, and screenshots of the meeting are extracted from the video data using image processing. The server inputs this information into a generative AI to create meeting minutes summarizing the key points of the meeting. The system also automatically generates draft materials and agendas for the next meeting.

[0633] The terminal displays meeting minutes and the next agenda generated on the server to the user. The user can use an interface on the terminal to view and edit this information. Once editing is complete, the changes are sent back to the server.

[0634] Users can review the meeting minutes generated via their devices and make corrections as needed. After the user has finalized the information, the system integrates with an external calendar service to automatically schedule the next meeting. This updates the meeting participants' calendars and provides necessary notifications.

[0635] As a concrete example, the server uses a specific API to send audio data to a transcription service, which then generates meeting minutes from the resulting text. These minutes are displayed in the user's browser, allowing them to review and make any necessary corrections. The approved meeting minutes are then automatically distributed to all participants via the server.

[0636] As described above, the system of the present invention efficiently handles post-processing of online meetings, reduces the burden on users, and enables the rapid and accurate provision of information.

[0637] The following describes the processing flow.

[0638] Step 1:

[0639] The server automatically collects audio and video data from online meeting platforms via their APIs. The collected data is stored internally and subjected to necessary preprocessing.

[0640] Step 2:

[0641] The server sends the audio data to a speech recognition service (e.g., a speech recognition API), which transcribes the audio and generates text data. This text data is then stored as meeting transcript information.

[0642] Step 3:

[0643] The server extracts screenshots from the meeting video data at specific time intervals and saves them as image data. These images are used as visual elements of the meeting content.

[0644] Step 4:

[0645] The server inputs the transcript data and image data into a generative AI to generate meeting minutes that summarize the key points of the meeting. These minutes may include extracted screenshots.

[0646] Step 5:

[0647] The terminal displays the generated meeting minutes and the agenda for the next meeting to the user. The user can use the interface on the terminal to review the content and edit or modify it as needed.

[0648] Step 6:

[0649] The user reviews the content and gives final approval via their device. The approved information is returned to the server and ready for distribution.

[0650] Step 7:

[0651] The server integrates with an external calendar service to automatically schedule the next meeting. The scheduled date is then reflected in the participants' calendars.

[0652] Step 8:

[0653] The server automatically distributes the completed meeting minutes and the agenda for the next meeting to participants via email or messaging service. This distribution allows all stakeholders to quickly share the meeting content and the date of the next meeting.
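As a non-limiting illustration, the eight steps above could be wired together as follows. Every external service (speech recognition, screenshot extraction, generative AI, user review, calendar, distribution) is passed in as a plain callable, so the sketch does not assume any particular vendor API. The names `MeetingRecord` and `run_post_meeting_flow`, and the stub values in the example, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MeetingRecord:
    audio: bytes
    video: bytes
    transcript: str = ""
    screenshots: List[str] = field(default_factory=list)
    minutes: str = ""
    next_agenda: str = ""
    approved: bool = False
    next_meeting: str = ""


def run_post_meeting_flow(
    record: MeetingRecord,
    transcribe: Callable[[bytes], str],
    extract_screenshots: Callable[[bytes], List[str]],
    summarize: Callable[[str, List[str]], tuple],
    review: Callable[[str, str], tuple],
    schedule: Callable[[str], str],
    distribute: Callable[[str, str, str], None],
) -> MeetingRecord:
    # Step 1 (acquisition) is assumed complete: record already holds the data.
    record.transcript = transcribe(record.audio)              # Step 2
    record.screenshots = extract_screenshots(record.video)    # Step 3
    record.minutes, record.next_agenda = summarize(           # Step 4
        record.transcript, record.screenshots
    )
    # Steps 5-6: the user reviews, edits, and approves on the terminal.
    record.minutes, record.next_agenda = review(record.minutes, record.next_agenda)
    record.approved = True
    record.next_meeting = schedule(record.next_agenda)        # Step 7
    distribute(record.minutes, record.next_agenda, record.next_meeting)  # Step 8
    return record


# Example run with stand-in stubs instead of real services.
sent = []
result = run_post_meeting_flow(
    MeetingRecord(audio=b"audio", video=b"video"),
    transcribe=lambda audio: "We agreed to ship on Friday.",
    extract_screenshots=lambda video: ["slide_0001.png"],
    summarize=lambda text, shots: (f"Summary: {text}", "Review shipping results"),
    review=lambda minutes, agenda: (minutes + " (reviewed)", agenda),
    schedule=lambda agenda: "2025-01-10T10:00",
    distribute=lambda minutes, agenda, date: sent.append((minutes, agenda, date)),
)
print(result.minutes, "|", result.next_meeting)
```

Passing the services in as callables keeps the step order in one place and lets each service be replaced or tested independently.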

[0654] (Example 1)

[0655] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0656] Currently, creating meeting minutes after online meetings is time-consuming and laborious, hindering meeting efficiency. Furthermore, scheduling future meetings and sharing information with participants are often done manually, placing a significant burden on users. Solving these problems and achieving a smoother online meeting process is essential.

[0657] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0658] In this invention, the server includes means for acquiring meeting information via a communication network, means for converting the acquired meeting audio data into text data using speech recognition technology, and means for extracting screenshots from meeting video data using image analysis technology. This makes it possible to streamline the post-processing of online meetings and reduce the burden on users.

[0659] A "communication network" is a system that connects information devices such as computers located in different places to send and receive data.

[0660] "Meeting information" refers to all audio, video, and related data used in online meetings.

[0661] "Speech recognition technology" is a technology for converting speech data into text data.

[0662] "Character data" refers to text-based information that represents audio information converted using speech recognition technology.

[0663] "Image analysis technology" is a technology that analyzes image data to extract meaningful information from it.

[0664] A "screenshot" refers to an image saved from the screen used during a meeting.

[0665] "Generative AI" is a technology that uses artificial intelligence to generate new information from data, and in this context, it is used to generate meeting minutes.

[0666] "Summary information" refers to information that simplifies the original information and summarizes only the main points.

[0667] A "terminal" refers to a device that a user uses to view or edit information.

[0668] An "external schedule management service" refers to an external service used for managing schedules, such as a calendar or schedule management tool.

[0669] A "notification function" is a mechanism for conveying specific information to a user.

[0670] This system primarily consists of three elements: servers, terminals, and users. Each element plays a specific role, streamlining the post-processing of online meetings.

[0671] The server retrieves meeting information from online meeting platforms via a communication network. During this process, it uses the APIs of platforms such as Zoom and Microsoft Teams to acquire audio and video data. The server converts the acquired audio data into text data using speech recognition technologies such as Google Cloud Speech-to-Text and IBM Watson. Furthermore, it extracts screenshots from the video data using image analysis technologies such as OpenCV and ffmpeg. This makes it possible to capture important parts of slides presented during the meeting.
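As a non-limiting illustration of the screenshot extraction, the following sketch computes capture offsets at a fixed interval and builds one ffmpeg command per offset. The fixed-interval policy, the file names, and the function names are assumptions made for this sketch; the ffmpeg options used (`-ss` to seek, `-i` for the input, `-frames:v 1` to write a single video frame) are standard ffmpeg options. The commands are only constructed here, not executed.

```python
import math


def screenshot_times(duration_sec, interval_sec):
    """Return capture offsets in seconds: 0, interval, 2*interval, ... below duration."""
    if interval_sec <= 0:
        raise ValueError("interval_sec must be positive")
    count = math.ceil(duration_sec / interval_sec)
    return [i * interval_sec for i in range(count)]


def ffmpeg_commands(video_path, times):
    """Build one argument list per offset; each grabs a single frame as a PNG."""
    return [
        ["ffmpeg", "-ss", str(t), "-i", video_path, "-frames:v", "1", f"shot_{i:04d}.png"]
        for i, t in enumerate(times)
    ]


times = screenshot_times(duration_sec=150, interval_sec=60)
commands = ffmpeg_commands("meeting.mp4", times)
for cmd in commands:
    print(" ".join(cmd))
```

Each argument list can be handed to `subprocess.run` on a machine where ffmpeg is installed. Selecting only frames where a slide changes would need additional image analysis, which is not shown.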

[0672] The server utilizes a generative AI model to generate summary information based on acquired text data and screenshots. A large-scale language model such as GPT-3 is suitable for this purpose. An example of a prompt would be, "Based on the following text and image information, summarize the three main points of the meeting."
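As a non-limiting illustration, the example prompt above could be assembled with the transcript and a list of screenshot references as follows. The layout of the prompt, the truncation limit `max_chars`, and the function name are assumptions made for this sketch. How image data is actually passed to a given model depends on that model's interface and is not shown.

```python
SUMMARY_INSTRUCTION = (
    "Based on the following text and image information, "
    "summarize the three main points of the meeting."
)


def build_summary_prompt(transcript, screenshot_names, max_chars=4000):
    """Join the instruction, the (possibly truncated) transcript, and screenshot names."""
    body = transcript.strip()
    if len(body) > max_chars:
        # Assumption: overly long transcripts are cut and marked as such.
        body = body[:max_chars] + " [truncated]"
    images = "\n".join(f"- {name}" for name in screenshot_names) or "- (none)"
    return f"{SUMMARY_INSTRUCTION}\n\nTranscript:\n{body}\n\nScreenshots:\n{images}"


prompt = build_summary_prompt(
    "A: Shipping moves to Friday.\nB: Agreed, I will update the schedule.",
    ["slide_0001.png", "slide_0002.png"],
)
print(prompt)
```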

[0673] The terminal presents the user with summary information sent from the server. Specifically, it allows the user to view the summary information through a browser-based interface using HTML / CSS. The user edits the summary information using the provided interface and sends the results back to the server.

[0674] The summary information, after being reviewed and edited by the user, is used by the server to schedule the next meeting. The server integrates with external scheduling services (e.g., Google Calendar, Outlook) via API to automatically set the next meeting date. This allows meeting participants to receive schedule-based notifications, enabling efficient time management.
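As a non-limiting illustration of the scheduling step, the following sketch proposes a date for the next meeting and builds a service-neutral event payload. The one-week cadence, the rule of moving weekend dates to the following Monday, and all field names in the payload are assumptions made for this sketch. The actual request format is defined by the chosen scheduling service and is not reproduced here.

```python
from datetime import datetime, timedelta


def propose_next_meeting(last_start, days_ahead=7):
    """Same time of day `days_ahead` days later; weekend dates move to Monday."""
    candidate = last_start + timedelta(days=days_ahead)
    while candidate.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        candidate += timedelta(days=1)
    return candidate


def build_event(title, start, duration_min, attendees):
    """Build a hypothetical event payload to be mapped onto a calendar API."""
    end = start + timedelta(minutes=duration_min)
    return {
        "title": title,
        "start": start.isoformat(),
        "end": end.isoformat(),
        "attendees": sorted(attendees),
    }


next_start = propose_next_meeting(datetime(2024, 12, 16, 10, 0))
event = build_event("Weekly sync", next_start, 30, ["b@example.com", "a@example.com"])
print(event)
```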

[0675] By implementing this system, users can significantly reduce the time and effort spent on post-meeting processing for online meetings, enabling efficient and accurate information sharing.

[0676] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0677] Step 1:

[0678] The server retrieves audio and video data from the online meeting platform via the communication network. As input, it sends a request for data on completed meetings through the platform's designated API. As output, the meeting's audio and video files are stored on the server.

[0679] Step 2:

[0680] The server converts the acquired audio data into text data using speech recognition technology such as Google Cloud Speech-to-Text. It sends the audio file as input to the speech recognition engine for processing. The output is the meeting content converted into text format.

[0681] Step 3:

[0682] The server extracts screenshots from video data to capture important moments. As input, it processes video files using image analysis techniques such as OpenCV. As output, screenshot images highlighting important slides and visuals are generated.

[0683] Step 4:

[0684] The server uses a generative AI model to generate summary information from text data and screenshots as input. It generates prompts, such as "Based on the following text and image information, summarize the three main points of the meeting." The output is a summarized meeting minutes document.

[0685] Step 5:

[0686] The terminal displays information on the user interface using summary information received from the server. It receives summary information from the server as input and converts it into a format viewable on a browser screen as output.

[0687] Step 6:

[0688] The user reviews the summary information presented on the terminal and edits it if necessary. They use the text editing function on the terminal as input and save the revised meeting minutes as output.

[0689] Step 7:

[0690] The server automatically schedules the next meeting in conjunction with an external scheduling service, based on the meeting minutes information reviewed and corrected by the user. The corrected meeting minutes and the requirements for the next meeting are used as input, and the newly scheduled meeting date is added to the external calendar as output.

[0691] (Application Example 1)

[0692] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0693] Modern organizational activities, particularly operational meetings in logistics centers, are frequent, and their efficiency and accurate information sharing are crucial. However, these meetings often require significant time and effort for recording content and organizing information, resulting in ineffective planning and clarification of procedures for future work. Therefore, there is a need for a system that automates efficient post-meeting processing and the rapid and accurate transmission of necessary information, thereby improving the accuracy and efficiency of operations.

[0694] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0695] In this invention, the server includes means for acquiring meeting data via a communication network, means for converting the acquired meeting audio data into text information, means for generating summary information from the text information and meeting video data, and means for displaying the summary information and the next work procedure using an operating device to provide work support. This enables efficient post-meeting processing and clarifies the next work procedure, thereby improving the efficiency and accuracy of operations at the logistics center.

[0696] A "communication network" is an information exchange system for sending and receiving digital data remotely.

[0697] "Meeting data" refers to a collection of information, such as audio and video, generated in connection with meetings and discussions.

[0698] "Means of converting into text information" refers to a function that analyzes audio data and converts the spoken content into text.

[0699] "Summary information" refers to concise information that summarizes the main topics and points covered at a meeting.

[0700] An "operator" refers to a person who uses a system to verify or manipulate information.

[0701] An "operating device" is a device used by users to display information and operate a system.

[0702] "Visual information" refers to digital data that can be visually recognized, such as screenshots and images.

[0703] An "external scheduling service" is an external platform for managing appointments and schedules electronically.

[0704] This invention provides a system that effectively utilizes meeting data collected via a communication network and performs efficient post-processing.

[0705] The server uses a communication network to acquire meeting data generated during conferences and discussions. The acquired audio data is converted into text information using a speech recognition API (e.g., Google Speech-to-Text API). The video data is processed with an image processing library (e.g., OpenCV) to extract visual information. A generative AI (e.g., OpenAI GPT-3) then generates summary information from the text information and the visual information, concisely summarizing the key points.

[0706] The terminal displays the summary information and next-task procedures received from the server to the operator. Through the interface on the terminal, the operator can review and edit the information, which helps in planning future tasks.

[0707] Users can use their devices to review the presented summary information and, if necessary, utilize external scheduling services to coordinate the agenda and date of the next meeting.

[0708] As a concrete example, in daily shipping meetings at a logistics center, a server efficiently processes audio and video to generate summary information in a short time. This helps operators quickly access past meeting information and clarify the next steps in their work.

[0709] An example of an input prompt for the generative AI model is, "Please generate minutes for this online meeting. The full text of the meeting is below." When the specific text of the meeting is entered after this prompt, the model automatically generates the meeting's key points and the agenda for the next meeting.
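As a non-limiting illustration, the prompt above could be combined with the meeting text, and the model's reply split into key points and a next agenda, as follows. The formatting hint appended to the prompt, the section headings, and the sample reply are assumptions made for this sketch; the sample reply is hand-written for the example and is not actual model output.

```python
BASE_PROMPT = (
    "Please generate minutes for this online meeting. "
    "The full text of the meeting is below."
)
# Assumption: a fixed reply layout is requested so the reply can be parsed.
FORMAT_HINT = (
    'Use the headings "Key points:" and "Next agenda:" '
    'with one "- " bullet per item.'
)


def build_minutes_prompt(transcript):
    return f"{BASE_PROMPT}\n{FORMAT_HINT}\n\n{transcript.strip()}"


def parse_minutes(reply):
    """Return (key_points, next_agenda) from a reply in the assumed layout."""
    sections = {"Key points:": [], "Next agenda:": []}
    current = None
    for raw in reply.splitlines():
        line = raw.strip()
        if line in sections:
            current = line
        elif line.startswith("- ") and current is not None:
            sections[current].append(line[2:].strip())
    return sections["Key points:"], sections["Next agenda:"]


sample_reply = """Key points:
- Truck 3 departs at 14:00.
- Dock B is closed for repair.

Next agenda:
- Confirm the repair schedule for Dock B.
"""
key_points, next_agenda = parse_minutes(sample_reply)
print(key_points)
print(next_agenda)
```

Lines outside the two assumed headings are ignored, so a reply that does not follow the layout yields empty lists rather than an error.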

[0710] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0711] Step 1:

[0712] The server acquires meeting data (audio and video) via the communication network. The acquired data is stored in an appropriate format, as it forms the basis for subsequent processing.

[0713] Step 2:

[0714] The server sends the acquired audio data to a speech recognition API (e.g., Google Speech-to-Text API) to convert it into text. This process outputs the audio data as text information, which is then used to record the meeting content.

[0715] Step 3:

[0716] The server uses an image processing library (e.g., OpenCV) on the video data to generate necessary screen captures, if required. This extracts important visual information from the meeting, which is useful for creating summaries later.

[0717] Step 4:

[0718] Based on text data and video information, the server uses a generative AI (e.g., OpenAI GPT-3) to generate summary information. Text and visual information are provided as input, and the AI analyzes and processes them to create a summary and suggestions for future tasks.

[0719] Step 5:

[0720] The terminal displays summary information received from the server to the operator. Through the terminal's interface, the operator can review and edit the meeting summary and proposals for the next meeting.

[0721] Step 6:

[0722] Users can make final confirmations of the information presented on their device and make corrections as needed. This editable interface allows users to ensure the accuracy of the information and improve overall work efficiency.

[0723] Step 7:

[0724] The user sends the modified information from the terminal to the server, which then connects to an external scheduling service. By integrating the final meeting information with the scheduling service, the schedule for the next meeting is automatically updated.

[0725] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0726] This invention is a system for further enhancing the post-processing of online meetings and automating the provision of information that takes participants' emotions into account. This system comprises server, terminal, and user elements, and by combining them with an emotion recognition engine, it identifies the emotions of participants during a meeting and reflects the results in various functions.

[0727] The server collects audio, video, and related data from the online meeting platform. The acquired audio data is transcribed by a speech recognition service, and simultaneously, an emotion recognition engine analyzes participants' voice tones and facial expressions to determine their emotions. This generates not only textual information but also data indicating the emotional state of the participants.
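As a non-limiting illustration, emotion scores obtained separately from voice tone and from facial expressions could be combined per participant as follows. The weighted-average rule, the default weight of 0.5, the per-label score dictionaries, and the function names are assumptions made for this sketch; this disclosure does not specify how the two sources are combined.

```python
def fuse_emotions(voice, face, voice_weight=0.5):
    """Weighted average of per-label scores; a label missing on one side counts as 0.0."""
    fused = {}
    for label in set(voice) | set(face):
        fused[label] = (
            voice_weight * voice.get(label, 0.0)
            + (1.0 - voice_weight) * face.get(label, 0.0)
        )
    return fused


def dominant_emotion(scores):
    """Label with the highest fused score; ties resolve alphabetically."""
    return max(sorted(scores), key=scores.get)


voice_scores = {"joy": 0.75, "anger": 0.25}
face_scores = {"joy": 0.25, "tension": 0.75}
fused = fuse_emotions(voice_scores, face_scores)
print(dominant_emotion(fused), fused)
```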

[0728] The terminal presents the user with meeting minutes generated on the server, along with sentiment analysis results. Users can use an interface that allows them to visually see how emotional changes influenced the meeting. For example, they can easily see, using graphs and tags, what emotions participants displayed at important points in the discussion.

[0729] Users can view meeting minutes and sentiment analysis results on their devices and provide feedback or make corrections as needed. After user approval, the system adjusts the agenda for the next meeting to reflect the changes in sentiment. This agenda adjustment is expected to make the next meeting more efficient and effective.

[0730] For example, the server can capture participants' laughter and high-pitched voices, detecting positive emotions at specific points in the meeting. This allows users to further refine the agenda for the next meeting based on the positive feedback. Conversely, if negative reactions are detected, the server can analyze their causes and use them as clues to improve future discussions.

[0731] In this way, the present invention not only automates administrative tasks but also functions as a tool to improve the quality of meeting content. By understanding participants' emotions in real time and reflecting them in subsequent actions, it becomes possible to create new value in online communication.

[0732] The following describes the processing flow.

[0733] Step 1:

[0734] The server retrieves audio and video data from the online meeting platform. The retrieved data is stored internally and prepared to be sent to the speech recognition engine (audio data) and the emotion recognition engine (video data).

[0735] Step 2:

[0736] The server processes the audio data through a speech recognition engine to transcribe it and generate text data. Simultaneously, an emotion recognition engine analyzes the emotions from the audio, generating emotional data of the participants' speech. For example, it analyzes changes in voice tone and pitch to identify emotions such as joy or anger.

[0737] Step 3:

[0738] The server uses video data to allow an emotion recognition engine to analyze the participants' facial expressions. The output here is emotion data extracted from the visual image, where the type of emotion (e.g., tension, relaxation, excitement) is determined from subtle facial movements and expressions.

[0739] Step 4:

[0740] The server aggregates the transcript data and the sentiment data extracted from audio and video to generate comprehensive meeting minutes. These minutes include sentiment information detected within each statement and discussion, and are linked to specific topics.

[0741] Step 5:

[0742] The terminal displays meeting minutes received from the server to the user. Through the interface, the user can visualize and see how emotions changed during the meeting. For example, they can view a graph showing the shift in emotions during important discussions.

[0743] Step 6:

[0744] Based on the presented meeting minutes and sentiment data, users review the minutes and add corrections or comments as needed. These corrections are then sent back to the server.

[0745] Step 7:

[0746] Using the finalized and approved meeting minutes, the server adjusts the agenda for the next meeting according to the sentiment analysis results. For example, it allocates more time to topics that received positive feedback in the previous meeting, and sets follow-up actions for topics that received negative feedback.

[0747] Step 8:

[0748] The server automatically distributes the completed meeting minutes and the adjusted agenda for the next meeting to participants via email or messaging services. This distribution ensures that relevant information is shared quickly and effectively among stakeholders.
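As a non-limiting illustration, the agenda adjustment of Step 7 could be sketched as follows: topics with positive feedback receive more time, and topics with negative feedback receive a follow-up action. The `AgendaItem` structure, the five-minute increment, and the wording of the follow-up action are assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class AgendaItem:
    topic: str
    minutes: int
    follow_ups: list = field(default_factory=list)


def adjust_agenda(items, topic_sentiment, extra_minutes=5):
    """Return a new agenda adjusted by per-topic sentiment from the last meeting.

    topic_sentiment maps a topic to "positive", "negative", or "neutral";
    topics without an entry are left unchanged.
    """
    adjusted = []
    for item in items:
        sentiment = topic_sentiment.get(item.topic, "neutral")
        minutes = item.minutes
        follow_ups = list(item.follow_ups)
        if sentiment == "positive":
            minutes += extra_minutes
        elif sentiment == "negative":
            follow_ups.append(f"Follow up on concerns raised about '{item.topic}'")
        adjusted.append(AgendaItem(item.topic, minutes, follow_ups))
    return adjusted


agenda = [AgendaItem("Roadmap", 15), AgendaItem("Budget", 10), AgendaItem("Hiring", 10)]
sentiment = {"Roadmap": "positive", "Budget": "negative"}
for item in adjust_agenda(agenda, sentiment):
    print(item)
```

The function returns new items instead of modifying the input, so the approved minutes and the original agenda remain available for comparison.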

[0749] (Example 2)

[0750] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0751] Traditional online meeting systems lacked automated processes for creating meeting minutes and sentiment analysis, making it difficult to accurately reflect participants' emotional shifts and the depth of discussions. Furthermore, the inability to efficiently plan and adjust future meetings based on feedback could potentially lead to a decline in meeting quality.

[0752] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0753] In this invention, the server includes means for acquiring meeting information via a communication circuit, means for converting the acquired meeting audio information into text information, and means for generating meeting minutes data from the converted text information and meeting video information. This makes it possible to grasp the emotional changes of participants during a meeting in real time and to automatically adjust the plan for the next meeting based on the results of the emotional analysis.

[0754] A "communication circuit" is a means of transmitting and receiving data to and from a remote location, and is a technology used to exchange information over a network.

[0755] "Meeting information" refers to information including audio, video, and related data collected during an online meeting.

[0756] "Means of converting to textual information" refers to technologies that process audio information to visually represent it as text.

[0757] "Meeting minutes" are records compiled from text and video information of online meetings, and are used to show the progress and content of the meeting.

[0758] "Emotional analysis results" refer to data indicating the emotional state detected from participants' voices and videos, representing emotional changes during the meeting as numerical values and graphs.

[0759] A "means for automatically adjusting the plan for the next meeting" is a system that optimizes the agenda and proceedings of the next meeting based on past meeting information and feedback.

[0760] "Visual representation" is a technique that uses visual elements such as graphs and icons to express emotional changes and information in a way that allows users to intuitively understand them.

[0761] An "external schedule management service" is an external system or application that manages calendars and schedules, and is used to efficiently manage meeting schedules.

[0762] This invention is a system that improves the efficiency of online meetings and automates the delivery of information while taking participants' emotions into consideration. This system consists of server, terminal, and user elements.

[0763] The server acquires audio, video, and related data from the online meeting platform in real time. Audio from the meeting information is converted into text data using a speech recognition service (e.g., an online speech recognition API). Simultaneously, an emotion recognition engine (e.g., an emotion analysis API) is used to analyze participants' emotions in real time based on their voice tone and facial expressions.

[0764] The terminal integrates sentiment analysis results with meeting minutes data processed on the server and presents them visually to the user. This allows users to easily see changes in emotions during the meeting through tagged graphs and icons. For example, positive reactions from participants during heated discussions are visually displayed.
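As a non-limiting illustration, the data behind such a tagged graph could be prepared as follows: emotion samples are averaged per time window, and windows with a strongly positive or negative average are tagged. The signed score range, the 60-second window, the threshold of 0.5, and the function names are assumptions made for this sketch; rendering the graph itself is left to the terminal's interface.

```python
from collections import defaultdict


def emotion_timeline(samples, window_sec=60):
    """Average signed scores per window.

    samples: list of (time_sec, score) pairs.
    Returns a list of (window_start_sec, mean_score) sorted by time.
    """
    buckets = defaultdict(list)
    for t, score in samples:
        buckets[int(t // window_sec) * window_sec].append(score)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


def tag_peaks(timeline, threshold=0.5):
    """Tag windows whose mean score reaches the threshold in either direction."""
    tags = []
    for start, mean in timeline:
        if mean >= threshold:
            tags.append((start, "positive peak"))
        elif mean <= -threshold:
            tags.append((start, "negative peak"))
    return tags


samples = [(5, 0.5), (30, 1.0), (70, -0.5), (110, -1.0), (130, 0.0)]
timeline = emotion_timeline(samples)
print(timeline)
print(tag_peaks(timeline))
```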

[0765] The user reviews the meeting minutes and sentiment analysis results presented on their device. Based on these results, the user can provide feedback. After user approval, the system readjusts the agenda for the next meeting. This adjustment process utilizes a generative AI model, which presents the user with suggestions using optimal prompts.

[0766] For example, the server can capture laughter and tone of voice during a meeting, detect positive emotions at specific points in the discussion, and give the positively received topics more emphasis in the next meeting. Conversely, in instances of negative reactions, the server can analyze the causes and use that information to improve the next meeting.

[0767] Example of a prompt:

[0768] "How can I understand participants' emotions in real time during an online meeting and adjust the agenda for the next meeting based on those emotional changes?"

[0769] This system functions as a tool to improve the quality of meetings and provides new value to online communication by effectively incorporating real-time emotional data into future actions.

[0770] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0771] Step 1:

[0772] The server retrieves meeting information from the online meeting platform via a communication circuit. This information includes audio, video, and associated metadata. The server receives this data as a stream and prepares it for the next processing step.

[0773] Step 2:

[0774] The server converts the acquired meeting audio data into text information. Specifically, it uses a speech recognition API to convert the audio stream into transcribed data. The input is audio data, and the output is text data. This process saves the meeting content as text, making it easy to search and analyze.

[0775] Step 3:

[0776] The server uses an emotion recognition engine, taking the acquired video data and converted text data as input. It analyzes voice tone and participants' facial expressions to identify their emotional state. The output of this analysis is metadata indicating the participants' emotional changes. Specifically, emotional ups and downs are tagged.

[0777] Step 4:

[0778] The server integrates text data and sentiment analysis results to generate meeting minutes. This generated data combines the main points of the discussion with changes in sentiment. It takes text data and sentiment metadata as input and produces complete meeting minutes as output.

[0779] Step 5:

[0780] The terminal receives meeting minutes data sent from the server and presents it to the user. It displays changes in emotions and the content of the discussion in a visual format. For example, it may include graphs or icons showing emotional peaks during specific discussions. The input is integrated meeting minutes data, and the output is visual information provided to the user.

[0781] Step 6:

[0782] Users can review meeting minutes and provide feedback through an interface on their devices. By commenting on the content of the minutes and the results of sentiment analysis, users can incorporate their feedback into the agenda for the next meeting. The input is user feedback, and the output is revised minutes and adjusted meeting plans.

[0783] Step 7:

[0784] The server takes user feedback into consideration and automatically adjusts the plan for the next meeting using a generative AI model. This process takes user feedback and past meeting data as input and generates the agenda for the next meeting as output. A generated prompt may also be presented as a suggestion.
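The data flow of Steps 3 to 7 can be illustrated with a minimal sketch. The specification does not define data structures or thresholds, so the `Segment` record, the score scale from -1.0 to 1.0, the threshold of 0.5, and all function names below are assumptions introduced only for illustration. The sketch tags emotional peaks on transcribed segments, merges them into minutes entries, and picks out negatively received items as candidates for the next agenda.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed utterance (hypothetical structure, not from the specification)."""
    start_s: float        # start time of the utterance in seconds
    speaker: str
    text: str
    emotion_score: float  # assumed scale: -1.0 (negative) to 1.0 (positive)


def tag_emotion(score: float, threshold: float = 0.5) -> str:
    """Map a score to a coarse tag; the threshold is an illustrative assumption."""
    if score >= threshold:
        return "positive-peak"
    if score <= -threshold:
        return "negative-peak"
    return "neutral"


def build_minutes(segments: list[Segment]) -> list[dict]:
    """Combine transcript text with emotion tags, ordered by time (Step 4)."""
    ordered = sorted(segments, key=lambda s: s.start_s)
    return [
        {
            "time_s": s.start_s,
            "speaker": s.speaker,
            "text": s.text,
            "emotion": tag_emotion(s.emotion_score),
        }
        for s in ordered
    ]


def agenda_candidates(minutes: list[dict]) -> list[str]:
    """Collect negatively received items as candidates to revisit (input to Step 7)."""
    return [m["text"] for m in minutes if m["emotion"] == "negative-peak"]
```

In such a design, the list returned by `agenda_candidates` would be passed, together with user feedback, to the generative AI model that drafts the next agenda.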

[0785] (Application Example 2)

[0786] Next, Application Example 2 will be described. In the following description, the data processing device 12 is referred to as the "server" and the robot 414 as the "terminal".

[0787] In online meetings, there is a need to analyze meeting content while considering participants' emotions and to improve the accuracy of future meeting planning. However, existing systems make it difficult to provide feedback and adjust agendas that appropriately reflect participants' emotional states. In particular, improving the quality of online learning and business meetings requires understanding changes in participants' emotions in real time and reflecting them in the meeting proceedings, but there is a lack of effective means to do so.

[0788] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0789] In this invention, the server includes means for acquiring meeting data via a communication network, means for converting the acquired meeting audio data into text, and means for analyzing participants' facial expressions and voice tones to generate emotion data. This makes it possible to generate and present meeting minutes that reflect the emotions of the participants.

[0790] A "communication network" is a network infrastructure used to share data among participants in a meeting or conference.

[0791] "Meeting data" refers to digital data including audio, video, and related information generated during online meetings and discussions.

[0792] "Text conversion" is the process of mechanically converting audio data into text data.

[0793] "Meeting information" refers to a document summarizing the progress and content of a meeting, and includes both text and image data.

[0794] "User" refers to anyone who uses this system to receive meeting information and emotion data.

[0795] "Emotion data" refers to data that indicates the emotional state of participants, inferred from changes in their facial expressions and voice tone.

[0796] A "recording screen" refers to an image or screenshot taken during a meeting or gathering, and is used as part of the meeting information.

[0797] An "external calendar service" is a third-party calendar service used to manage meeting dates and appointments.

[0798] This invention is a system for generating and presenting information that takes participants' emotions into account during online meetings. The server acquires meeting data via a communication network and converts audio data into text. Furthermore, it uses an emotion recognition engine to carefully analyze participants' facial expressions and voice tone to generate emotion data. This engine integrates existing facial recognition and voice analysis technologies, enabling high-precision measurement of emotional states in real time. The emotion data clearly indicates states such as positive, negative, and neutral.

[0799] The generated emotion data and meeting information are presented to the user via a terminal. Users can visually confirm changes in participants' emotions through graphs and charts, and make adjustments as needed. This information is used when creating the plan for the next meeting, promoting more effective meeting management.

[0800] As a concrete example, the server performs real-time sentiment analysis of students and visually presents the results to the instructor, allowing for the measurement of students' interest and understanding during class. If the instructor determines that a student has questions, they can flexibly adjust the pace of the lesson and explain the content again.

[0801] An example of a prompt for a generative AI model is: "Please provide a model that analyzes the student's emotions during this lesson using three different parameters (interest, question, anxiety) and provides feedback based on that analysis." Such prompts can improve the accuracy of emotion recognition technology.
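A prompt of this kind could be assembled programmatically. The following is a minimal sketch; the function name `build_emotion_prompt` and the idea of appending a transcript excerpt are assumptions for illustration and are not described in the specification.

```python
def build_emotion_prompt(parameters: list[str], transcript_excerpt: str) -> str:
    """Assemble a prompt asking a generative AI model to analyze emotions.

    The wording follows the example prompt in paragraph [0801]; appending
    the transcript excerpt is an illustrative assumption.
    """
    names = ", ".join(parameters)
    return (
        "Please analyze the student's emotions during this lesson using "
        f"{len(parameters)} parameters ({names}) "
        "and provide feedback based on that analysis.\n\n"
        f"Transcript excerpt:\n{transcript_excerpt}"
    )


prompt = build_emotion_prompt(
    ["interest", "question", "anxiety"],
    "Student: I am not sure I follow the second step.",
)
```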

[0802] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0803] Step 1:

[0804] The server acquires meeting data via the communication network. The input data consists of meeting audio, video, and related metadata. This data is collected on the server, laying the foundation for subsequent analysis.

[0805] Step 2:

[0806] The server inputs the acquired audio data into a text conversion engine and converts it into text data. This process analyzes the audio and generates corresponding strings, which can then be used as meeting minutes. The output is a complete text-based transcript of the meeting.

[0807] Step 3:

[0808] The server uses an emotion recognition engine to analyze participants' facial expressions and voice tone, generating emotion data. Video data and voice tone are used as input. The system analyzes this data and classifies each participant's emotion as positive, negative, or neutral. The output is a dataset showing each participant's emotional state.

[0809] Step 4:

[0810] The terminal presents meeting information, combining the generated text minutes and emotion data, to the user via a graphical interface. This information visually represents changes in emotions over time, allowing users to easily understand the meeting content and the participants' emotional responses. The input consists of the generated text data and emotion data, and the output is a display integrating them.

[0811] Step 5:

[0812] Users can review the presented meeting information and add corrections or comments as needed. By providing feedback, users can improve the quality of meetings and incorporate it into future proceedings. The input is user feedback, and the output is the adjusted meeting information.
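The classification in Step 3 and the aggregation needed for the display in Step 4 can be sketched as follows. The specification does not state how facial-expression and voice-tone results are combined, so the weighted average in `fuse_scores`, the score scale from -1.0 to 1.0, the margin of 0.2, and all names are assumptions for illustration only.

```python
from collections import Counter


def fuse_scores(face: float, voice: float, face_weight: float = 0.5) -> float:
    """Combine face and voice scores by weighted average (an assumed fusion rule)."""
    return face_weight * face + (1.0 - face_weight) * voice


def classify(score: float, margin: float = 0.2) -> str:
    """Classify a fused score as positive, negative, or neutral (Step 3)."""
    if score > margin:
        return "positive"
    if score < -margin:
        return "negative"
    return "neutral"


def summarize(observations: list[tuple[str, float, float]]) -> dict[str, Counter]:
    """Count emotion labels per participant for display on the terminal (Step 4).

    Each observation is (participant, face_score, voice_score).
    """
    summary: dict[str, Counter] = {}
    for participant, face, voice in observations:
        label = classify(fuse_scores(face, voice))
        summary.setdefault(participant, Counter())[label] += 1
    return summary
```

A terminal could render the per-participant counts as a chart, and a time-ordered list of the same labels as a graph of emotional change.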

[0813] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0814] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives, as input, a prompt containing instructions and inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0815] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0816] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0817] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0818] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0819] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the emotion map 400 an emotion lies, the more visible (expressed in actions) it becomes.

[0820] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
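The balance-based view of pleasure and discomfort in paragraph [0820] can be expressed as a simple score. The specification gives no formula, so the normalization by a per-balance tolerance, the output range from -1.0 to 1.0, and the name `comfort_score` are assumptions made purely for illustration.

```python
def comfort_score(
    readings: dict[str, float],
    ideals: dict[str, float],
    tolerances: dict[str, float],
) -> float:
    """Return a value in [-1.0, 1.0] from how far each balance is from its ideal.

    1.0 means every balance is at its ideal (pleasure); -1.0 means every
    balance deviates by at least its tolerance (discomfort). The linear
    mapping is an illustrative assumption, not the method of the specification.
    """
    deviations = [
        min(abs(readings[name] - ideals[name]) / tolerances[name], 1.0)
        for name in ideals
    ]
    mean_deviation = sum(deviations) / len(deviations)
    return 1.0 - 2.0 * mean_deviation


# Hypothetical robot balances: battery level (0 to 1) and body tilt in degrees.
ideals = {"battery": 0.8, "tilt_deg": 0.0}
tolerances = {"battery": 0.8, "tilt_deg": 30.0}
at_ideal = comfort_score({"battery": 0.8, "tilt_deg": 0.0}, ideals, tolerances)
far_off = comfort_score({"battery": 0.0, "tilt_deg": 45.0}, ideals, tolerances)
```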

[0821] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0822] The emotion identification model 59 feeds user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained on multiple sets of training data, each of which combines user input with emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
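The property that emotions located close together on the map have similar values can be illustrated with a small sketch. The coordinates in `EMOTION_POSITIONS` are hypothetical placements chosen only to respect the description above (security-related emotions near the 3 o'clock position, "displeasure" on the lower side); the specification does not give coordinates. The `closeness_penalty` function is likewise an assumed way to quantify the property, for example as a training regularizer, and is not stated to be the method used.

```python
import math

# Hypothetical positions: (clock hour, ring index counted from the center).
EMOTION_POSITIONS = {
    "reassured": (3.0, 1),
    "calm": (3.0, 2),
    "confident": (2.5, 2),
    "displeasure": (6.0, 3),
}


def to_xy(hour: float, ring: int) -> tuple[float, float]:
    """Convert a clock position to x/y, with 12 o'clock pointing up."""
    angle = math.radians(90.0 - hour * 30.0)
    return ring * math.cos(angle), ring * math.sin(angle)


def map_distance(a: str, b: str) -> float:
    """Euclidean distance between two emotions on the map."""
    ax, ay = to_xy(*EMOTION_POSITIONS[a])
    bx, by = to_xy(*EMOTION_POSITIONS[b])
    return math.hypot(ax - bx, ay - by)


def dominant_emotion(values: dict[str, float]) -> str:
    """Pick the emotion with the highest emotion value."""
    return max(values, key=values.get)


def closeness_penalty(values: dict[str, float]) -> float:
    """Grow when emotions that are close on the map have dissimilar values."""
    names = sorted(values)
    total = 0.0
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            weight = math.exp(-map_distance(a, b))  # nearby pairs weigh more
            total += weight * (values[a] - values[b]) ** 2
    return total
```

With these assumed positions, a set of values in which "reassured," "calm," and "confident" are all high yields a smaller penalty than one in which "calm" is low while its neighbors are high.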

[0823] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0824] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0825] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing according to the specific processing program 56.

[0826] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0827] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0828] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0829] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0830] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0831] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0832] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of those aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements added, or elements replaced in the descriptions and illustrations presented above, as long as doing so does not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0833] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0834] The following is further disclosed regarding the embodiments described above.

[0835] (Claim 1)

[0836] A means of acquiring conference data via a communication network,

[0837] A means of transcribing the acquired meeting audio data,

[0838] A means for generating meeting minutes from transcript data and meeting image data,

[0839] A means of presenting the generated meeting minutes to the user and accepting edits,

[0840] A means of automatically scheduling the next meeting,

[0841] A means of automatically distributing meeting information to participants,

[0842] A system that includes this.

[0843] (Claim 2)

[0844] The system according to claim 1, which integrates and presents screenshots with generated meeting minutes.

[0845] (Claim 3)

[0846] The system according to claim 1, which updates the schedule for the next meeting in conjunction with an external calendar service after user confirmation.

[0847] "Example 1"

[0848] (Claim 1)

[0849] A means of obtaining conference information via a communication network,

[0850] A means of converting acquired meeting audio data into text data using speech recognition technology,

[0851] A means of extracting screenshots from meeting video data using image analysis technology,

[0852] A means of generating summary information using AI based on text data and extracted image data,

[0853] A means of displaying the generated summary information on a terminal and accepting edits,

[0854] A means of automatically scheduling the next meeting by linking with an external scheduling service after the edited information has been reviewed,

[0855] A means of automatically sending meeting information to all participants,

[0856] A system that includes this.

[0857] (Claim 2)

[0858] The system according to claim 1, which integrates and presents images showing important moments during a meeting with the generated summary information.

[0859] (Claim 3)

[0860] The system according to claim 1, which includes a notification function when updating the date of the next meeting in an external schedule management service after user confirmation.

[0861] "Application Example 1"

[0862] (Claim 1)

[0863] A means of acquiring meeting data via a communication network,

[0864] A means of converting acquired meeting audio data into text information,

[0865] A means for generating summary information from text information and meeting video data,

[0866] A means of presenting the generated summary information to the operator and accepting modifications,

[0867] A means to automatically set the agenda for the next meeting,

[0868] A means of automatically sending meeting information to members,

[0869] A means of providing work support by displaying summary information and next work procedures using an operating device,

[0870] A system that includes this.

[0871] (Claim 2)

[0872] The system according to claim 1, which integrates and presents visual information with generated summary information.

[0873] (Claim 3)

[0874] The system according to claim 1, which updates the plan for the next meeting in cooperation with an external scheduling service after confirmation by the operator.

[0875] "Example 2 of combining an emotion engine"

[0876] (Claim 1)

[0877] A means of acquiring conference information via a communication circuit,

[0878] A means of converting acquired meeting audio information into text information,

[0879] A means for generating meeting minutes data from converted text information and meeting video information,

[0880] A means of integrating sentiment analysis results into meeting minutes data and presenting them to the user,

[0881] A means to automatically adjust the plan for the next meeting based on user feedback,

[0882] A means of sending meeting information to participants,

[0883] A system that includes this.

[0884] (Claim 2)

[0885] The system according to claim 1, which uses visual representations to show changes in emotion in the generated meeting minutes data.

[0886] (Claim 3)

[0887] The system according to claim 1, which updates the schedule for the next meeting in conjunction with an external schedule management service after user approval.

[0888] "Application example 2 when combining with an emotional engine"

[0889] (Claim 1)

[0890] A means of acquiring meeting data via a communication network,

[0891] A means of converting acquired meeting audio data into text,

[0892] A means for generating meeting information from text conversion data and meeting image data,

[0893] A means of presenting generated meeting information to users and accepting corrections,

[0894] A means to automatically set the schedule for the next meeting,

[0895] A means of automatically sending meeting information to participants,

[0896] A means for analyzing participants' facial expressions and voice tone to generate emotional data,

[0897] A means of visually presenting the generated emotional data to the user,

[0898] A system that includes this.

[0899] (Claim 2)

[0900] The system according to claim 1, which integrates and presents the generated meeting information with the recording screen.

[0901] (Claim 3)

[0902] The system according to claim 1, which updates the plan for the next meeting in conjunction with an external scheduling service after user confirmation.

[Explanation of Symbols]

[0903]
10, 210, 310, 410 Data Processing System
12 Data Processing Device
14 Smart Device
214 Smart Glasses
314 Headset-type Terminal
414 Robot

Claims

1. A system comprising:
a means of acquiring meeting data via a communication network;
a means of converting the acquired meeting audio data into text information;
a means of generating summary information from the text information and meeting video data;
a means of presenting the generated summary information to an operator and accepting modifications;
a means of automatically setting the agenda for the next meeting;
a means of automatically sending meeting information to members; and
a means of providing work support by displaying the summary information and the next work procedure using an operating device.

2. The system according to claim 1, which integrates and presents visual information with the generated summary information.

3. The system according to claim 1, which updates the plan for the next meeting in cooperation with an external schedule management service after confirmation by the operator.