system
The automated system with virtual avatars addresses inefficiencies in modern meetings by streamlining agenda collection, real-time information provision, and automated minute generation, enhancing meeting efficiency and information capture.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- SOFTBANK GROUP CORP
- Filing Date
- 2024-12-10
- Publication Date
- 2026-06-22
AI Technical Summary
Modern meetings are inefficient: preparation and minute-taking require significant time and effort, and important information is often overlooked when discussions stagnate.
An automated system uses virtual avatars to streamline meeting management by collecting agenda items, providing real-time information, and generating meeting minutes, thereby automating the entire process from preparation to follow-up.
The system reduces the time spent on meeting management and supports effective meetings by organizing discussions and ensuring that information is captured and distributed efficiently.
Abstract
Description
Technical Field
[0001] The technology of the present disclosure relates to a system.
Background Art
[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, and includes steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a character of the chatbot, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance as a response to the user utterance.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] In the modern business environment, meetings play an important role. However, participants often have busy schedules, and it is difficult to conduct meetings efficiently and smoothly. A great deal of time and effort is also spent on running meetings and creating meeting minutes, so these tasks need to be performed more efficiently. Furthermore, because discussions may stagnate or important information may be overlooked during a meeting, making the meeting time valuable for participants is also a challenge. An object of the present invention is to provide a method for improving these situations and operating meetings more effectively and efficiently.
Means for Solving the Problems
[0005] This invention provides an automated system using virtual avatars in meetings, including means for streamlining meeting management and minute-taking. First, the generated virtual avatars automatically collect agenda items by sending an online form to participants in advance. The collected agenda items are organized and listed according to their importance. During the meeting, the virtual avatars interact with participants and provide necessary information in real time. Furthermore, by analyzing recorded or text data during the meeting, meeting minutes are automatically generated and distributed to participants, thereby automating the entire process from meeting preparation to post-meeting follow-up. This reduces the time required for meeting management and enables more effective meetings.
[0006] "Means of generation" refers to a means of generating virtual avatars and automating the progress of meetings.
[0007] "Means of collection" refers to the means used to gather agenda items from meeting participants, and includes automatically obtaining information through online forms.
[0008] "Means of creation" refers to a means of generating an agenda list based on the collected agenda items, in order to facilitate the flow of the meeting.
[0009] "Means of provision" refers to the means by which a virtual avatar provides necessary information and related data in real time through interaction with participants during a meeting.
[0010] "Means of generation" also refers to a means of analyzing audio or text data from a meeting and generating meeting minutes that record the content of the meeting.
[0011] "Means of distribution" refers to a means of distributing the generated meeting minutes to participants to facilitate information sharing.
Brief Description of the Drawings
[0012]
[Figure 1] This is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] This is a conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] This is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] This is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] This is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] This is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] This is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] This is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] This shows an emotion map where multiple emotions are mapped.
[Figure 10] This shows an emotion map where multiple emotions are mapped.
[Figure 11] This is a sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] This is a sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] This is a sequence diagram showing the processing flow of the data processing system in Example 2, which incorporates an emotion engine.
[Figure 14] This is a sequence diagram showing the processing flow of the data processing system in Application Example 2, which incorporates an emotion engine.
Modes for Carrying Out the Invention
[0013] Hereinafter, an example of an embodiment of the system relating to the technology of this disclosure will be described with reference to the attached drawings.
[0014] First, the terms used in the following description will be explained.
[0015] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).
[0016] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which is used as a work memory by the processor.
[0017] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.
[0018] In the following embodiments, a communication I/F (Interface) denoted by a reference numeral is an interface including a communication processor, an antenna, and the like. The communication I/F controls communication between multiple computers. Examples of communication standards applied to the communication I/F include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).
[0019] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when three or more things are linked by "and/or."
[0020] [First Embodiment]
[0021] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.
[0022] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.
[0023] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0024] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.
[0025] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.
[0026] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.
[0027] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.
[0028] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.
[0029] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.
[0030] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0031] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with the specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0032] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0033] This invention is an automated meeting system using virtual avatars, and its operation is realized by multiple devices. The following describes how the server, terminal, and user cooperate to implement this system.
[0034] The server acts as the central hub of the system, automatically sending online forms to participants once a meeting is scheduled. Through these forms, it collects agenda items and questions from participants, organizing and storing them in a database. The collected information is analyzed by the server, and an agenda list is created based on importance. The created agenda list and meeting details are emailed to participants, and are also added to their calendars for schedule management.
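The collection and ordering of agenda items described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the `AgendaItem` fields, the keyword weights in `URGENCY_WEIGHTS`, and the scoring rule are all hypothetical, since the specification does not state how importance is computed.

```python
from dataclasses import dataclass

# Hypothetical keyword weights; the specification does not define how
# importance is measured, so this scoring rule is an assumption.
URGENCY_WEIGHTS = {"deadline": 3, "budget": 2, "risk": 2, "decision": 1}


@dataclass
class AgendaItem:
    submitter: str
    text: str


def importance(item: AgendaItem) -> int:
    """Score an item by summing the weights of the keywords it mentions."""
    words = item.text.lower().split()
    return sum(weight for key, weight in URGENCY_WEIGHTS.items() if key in words)


def build_agenda(items: list[AgendaItem]) -> list[AgendaItem]:
    """Return items ordered from most to least important (ties keep input order)."""
    return sorted(items, key=importance, reverse=True)


# Form submissions as they might arrive from three participants.
submissions = [
    AgendaItem("alice", "Review onboarding documents"),
    AgendaItem("bob", "Budget risk for the Q3 launch"),
    AgendaItem("carol", "Confirm the release deadline"),
]
agenda = build_agenda(submissions)
```

A production system would presumably replace the keyword table with a language-model or classifier score; the sort step would stay the same.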
[0035] On the terminal, a virtual avatar is activated when a meeting begins. This avatar interacts with participants through the terminal's interface and plays a role in ensuring the smooth progress of the meeting. The virtual avatar analyzes participants' comments in real time and provides information and facilitates discussion according to the flow of the discussion. For example, when a user makes a new suggestion, the avatar can search for similar past data and display it on the terminal as reference information.
[0036] Users can ask questions and offer opinions to virtual avatars via their devices during meetings. The avatars analyze the users' comments and use them as material to effectively facilitate the conversation. This interactive format allows for smooth information exchange during meetings and ensures that reference materials are presented in a timely manner.
[0037] After the meeting ends, the server automatically analyzes the recorded audio and text data to create meeting minutes. The generated minutes are emailed to participants and also stored in a cloud storage system for easy access later. This system allows users to reduce the time spent on meeting management and work more efficiently without overlooking important information.
[0038] In this way, this system comprehensively automates all meeting processes by using virtual avatars, providing participants with a more valuable meeting experience.
[0039] The following describes the processing flow.
[0040] Step 1:
[0041] The server checks the meeting schedule and automatically sends an email to participants containing a link to an online form for collecting agenda items and questions.
[0042] Step 2:
[0043] Users access an online form, enter the agenda items or questions they wish to propose, and submit it. Each user's input is sent to the server and stored in a database.
[0044] Step 3:
[0045] The server analyzes the collected agenda items and creates an agenda list based on priority and relevance. This agenda list will be used to guide the meeting.
[0046] Step 4:
[0047] The server sends participants an email containing a generated agenda list, meeting date and time, connection link, and other detailed information. It also automatically adds the meeting to their calendars.
[0048] Step 5:
[0049] At the start of the meeting, a virtual avatar is activated on the terminal to confirm the participant's entry. The avatar begins the meeting with an opening greeting and then proceeds based on the agenda list.
[0050] Step 6:
[0051] Users ask questions and express their opinions to a virtual avatar through their device. The avatar analyzes these inputs, provides relevant information in real time, and controls the flow of the conversation.
[0052] Step 7:
[0053] The terminal records audio data or text logs during the meeting and sends this information to the server. This recording is used later to create meeting minutes.
[0054] Step 8:
[0055] After the meeting ends, the server analyzes the audio recordings and text logs to automatically generate meeting minutes, including each speaker's remarks, decisions made, and action items. These minutes are emailed to participants and stored in cloud storage, making them easily accessible later.
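As an illustration of Step 8, the following sketch groups a text log into speaker remarks, decisions, and action items. It assumes a log with one "Speaker: text" entry per line and hypothetical `DECISION:` and `ACTION:` markers; the specification does not describe the log format, and a real system would derive these categories from speech recognition and language analysis rather than from explicit markers.

```python
from collections import defaultdict


def generate_minutes(log_lines: list[str]) -> dict:
    """Group a 'Speaker: text' log into remarks, decisions, and action items.

    The 'DECISION:' and 'ACTION:' markers are a hypothetical convention used
    only for this sketch.
    """
    remarks = defaultdict(list)
    decisions, actions = [], []
    for line in log_lines:
        speaker, _, text = line.partition(":")
        text = text.strip()
        if not text:
            continue  # skip lines that do not have a 'Speaker: text' shape
        if text.startswith("DECISION:"):
            decisions.append(text[len("DECISION:"):].strip())
        elif text.startswith("ACTION:"):
            actions.append(text[len("ACTION:"):].strip())
        else:
            remarks[speaker.strip()].append(text)
    return {
        "remarks": dict(remarks),
        "decisions": decisions,
        "action_items": actions,
    }


log = [
    "Alice: We should move the launch to May.",
    "Bob: DECISION: Launch moves to May.",
    "Bob: ACTION: Alice updates the project plan.",
]
minutes = generate_minutes(log)
```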
[0056] (Example 1)
[0057] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0058] Conducting modern meetings requires considerable time and effort, from preparation and execution to record-keeping. Furthermore, as the number of participants increases, organizing information and ensuring smooth discussion becomes increasingly difficult. Additionally, traditional methods often make it difficult to retrieve information spoken during a meeting or refer to past examples, leading to decreased meeting efficiency. This invention aims to address these challenges and improve meeting productivity.
[0059] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0060] In this invention, the server includes means for generating a virtual representation that automates the progress of a meeting, means for adding the meeting to the participants' schedules, and means for searching past information using the virtual representation and providing similar information. This makes it possible to facilitate the progress of the meeting and to quickly provide the information necessary during the discussion.
[0061] A "virtual representation" is a digital agent that acts as a substitute for a human in a meeting, supporting the progress of the discussion.
[0062] "Means of collection" refers to the methods and functions for obtaining and organizing information from meeting participants.
[0063] "Means of creation" refers to the methods and functions for compiling a list of topics based on the collected topics.
[0064] "Means of providing" refers to the methods and functions for presenting appropriate information to participants during a meeting.
[0065] "Means of generation" refers to methods or functions for creating records based on information obtained during a meeting.
[0066] "Means of distribution" refers to the methods or functions for sending the generated records to participants.
[0067] "Means of adding meetings to participants' schedules" refers to methods or functions for automatically registering meetings in participants' schedules.
[0068] "Means of searching and providing similar information" refers to methods or functions for finding past information and indicating information relevant to the current meeting.
[0069] This invention is a system that automates the progress of meetings through a virtual representation. This system combines multiple technologies and enables efficient meeting management.
[0070] The server plays a central role in the system. Once a meeting is scheduled, it creates an online inquiry and sends it to the meeting participants. This inquiry can use a standard online form service. Information submitted by participants is stored in a database. The server uses natural language processing technology to analyze the collected information and creates a list of topics based on their importance. The created list of topics and meeting details are notified to participants via email and are automatically added to their schedules through the scheduling management system.
[0071] The terminal activates a virtual representation at the start of the meeting and manages direct interaction with participants. The virtual representation uses speech recognition technology to receive voice input and analyzes speech during the meeting in real time. This information analysis employs natural language processing technology and database technology for retrieving past information. The virtual representation provides relevant information in response to participants' questions and supports the progress of the meeting.
[0072] Users can directly ask questions and express opinions to virtual representations through their terminals. User input is instantly analyzed by the virtual representation, and appropriate information is provided quickly. This operation enables efficient meeting progress and supports users in obtaining information from their own perspective.
[0073] As a concrete example of its use, when proposing an idea for a new strategic project at a meeting, the user fills in the necessary information in advance on an online inquiry sent from the server. Then, when the proposal is made during the meeting, the virtual representation can instantly search relevant historical data and provide reference information.
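The search for relevant historical data in this example could take many forms. The sketch below uses bag-of-words cosine similarity as a stand-in, since the specification does not name a retrieval method; the record texts and the function names `vectorize`, `cosine`, and `most_similar` are hypothetical.

```python
import math
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts, lowercased and split on whitespace."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query: str, records: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k records ranked by similarity to the query."""
    q = vectorize(query)
    ranked = sorted(records, key=lambda r: cosine(q, vectorize(r)), reverse=True)
    return ranked[:top_k]


# Hypothetical past records held by the server.
past_records = [
    "2023 strategic project on overseas market entry",
    "office relocation schedule and cost estimate",
    "quarterly hiring plan for the sales team",
]
hits = most_similar("new strategic project proposal", past_records)
```

Word-overlap matching misses paraphrases; an embedding-based search would be a natural substitute while keeping the same ranking structure.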
[0074] As an example of a prompt, by presenting a request to the generative AI model such as, "Please provide a virtual representation of how to effectively conduct the next strategic meeting," detailed information about the system's capabilities can be obtained from the model.
[0075] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0076] Step 1:
[0077] The server sends online inquiries to participants after the meeting schedule has been set. It receives the meeting schedule and participant list as input, generates an online inquiry link as output, and sends it via email. Specifically, the server uses an online form service API to create the inquiry.
[0078] Step 2:
[0079] Users input agenda items and questions through the online inquiry provided by the server. As input, users enter their proposed agenda items and questions into the online form and submit it to the server. As output, the user's input data is stored in the server's database.
[0080] Step 3:
[0081] The server analyzes the collected participant information and evaluates the importance of the agenda items. It takes the participants' submissions as input and analyzes them using natural language processing techniques. As output, it creates a list of topics categorized by importance. The server uses a Python natural language processing library for data processing.
[0082] Step 4:
[0083] The server notifies participants of the created topic list and meeting details, and simultaneously adds the meeting to the calendar using the scheduling management system. It takes the topic list and meeting details as input, and outputs email notifications and calendar additions. The server performs this using the calendar API and email API.
[0084] Step 5:
[0085] On the terminal, a virtual representation is activated at the start of a meeting. It receives a meeting start signal for a specific date and time as input, and the virtual representation's interface is activated as output. The terminal performs this operation using software from the virtual environment.
[0086] Step 6:
[0087] Users make statements and ask questions through a terminal during the meeting. Input is sent to the terminal as voice or text, and output is received as answers or information presented by a virtual representation. The terminal converts the voice to text using a speech recognition API and obtains the answer through a generative AI model.
[0088] Step 7:
[0089] The virtual representation responds to participants' questions, providing relevant information by referencing past data. It accepts natural-language queries as input and presents similar information retrieved from a database as output. The terminal implements this using a database search algorithm.
[0090] Step 8:
[0091] After the meeting ends, the server analyzes the audio and text data from the meeting, automatically generates meeting minutes, and distributes them to the participants. It receives recorded audio and text data as input and generates a text file of the meeting minutes as output. The server performs this process using audio analysis and text generation technologies.
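Step 4 of this flow (email notification plus calendar registration) can be sketched with the Python standard library alone. The specification refers only to "the calendar API and email API" without naming them, so this example is an assumption: it composes a message with an iCalendar attachment and does not send it. The subject line, addresses, `UID`, and `PRODID` values are placeholders.

```python
from datetime import datetime, timedelta, timezone
from email.message import EmailMessage


def build_invite(topics: list[str], start: datetime, minutes: int,
                 to: list[str]) -> EmailMessage:
    """Compose an email carrying the topic list and an iCalendar attachment."""
    start_utc = start.astimezone(timezone.utc)
    end_utc = start_utc + timedelta(minutes=minutes)
    fmt = "%Y%m%dT%H%M%SZ"
    ics = "\r\n".join([
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//meeting-sketch//EN",   # placeholder identifier
        "BEGIN:VEVENT",
        "UID:meeting-sketch-1@example.invalid",    # placeholder identifier
        # DTSTAMP should be the creation time; the start time is reused
        # here only to keep the sketch deterministic.
        f"DTSTAMP:{start_utc.strftime(fmt)}",
        f"DTSTART:{start_utc.strftime(fmt)}",
        f"DTEND:{end_utc.strftime(fmt)}",
        "SUMMARY:Strategy meeting",
        "END:VEVENT",
        "END:VCALENDAR",
    ]) + "\r\n"

    msg = EmailMessage()
    msg["Subject"] = "Meeting agenda"
    msg["To"] = ", ".join(to)
    body = "Topics:\n" + "\n".join(f"{i}. {t}" for i, t in enumerate(topics, 1))
    msg.set_content(body)
    msg.add_attachment(ics.encode("utf-8"), maintype="text",
                       subtype="calendar", filename="meeting.ics")
    return msg


invite = build_invite(
    ["Budget", "Launch date"],
    datetime(2025, 1, 15, 9, 0, tzinfo=timezone.utc),
    60,
    ["a@example.com"],
)
```

Sending the message (for example with `smtplib`) and writing directly to a calendar service are left out because they depend on the deployment.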
[0092] (Application Example 1)
[0093] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0094] Meeting facilitation and information sharing are typically supported by effective communication among participants and efficient decision-making processes. However, running meetings requires considerable time and effort, and meetings related to content creation, in particular, often require quick access to past materials and information. Furthermore, it is not easy to immediately incorporate participants' opinions and facilitate effective discussion. To address these challenges, there is a need for new technological solutions that streamline meetings while simultaneously realizing creative ideas.
[0095] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0096] In this invention, the server includes a generation means for automating the progress of the meeting, a collection means for dynamically collecting agenda items from meeting participants, and a creation means for creating a content list. This makes it possible to efficiently gather participants' opinions during the content creation process and to quickly present relevant materials during the meeting.
[0097] "Dynamic representation" refers to a form of virtual avatar that automates the progress of a meeting and enables interactive dialogue with participants.
[0098] "Means of information gathering" refers to mechanisms for effectively collecting agenda items and information from participants in a meeting.
[0099] A "content list" is a list of items created based on the collected agenda and information, used to organize the discussions during a meeting.
[0100] "Means of delivery" refers to the function of providing information to participants through dialogue during a meeting using dynamic representations.
[0101] "Audio data" refers to audio information recorded during a meeting.
[0102] "Document data" refers to text-based information generated during a meeting.
[0103] A "meeting record" is a document that summarizes the contents of a meeting and functions as the meeting minutes.
[0104] "Distribution method" refers to a method for electronically transmitting the generated record text to participants.
[0105] "Search methods" refer to the process of quickly finding past information and presenting it as relevant material during a meeting.
[0106] "Support measures" refer to functions that support the development of suggestions and ideas during the content creation process.
[0107] The system that implements this application consists of a cloud-based server and user terminals. The server utilizes cloud infrastructure such as Amazon Web Services (AWS®) or Google Cloud Platform, enabling the processing and analysis of large amounts of data. Furthermore, the program is built using Python to facilitate smooth meeting progress. Libraries such as spaCy and Transformers are used for natural language processing, analyzing participants' statements and meeting content in real time.
[0108] The system operates on smart devices and computers used by meeting participants, and the dynamic representation facilitates the meeting through interaction with participants. The dynamic representation transcribes spoken audio into text in real time, supporting the meeting's progress while organizing its content. To give specific responses and provide information, it searches past meeting records and related materials and provides data based on the results.
[0109] As an example of this system, if a content creation team needs materials from a past successful project during a meeting, they can send a prompt message from their terminal to the server saying, "Please display materials related to this new idea." The server can then immediately search for and display the relevant materials. This creates an environment where users can conduct meetings efficiently.
[0110] The generative AI model automatically searches for and presents relevant information, allowing users to instantly obtain the information they need during meetings. Furthermore, this system is highly scalable and can easily integrate new information and historical data.
[0111] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0112] Step 1:
[0113] The server automatically sends an electronic form to participants once the meeting schedule is finalized. It receives meeting schedule information as input and generates an online form. Participants fill in their agenda items on this form, and this information is sent to the server. The server then stores the agenda items submitted by the participants as output.
[0114] Step 2:
[0115] The server stores the received agenda information in a database and analyzes it to build a content list ordered by importance. It uses the agenda information submitted by participants as input, employing database operations and analysis algorithms. The output is a content list created according to priority.
[0116] Step 3:
[0117] On the terminal, a dynamic representation is activated at the start of the meeting, accepting real-time voice input from participants and converting it to text. The input is the participants' voice statements, and the output is the statements converted into text data. A generative AI model is used to achieve high-quality text conversion.
[0118] Step 4:
[0119] The dynamic representation sends a search request to the server for relevant information based on the prompt text provided by the user. It receives the prompt text as input and performs matching to search for relevant information. The output is a list of the necessary documents and information.
[0120] Step 5:
[0121] The server searches for relevant documents in its historical database and returns them to the terminal. It receives a matched prompt as input and performs data retrieval processing. The output is the corresponding document data. Specifically, it executes a database query.
[0122] Step 6:
[0123] Users develop discussions based on the received materials and present new ideas to the dynamic representation. They use the materials received on their terminals directly as input for discussions and proposals. The output consists of improved discussion content and new ideas. The interaction itself takes place through the virtual avatar.
[0124] Step 7:
[0125] After the meeting ends, the server analyzes the recorded audio data and transcribed discussion to generate a transcript, which is then sent to the participants. It receives conversation data as input and performs text mining and speech analysis. The output is the generated transcript. Specifically, the transcript is saved to cloud storage and distributed via email.
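The database query in Step 5 of this flow can be illustrated with an in-memory SQLite database. The table name `documents`, its columns, and the sample rows are hypothetical; the specification says only that the server "executes a database query" against its historical database.

```python
import sqlite3


def search_documents(conn: sqlite3.Connection, keyword: str) -> list[tuple[str, str]]:
    """Return (title, body) rows whose title or body contains the keyword."""
    pattern = f"%{keyword}%"
    cur = conn.execute(
        "SELECT title, body FROM documents "
        "WHERE title LIKE ? OR body LIKE ? ORDER BY title",
        (pattern, pattern),  # parameters are bound, not interpolated into SQL
    )
    return cur.fetchall()


# Hypothetical historical database with two past documents.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [
        ("Spring campaign video", "Storyboard and results for the spring campaign"),
        ("Office move", "Floor plan and schedule"),
    ],
)
results = search_documents(conn, "campaign")
```

Substring matching is the simplest possible choice here; a full-text index or semantic search would serve the same role for larger archives.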
[0126] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0127] This invention provides a system that automates the progress of meetings using virtual avatars and also includes a function to recognize the emotions of participants. The following describes how the server, terminal, and user cooperate to implement this invention.
[0128] The server plays a central role in meeting management, sending an online form to participants once the meeting date is set. Through this form, the server collects agenda items and questions from users and stores them in a database. It analyzes the collected agenda items and creates a priority-based agenda list. This list, along with the meeting details, is emailed to participants, and the event is automatically added to their calendars.
[0129] The terminal activates a virtual avatar at the start of a meeting and begins the meeting after confirming that participants have entered. During the meeting, the terminal acquires emotional data from the user's statements and facial expressions and sends it to the emotion engine. The emotion engine analyzes this data to identify the participants' emotions and provides real-time feedback to the avatar. This allows the virtual avatar to engage in appropriate dialogue according to the user's emotions. For example, if the analysis determines that the user is feeling stressed, the avatar will make relaxing remarks to facilitate the conversation.
[0130] Users can ask questions and express their opinions to virtual avatars via their devices during meetings. Based on analysis results from an emotion engine, the avatar's dialogue in response to user comments is adjusted, providing a better meeting experience for the user.
[0131] After the meeting ends, the server analyzes the audio data and text logs and generates meeting minutes based on the emotional data acquired by the emotion engine. These minutes visualize the emotional shifts that occurred during the meeting and are sent to participants via email. The minutes are also saved to cloud storage for easy access later. This system improves the efficiency of meeting management and enables flexible dialogue that takes participants' emotions into account.
[0132] As described above, the present invention provides a solution for conducting meetings more effectively and in a way that is considerate of participants by combining a virtual avatar and an emotion engine.
[0133] The following describes the processing flow.
[0134] Step 1:
[0135] The server checks the meeting schedule and automatically sends an email to prospective participants containing a link to an online form for collecting agenda items. This form includes fields where users can freely enter topics or questions they wish to propose.
[0136] Step 2:
[0137] Users receive the email, access the online form, enter their agenda items and questions, and submit the form. The input data is sent to the server and stored in a database.
[0138] Step 3:
[0139] The server analyzes the accumulated agenda data and creates an agenda list based on priority. This list is a crucial element for ensuring a smooth meeting flow.
[0140] Step 4:
[0141] The server generates an agenda list and meeting details, then creates an invitation email and sends it to all participants. Furthermore, the server automatically adds the meeting to the calendar system.
[0142] Step 5:
[0143] At the start of the meeting, the terminal activates a virtual avatar and confirms that all participants have entered the room. The avatar then gives an opening greeting and begins the meeting based on the agenda list.
[0144] Step 6:
[0145] The terminal collects user speech and facial expression data in real time and sends it to the emotion engine. The emotion engine analyzes this data to identify the user's emotional state.
[0146] Step 7:
[0147] Based on the analysis results of the emotion engine, the virtual avatar on the terminal adjusts the content of the conversation and provides the user with appropriate feedback and information. For example, if the user appears confused, the avatar follows up with a more detailed explanation.
[0148] Step 8:
[0149] As the meeting progresses, audio data and the analyzed emotion data are recorded on the server. Based on this data, the server automatically generates meeting minutes after the meeting concludes.
[0150] Step 9:
[0151] The generated meeting minutes include information visualizing the flow of participants' emotions. These minutes are emailed to participants and also saved in cloud storage, so that users can review the meeting details and the related emotion analysis at a later date.
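As one possible illustration of how the flow of emotions could be summarized for the minutes, the following sketch groups emotion samples by agenda item and reports the most frequent label for each. The sample format, a list of (agenda item, emotion label) pairs, and the function name are assumptions; the disclosure does not define how the emotion data is structured or visualized.

```python
from collections import Counter

def summarize_emotions(samples):
    """Summarize chronological (agenda_item, emotion_label) samples per agenda item."""
    per_item = {}
    for item, emotion in samples:
        per_item.setdefault(item, Counter())[emotion] += 1
    lines = []
    for item, counts in per_item.items():  # dicts keep insertion (agenda) order
        dominant, n = counts.most_common(1)[0]
        total = sum(counts.values())
        lines.append(f"{item}: mostly {dominant} ({n}/{total} samples)")
    return lines

# Hypothetical samples recorded during a meeting.
samples = [
    ("Budget", "neutral"),
    ("Budget", "confused"),
    ("Budget", "confused"),
    ("Schedule", "happy"),
]
summary = summarize_emotions(samples)
```

The resulting lines could be appended to the minutes text or used as input for a chart.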
[0152] (Example 2)
[0153] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0154] In modern business, there is a demand for efficient meetings and smooth communication among participants. However, traditional meeting systems struggle to organize agendas, manage meetings effectively, and facilitate flexible dialogue that takes participants' emotions into account. Furthermore, generating and distributing meeting minutes often involves a lot of manual work, which is burdensome. Therefore, there is a need for a system that automates meeting management and enables dialogue that is sensitive to participants' emotions.
[0155] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0156] In this invention, the server includes data processing means for automating the progress of a meeting, list generation means for creating a priority-based agenda list based on collected agenda items, and emotion analysis means for analyzing participants' facial expressions and voices during the meeting to identify their emotions and provide real-time feedback. This makes it possible to streamline the progress of meetings and realize appropriate dialogue that responds to the emotions of the participants.
[0157] "Data processing means" refers to technical methods for automating the progress of meetings and managing and processing related information.
[0158] "Information gathering means" refers to a means for efficiently collecting agenda items and questions from meeting participants and storing that information in a database.
[0159] "List generation means" refers to a function that analyzes collected agenda data and creates a list prioritized by importance and relevance.
[0160] "Interaction means" refers to a dialogue function in a meeting where a virtual avatar communicates with participants and provides information.
[0161] "Record generation means" refers to technology that analyzes audio and text information generated during a meeting to create detailed meeting minutes.
[0162] "Report distribution means" refers to a function that quickly distributes the generated meeting minutes to participants and saves them to the cloud as needed.
[0163] "Emotion analysis means" refers to a technique that analyzes participants' facial expressions and voices during a meeting to identify their emotions and provides feedback in real time.
[0164] "Dialogue adjustment means" refers to a means that adjusts the responses of virtual avatars and the content of dialogue based on the emotional state of the participants, thereby achieving more human-like and empathetic communication.
[0165] This invention is a system for efficiently and effectively managing corporate meetings, aiming to automate meeting progress and recognize participants' emotions. This system is primarily implemented by three parties: a server, terminals, and users.
[0166] Server Role
[0167] The server is the central data processing unit for meeting management. When scheduling a meeting, the server sends an online form to participants and uses software to collect agenda items and questions from them. The data obtained through this information gathering method is stored in a database, and a list generation system creates a priority-based agenda list. These agenda lists and meeting details are then emailed to participants, and the meeting is automatically added to their calendar system.
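The priority-based agenda list described above can be sketched as a simple sort. The `importance` rating, the `votes` count, and the tie-breaking order are hypothetical; the disclosure states only that the list is created based on priority.

```python
from dataclasses import dataclass

@dataclass
class AgendaItem:
    title: str
    importance: int  # hypothetical rating from 1 (low) to 5 (high)
    votes: int = 1   # hypothetical count of participants who proposed the item

def build_agenda(items):
    """Order items by importance, then by votes, then alphabetically for stability."""
    return sorted(items, key=lambda i: (-i.importance, -i.votes, i.title))

agenda = build_agenda([
    AgendaItem("Office move", importance=2),
    AgendaItem("Release date", importance=5, votes=3),
    AgendaItem("Budget overrun", importance=5, votes=4),
])
titles = [item.title for item in agenda]
```

In practice the importance score might itself come from a language model or from rules; the sort step would stay the same.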
[0168] Terminal Role
[0169] The terminal functions as an interface for users to join a meeting. At the start of the meeting, the terminal activates a virtual avatar and manages the meeting while confirming participants' attendance using interaction tools. The terminal is also equipped with a camera and microphone, which capture the user's facial expressions and voice. Emotion analysis tools identify emotions in real time and provide feedback. Based on this, the terminal has a function to adjust the virtual avatar's dialogue according to the participant's emotional state. For example, if the analysis indicates that the user is stressed, the virtual avatar can make relaxing remarks.
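A minimal sketch of the dialogue adjustment described above follows. The emotion labels and the wording of each adjustment are assumptions chosen for illustration, and a real system would likely pass the detected emotion to a generative model rather than prepend fixed text.

```python
# Hypothetical mapping from a detected emotion to a phrase the avatar adds.
ADJUSTMENTS = {
    "stressed": "Let's take a short pause before we continue. ",
    "confused": "Let me explain that in more detail. ",
    "anxious": "There is no rush; we can go through this step by step. ",
}

def adjust_reply(base_reply, emotion):
    """Prepend an emotion-specific phrase; unknown emotions leave the reply unchanged."""
    return ADJUSTMENTS.get(emotion, "") + base_reply

reply = adjust_reply("The next item is the budget.", "confused")
```

Keeping the mapping in data rather than in code makes it easy to tune the avatar's tone without changing the control flow.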
[0170] User Role
[0171] Users can ask questions and express their opinions directly to virtual avatars via their terminals during meetings. Responses based on emotion analysis provide a meeting experience suited to each user.
[0172] After the meeting ends, the server uses the record generation means to automatically analyze the meeting's audio and text data and generate detailed meeting minutes incorporating emotion data. It then uses the report distribution means to distribute the minutes to participants and save them to the cloud.
[0173] Examples of specific cases and prompt statements
[0174] A concrete example is a project meeting in a manufacturing company. In this meeting, the project overview and challenges are discussed in the initial stages, and the agenda is dynamically adjusted throughout the meeting based on the feelings and opinions of the team members.
[0175] Example of a prompt:
[0176] "Propose a system that analyzes participants' emotions and provides real-time feedback."
[0177] "Explain how to adjust the content of a conversation based on the results of emotion analysis."
[0178] As described above, this system combines the latest generative AI models with emotion analysis technology to revitalize communication within companies and enable efficient meeting management.
[0179] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0180] Step 1:
[0181] As soon as the meeting date is set, the server creates an online form and sends it to participants. The required inputs are the meeting date, participant contact information, and a form template. The collected participant agenda and question data are then stored in a database.
[0182] Step 2:
[0183] The server retrieves agenda and question data from the database and uses a list generation mechanism to create a priority-based agenda list. The input is the agenda data from the database, and the output is an organized agenda list. This list serves as the basis for facilitating the smooth running of the meeting.
[0184] Step 3:
[0185] The terminal activates a virtual avatar at the start of the meeting. It requires the generated agenda list and participant information as input. Based on this information, the terminal verifies participant entry and displays any necessary initial messages.
[0186] Step 4:
[0187] During the meeting, the terminal uses its camera and microphone to capture participants' facial expressions and voices. The input consists of real-time video and audio data. This data is sent to an emotion analysis system to identify emotional states. The output is the participants' emotional states, and the avatar's responses are adjusted based on this.
[0188] Step 5:
[0189] The terminal receives emotion analysis results in real time and adjusts the virtual avatar's dialogue accordingly. For example, if the system determines that the user is confused, the avatar will provide a more detailed and easier-to-understand explanation. The input is emotion state data, and the output is the adjusted dialogue.
[0190] Step 6:
[0191] After the meeting ends, the server analyzes the audio and text data and creates meeting minutes using a record generation system. This process uses speech recognition and natural language processing technologies, with recorded audio data and text logs as input and meeting minutes files as output.
[0192] Step 7:
[0193] The server ultimately distributes the generated meeting minutes to participants and stores them in cloud storage. The input is the completed meeting minutes, and the output is emailing them to participants and storing them in the cloud. This allows participants to access the meeting minutes at any time.
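The distribution in Step 7 could be sketched with Python's standard `email` package as follows. The addresses, subject format, and attachment name are hypothetical, and the sketch only builds the message; actual sending (for example through `smtplib`) and the upload to cloud storage are left out because the disclosure does not name specific services.

```python
from email.message import EmailMessage

def build_minutes_email(sender, recipients, meeting_title, minutes_text):
    """Build an email that carries the minutes in the body and as a text attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = f"Minutes: {meeting_title}"
    msg.set_content(minutes_text)
    # Attach the same text as a file so that it can also be archived.
    msg.add_attachment(minutes_text, filename="minutes.txt")
    return msg

msg = build_minutes_email(
    "meeting-bot@example.com",
    ["a@example.com", "b@example.com"],
    "Weekly sync",
    "Decision: ship on Friday.",
)
```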
[0194] (Application Example 2)
[0195] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0196] With traditional meeting facilitation methods, the burden often fell disproportionately on certain participants even when everyone attended, and it was difficult to conduct dialogue that appropriately considered the feelings and moods of all participants. Furthermore, the process of creating meeting minutes was time-consuming and inefficient. There is a need to address these challenges and provide methods for conducting meetings efficiently and with the participants' needs in mind.
[0197] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0198] In this invention, the server includes a generation means for generating virtual characters to automate the progress of a meeting, an analysis means for acquiring facial expression data of participants during the meeting and analyzing their emotions, and an adjustment means for adjusting the content of the virtual characters' dialogue based on the analyzed emotion data. This enables efficient meeting progress while taking into consideration all participants.
[0199] A "virtual character for automating meeting management" is a digital agent that autonomously handles tasks such as presenting agenda items and creating meeting minutes during a meeting.
[0200] "Acquisition means" refers to a system component equipped with the function of collecting agenda items and opinions from meeting participants.
[0201] A "priority-dependent agenda list" is a list of collected agenda items sorted according to their priority.
[0202] "Adjustment means for adjusting the content of the dialogue" refers to a function that appropriately modifies the statements of virtual characters according to the emotions of the participants and the flow of the conversation.
[0203] "Analysis means" refers to a mechanism that decodes and analyzes participants' emotional states from their facial expressions and voice data.
[0204] A "record" is a document that describes what was said, what was discussed, what emotions were felt, what actions were taken, and so on, during a meeting.
[0205] "Generation means" refers to a group of functions that generate virtual characters or construct deliverables based on data analyzed during meetings.
[0206] The system of this invention consists of a server, terminals, and users, each working together to improve meeting efficiency and enable dialogue that takes into account the feelings of the participants.
[0207] The server acts as the central hub for meeting management. First, once the meeting date is set, the server sends an online form to participants. This form is used to collect agenda items and questions from participants. The collected information is stored in the server's database, and an agenda list is generated based on importance. This list, along with the meeting details, is sent to participants, and the meeting is automatically added to their calendars.
[0208] The terminal activates a virtual character at the start of the meeting. This virtual character begins conducting the meeting after confirming that participants have entered. During the meeting, the terminal collects participants' statements and facial expression data and sends it to the emotion engine. Based on the emotion data analyzed in real time, the virtual character provides appropriate responses according to the participants' emotions, thereby ensuring the smooth running of the meeting. For example, if the virtual character detects that a participant is feeling anxious, it will use reassuring language.
[0209] Users can ask questions and express opinions to virtual characters via their devices. The content of the virtual characters' dialogue is adjusted based on the analysis results from the emotion engine, enabling discussions that are sensitive to changes in participants' opinions and emotions.
[0210] This system uses the Python DeepFace library and OpenCV for emotion analysis. This allows for efficient determination of emotions from image data acquired via camera. Specifically, it can analyze emotions in real time in response to comments made during a meeting, enabling immediate detection of how participants will react when a new topic is introduced.
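The following is a minimal sketch of how DeepFace and OpenCV might be combined for this purpose, not the implementation of the disclosure. It assumes the third-party `deepface` and `opencv-python` packages are installed, and the function names `dominant_emotion` and `watch_camera` are hypothetical. The `analyze` parameter is an assumption added so that the logic can be exercised without a camera or the libraries.

```python
def dominant_emotion(frame, analyze=None):
    """Return the dominant emotion label for each face found in one frame."""
    if analyze is None:
        from deepface import DeepFace  # third-party; imported lazily
        analyze = DeepFace.analyze
    # enforce_detection=False avoids an exception when no face is visible.
    result = analyze(img_path=frame, actions=["emotion"], enforce_detection=False)
    # Recent DeepFace releases return a list with one dict per face;
    # older releases returned a single dict.
    faces = result if isinstance(result, list) else [result]
    return [face["dominant_emotion"] for face in faces]

def watch_camera(max_frames=10, camera_index=0):
    """Yield emotion labels for up to max_frames frames read from a camera."""
    import cv2  # third-party (opencv-python); imported lazily
    cap = cv2.VideoCapture(camera_index)
    try:
        for _ in range(max_frames):
            ok, frame = cap.read()
            if not ok:
                break
            yield dominant_emotion(frame)
    finally:
        cap.release()
```

Analyzing every frame is expensive, so a real-time system would probably sample frames at a lower rate than the camera delivers them.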
[0211] By using a generative AI model and considering examples of prompts such as, "What elements are necessary to design a virtual system that can grasp the emotions of meeting participants in real time and engage in corresponding dialogue?", it is possible to achieve more accurate emotion analysis and dialogue functionality.
[0212] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0213] Step 1:
[0214] Once the meeting date is set, the server sends an online form to all participants. This form is used to collect agenda items and questions from participants, using participant information and the meeting date and time as input. The server saves this information to a database, preparing the basic data for the agenda list. The output is the saved data.
[0215] Step 2:
[0216] The server creates a list of agenda items based on their importance, using an algorithm to evaluate and classify them. The input is agenda data stored in a database, and the output is a list of agenda items sorted by importance. This list is distributed to participants via email and automatically added to their calendars.
[0217] Step 3:
[0218] The terminal activates the virtual character at the start of the meeting. Once it confirms that participants have entered the meeting, the virtual character begins conducting the meeting based on the agenda list. Here, participant entry data is used as input, and the virtual character is automatically initialized. The output is the start of the virtual character's activities.
[0219] Step 4:
[0220] The terminal collects participants' speech and facial expression data during meetings using its camera and microphone, and transmits this data to the emotion engine in real time. The input is camera video and audio data, and the output is emotion data. Facial expression analysis using the DeepFace library is performed as data processing.
[0221] Step 5:
[0222] The server uses the analyzed emotion data to adjust the dialogue of the virtual character. The input is the emotion data from step 4, and an algorithm is applied that appropriately modifies the content of the virtual character's statements. The output is the dialogue of the virtual character, tailored to the participant's emotions.
[0223] Step 6:
[0224] After the meeting ends, the server analyzes the audio and text data and generates meeting minutes based on information from the emotion engine. At this stage, all data from the meeting is used as input, and natural language processing techniques are employed for analysis. The generated meeting minutes are then distributed to the participants as output.
[0225] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0226] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0227] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.
[0228] [Second Embodiment]
[0229] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.
[0230] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.
[0231] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0232] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.
[0233] The microphone 238 receives instructions and other input from the user 20 by picking up the voice of the user 20. The microphone 238 captures the voice uttered by the user 20, converts it into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.
[0234] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0235] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0236] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0237] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0238] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0239] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0240] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0241] This invention is an automated meeting system using virtual avatars, and its operation is realized by multiple devices. The following describes how the server, terminal, and user cooperate to implement this system.
[0242] The server acts as the central hub of the system, automatically sending online forms to participants once a meeting is scheduled. Through these forms, it collects agenda items and questions from participants, organizing and storing them in a database. The collected information is analyzed by the server, and an agenda list is created based on importance. The created agenda list and meeting details are emailed to participants, and are also added to their calendars for schedule management.
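One common way to add a meeting to participants' calendars is to attach an iCalendar (.ics) event to the invitation email. The sketch below builds such an event as text. The use of iCalendar, the `PRODID` value, and the function name are assumptions, since the disclosure does not name a calendar format or service. The sketch does not escape commas, semicolons, or line breaks in the title or description, which a complete implementation would need to do.

```python
from datetime import datetime, timedelta, timezone

def build_ics(uid, title, start, duration_minutes, description=""):
    """Return a single-event iCalendar document; start should be timezone-aware."""
    fmt = "%Y%m%dT%H%M%SZ"
    start_utc = start.astimezone(timezone.utc)
    end_utc = start_utc + timedelta(minutes=duration_minutes)
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//meeting-sketch//EN",  # hypothetical product identifier
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{datetime.now(timezone.utc).strftime(fmt)}",
        f"DTSTART:{start_utc.strftime(fmt)}",
        f"DTEND:{end_utc.strftime(fmt)}",
        f"SUMMARY:{title}",
        f"DESCRIPTION:{description}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    # iCalendar content lines are terminated by CRLF.
    return "\r\n".join(lines) + "\r\n"

ics = build_ics(
    "meeting-1@example.com",
    "Weekly sync",
    datetime(2025, 1, 15, 9, 0, tzinfo=timezone.utc),
    30,
)
```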
[0243] On the terminal, a virtual avatar is activated when a meeting begins. This avatar interacts with participants through the terminal's interface and plays a role in ensuring the smooth progress of the meeting. The virtual avatar analyzes participants' comments in real time and provides information and facilitates discussion according to the flow of the discussion. For example, when a user makes a new suggestion, the avatar can search for similar past data and display it on the terminal as reference information.
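The search for similar past data mentioned above could, in its simplest form, be a text-similarity ranking. The sketch below uses `difflib.SequenceMatcher` from the Python standard library. The threshold value, the record format (plain strings), and the function name are assumptions, and character-level similarity is only a stand-in for whatever retrieval method an actual system would use.

```python
from difflib import SequenceMatcher

def most_similar(proposal, past_records, top_n=3, threshold=0.3):
    """Return up to top_n past records ranked by character-level similarity."""
    scored = []
    for record in past_records:
        score = SequenceMatcher(None, proposal.lower(), record.lower()).ratio()
        if score >= threshold:
            scored.append((score, record))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored[:top_n]]

# Hypothetical past records.
past = ["Reduce packaging costs in 2023", "Office relocation plan"]
matches = most_similar("reduce packaging cost", past)
```

The returned records would then be displayed on the terminal as reference information.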
[0244] Users can ask questions and offer opinions to virtual avatars via their devices during meetings. The avatars analyze the users' comments and use them as material to effectively facilitate the conversation. This interactive format allows for smooth information exchange during meetings and ensures that reference materials are presented in a timely manner.
[0245] After the meeting ends, the server automatically analyzes the recorded audio and text data to create meeting minutes. The generated minutes are emailed to participants and also stored in a cloud storage system for easy access later. This system allows users to reduce the time spent on meeting management and work more efficiently without overlooking important information.
[0246] In this way, this system comprehensively automates all meeting processes by using virtual avatars, providing participants with a more valuable meeting experience.
[0247] The following describes the processing flow.
[0248] Step 1:
[0249] The server checks the meeting schedule and automatically sends an email to participants containing a link to an online form for collecting agenda items and questions.
[0250] Step 2:
[0251] Users access an online form, enter the agenda items or questions they wish to propose, and submit it. Each user's input is sent to the server and stored in a database.
[0252] Step 3:
[0253] The server analyzes the collected agenda items and creates an agenda list based on priority and relevance. This agenda list will be used to guide the meeting.
[0254] Step 4:
[0255] The server sends participants an email containing a generated agenda list, meeting date and time, connection link, and other detailed information. It also automatically adds the meeting to their calendars.
[0256] Step 5:
[0257] At the start of the meeting, a virtual avatar is activated on the terminal to confirm the participant's entry. The avatar begins the meeting with an opening greeting and then proceeds based on the agenda list.
[0258] Step 6:
[0259] Users ask questions and express their opinions to a virtual avatar through their device. The avatar analyzes these inputs, provides relevant information in real time, and controls the flow of the conversation.
[0260] Step 7:
[0261] The terminal records audio data or text logs during the meeting and sends this information to the server. This recording is used later to create meeting minutes.
[0262] Step 8:
[0263] After the meeting ends, the server analyzes the audio recordings and text logs to automatically generate meeting minutes, including each speaker's remarks, decisions made, and action items. These minutes are emailed to participants and stored in cloud storage, making them easily accessible later.
[0264] (Example 1)
[0265] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0266] Conducting modern meetings requires considerable time and effort, from preparation and execution to record-keeping. Furthermore, as the number of participants increases, organizing information and ensuring smooth discussion becomes increasingly difficult. Additionally, traditional methods often make it difficult to retrieve information spoken during a meeting or refer to past examples, leading to decreased meeting efficiency. This invention aims to address these challenges and improve meeting productivity.
[0267] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0268] In this invention, the server includes means for generating a virtual representation that automates the progress of a meeting, means for adding the meeting to the participants' schedules, and means for searching past information using the virtual representation and providing similar information. This makes it possible to facilitate the progress of the meeting and to quickly provide the information necessary during the discussion.
[0269] A "virtual representation" is a digital agent that acts as a substitute for a human in a meeting, supporting the progress of the discussion.
[0270] "Means of collection" refers to the methods and functions for obtaining and organizing information from meeting participants.
[0271] "Means of creation" refers to the methods and functions for compiling a list of topics based on the collected topics.
[0272] "Means of providing" refers to the methods and functions for presenting appropriate information to participants during a meeting.
[0273] "Means of generation" refers to methods or functions for creating records based on information obtained during a meeting.
[0274] "Means of distribution" refers to the methods or functions for sending the generated records to participants.
[0275] "Methods for adding meetings to participants' schedules" refers to methods or functions for automatically registering meetings in participants' schedules.
[0276] "Means of searching and providing similar information" refers to methods or functions for finding past information and indicating information relevant to the current meeting.
[0277] This invention is a system that automates the progress of meetings through a virtual representation. This system combines multiple technologies and enables efficient meeting management.
[0278] The server plays a central role in the system. Once a meeting is scheduled, it creates an online inquiry and sends it to the meeting participants. This inquiry can use a standard online form service. Information submitted by participants is stored in a database. The server uses natural language processing technology to analyze the collected information and creates a list of topics based on their importance. The created list of topics and meeting details are notified to participants via email and are automatically added to their schedules through the scheduling management system.
[0279] The terminal activates a virtual representation at the start of the meeting and manages direct interaction with participants. The virtual representation uses speech recognition technology to receive voice input and analyzes speech during the meeting in real time. This information analysis employs natural language processing technology and database technology for retrieving past information. The virtual representation provides relevant information in response to participants' questions and supports the progress of the meeting.
[0280] Users can directly ask questions and express opinions to the virtual representation through their terminals. User input is instantly analyzed by the virtual representation, and appropriate information is provided quickly. This operation enables efficient meeting progress and supports users in obtaining information from their own perspective.
[0281] As a concrete example of its use, when proposing an idea for a new strategic project at a meeting, the user fills in the necessary information in advance on an online inquiry sent from the server. Then, when the proposal is made during the meeting, the virtual representation can instantly search relevant historical data and provide reference information.
[0282] As an example of a prompt, presenting the generative AI model with a request such as "Please provide a virtual representation of how to effectively conduct the next strategic meeting" yields detailed information from the AI about the system's capabilities.
[0283] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0284] Step 1:
[0285] After the meeting schedule is set, the server sends an online inquiry to the participants. It receives the meeting schedule and the participant list as input, generates an online inquiry link as output, and sends it via email. As a specific operation, the server creates the inquiry using the online form service API.
[0286] Step 2:
[0287] The user inputs topics and questions through the online inquiry provided by the server. As input, the user enters the topics and questions they consider into the online form and sends them to the server. As output, the user's input data is saved in the server's database.
[0288] Step 3:
[0289] The server analyzes the collected participant information and evaluates the importance of the topics. As input, it obtains the input data of the meeting participants and analyzes it using natural language processing technology. As output, it creates a list of topics ordered by importance. The server performs this data processing using a Python natural language processing library.
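The importance evaluation in Step 3 can be sketched as follows. This is a minimal illustration only, assuming that importance is approximated by the number of distinct participants who proposed a topic; the text does not specify the scoring rule, and the function name `rank_topics` and the sample data are hypothetical.

```python
from collections import defaultdict


def rank_topics(submissions):
    """Rank topics by how many distinct participants proposed them.

    submissions: list of (participant, topic) pairs from the online inquiry.
    The scoring rule is an assumption made for illustration.
    """
    proposers = defaultdict(set)
    for participant, topic in submissions:
        key = " ".join(topic.lower().split())  # normalize case and spacing
        proposers[key].add(participant)
    # More distinct proposers first; ties are broken alphabetically.
    return sorted(proposers, key=lambda t: (-len(proposers[t]), t))


submissions = [
    ("alice", "Budget review"),
    ("bob", "budget  review"),
    ("bob", "Hiring plan"),
    ("carol", "Budget Review"),
    ("carol", "Hiring plan"),
    ("dave", "Office move"),
]
topic_list = rank_topics(submissions)
print(topic_list)
```

A production system would likely replace the string normalization with the natural language processing mentioned in the text, for example to merge differently worded but equivalent topics.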
[0290] Step 4:
[0291] The server notifies the participants of the created topic list and meeting details, and at the same time adds the meeting to the calendar using the schedule management system. As input, it obtains the topic list and meeting details, and as output, it sends an email notification and adds it to the calendar. The server executes this using the calendar API and the email API.
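The notification in Step 4 can be sketched as follows. The text names only "the calendar API and the email API", so this example assumes, for illustration, that the calendar entry is delivered as an iCalendar (RFC 5545) attachment on the notification email. The function name `build_invitation`, the addresses, and the UID are hypothetical, and actual sending is omitted.

```python
from datetime import datetime, timedelta, timezone
from email.message import EmailMessage

ICS_TIME = "%Y%m%dT%H%M%SZ"  # UTC timestamp form used by iCalendar


def build_invitation(sender, recipients, title, start, minutes, topics, uid):
    """Return an email carrying the topic list and an iCalendar event.

    `start` must be a timezone-aware datetime in UTC.
    """
    end = start + timedelta(minutes=minutes)
    ics = "\r\n".join([
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//meeting-sketch//EN",
        "METHOD:REQUEST",
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{datetime.now(timezone.utc).strftime(ICS_TIME)}",
        f"DTSTART:{start.strftime(ICS_TIME)}",
        f"DTEND:{end.strftime(ICS_TIME)}",
        f"SUMMARY:{title}",
        "END:VEVENT",
        "END:VCALENDAR",
        "",
    ])
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = f"Invitation: {title}"
    body = "Topics in order of importance:\n" + "\n".join(
        f"{i}. {t}" for i, t in enumerate(topics, start=1)
    )
    msg.set_content(body)
    # Attach the event as text/calendar so that mail clients can offer to add it.
    msg.add_attachment(ics, subtype="calendar", filename="invite.ics")
    return msg


invitation = build_invitation(
    sender="server@example.invalid",
    recipients=["alice@example.invalid", "bob@example.invalid"],
    title="Strategy meeting",
    start=datetime(2025, 1, 15, 1, 0, tzinfo=timezone.utc),
    minutes=60,
    topics=["Budget review", "Hiring plan"],
    uid="meeting-0001@example.invalid",
)
```

The resulting message could be handed to any mail transport; a direct call to a calendar service API would be an alternative to the attachment.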
[0292] Step 5:
[0293] On the terminal, a virtual representation is activated at the start of a meeting. It receives a meeting start signal for a specific date and time as input, and the virtual representation's interface is activated as output. The terminal performs this operation using software from the virtual environment.
[0294] Step 6:
[0295] Users make statements and ask questions through a terminal during the meeting. Input is sent to the terminal as voice or text, and output is received as answers or information presented by a virtual representation. The terminal converts the voice to text using a speech recognition API and obtains the answer through a generative AI model.
[0296] Step 7:
[0297] The virtual representation responds to participants' questions, providing relevant information by referencing past data. It accepts natural-language text queries as input and presents similar information retrieved from a database as output. The terminal implements this using a database search algorithm.
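The retrieval of similar information in Step 7 can be sketched as follows. The text does not specify the search algorithm, so this example assumes a simple bag-of-words cosine similarity; the function names and the sample records are hypothetical.

```python
import math
import re
from collections import Counter


def _vector(text):
    """Turn text into a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def find_similar(query, records, top_k=2):
    """Return the titles of the past records most similar to the query."""
    q = _vector(query)
    scored = [(cosine(q, _vector(text)), title) for title, text in records.items()]
    scored = [s for s in scored if s[0] > 0]  # drop records with no overlap
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [title for _, title in scored[:top_k]]


past_records = {
    "2023 pricing review": "pricing strategy for the subscription product",
    "2022 office move": "logistics for the office relocation",
    "2024 launch retro": "lessons from the product launch and pricing",
}
hits = find_similar("What pricing strategy did we use for the product?", past_records)
print(hits)
```

Word counts are a deliberately crude stand-in; an embedding model or a full-text index could be substituted without changing the surrounding flow.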
[0298] Step 8:
[0299] After the meeting ends, the server analyzes the audio and text data from the meeting, automatically generates meeting minutes, and distributes them to the participants. It receives recorded audio and text data as input and generates a text file of the meeting minutes as output. The server performs this process using audio analysis and text generation technologies.
[0300] (Application Example 1)
[0301] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0302] The progress of meetings and information sharing are usually supported by appropriate communication among participants and an efficient decision-making process. However, running a meeting requires a lot of time and effort. Especially in meetings related to content production, there may be a need to quickly refer to past materials and information. Also, it is not easy to immediately reflect the opinions of participants and conduct effective discussions. In response to such situations, new technical solutions are needed to streamline meetings while materializing creative ideas.
[0303] The specific processing by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following respective means.
[0304] In this invention, the server includes a generation means for automating the progress of a meeting, a collection means for a dynamic expression to collect topics from meeting participants, and a creation means for creating a content list. Thereby, it becomes possible to efficiently aggregate the opinions of participants in the content production process and quickly present relevant materials during the meeting.
[0305] "Dynamic expression" is a form of virtual avatar that automates the progress of a meeting and enables interactive dialogue with participants.
[0306] "Collection means" is a mechanism for effectively gathering topics and information from participants in a meeting.
[0307] "Content list" is a list of items for organizing discussions in a meeting, created based on the collected topics and information.
[0308] "Provision means" is a function for a dynamic expression to provide information to participants through dialogue during a meeting.
[0309] "Acoustic data" is information in the form of sound recorded during a meeting.
[0310] "Document data" is information in text format generated during a meeting.
[0311] A "meeting record" is a document that summarizes the contents of a meeting and serves as the meeting minutes.
[0312] "Distribution method" refers to a method for electronically transmitting the generated record text to participants.
[0313] "Search methods" refer to the process of quickly finding past information and presenting it as relevant material during a meeting.
[0314] "Support measures" refer to functions that support the development of suggestions and ideas during the content creation process.
[0315] The system that implements this application consists of a cloud-based server and user terminals. The server utilizes cloud infrastructure such as Amazon Web Services (AWS) or Google Cloud Platform, enabling the processing and analysis of large amounts of data. Furthermore, the program is built using Python to facilitate smooth meeting progress. Libraries such as spaCy and Transformers are used for natural language processing, analyzing participants' statements and meeting content in real time.
[0316] The system operates on the smart devices and computers used by meeting participants, and the dynamic expression facilitates the meeting through interaction with participants. The dynamic expression transcribes spoken audio into text in real time, supporting the meeting's progress while organizing its content. To give specific responses and provide information, it searches past meeting records and related materials and provides data based on the results.
[0317] As an example of this system, if a content creation team needs materials from a past successful project during a meeting, they can send a prompt message from their terminal to the server saying, "Please display materials related to this new idea." The server can then immediately search for and display the relevant materials. This creates an environment where users can conduct meetings efficiently.
[0318] The generative AI model automatically searches for and presents relevant information, allowing users to instantly obtain the information they need during meetings. Furthermore, this system is highly scalable, and new information and historical data can be integrated easily.
[0319] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0320] Step 1:
[0321] The server automatically sends an electronic form to participants once the meeting schedule is finalized. It receives meeting schedule information as input and generates an online form. Participants fill in their agenda items on this form, and this information is sent to the server. The server then stores the agenda items submitted by the participants as output.
[0322] Step 2:
[0323] The server stores the received agenda information in a database and creates a content list by evaluating the importance of each item. It uses the agenda information submitted by participants as input, employing database operations and analysis algorithms. The output is a content list ordered by priority.
[0324] Step 3:
[0325] On the terminal, a dynamic expression is activated at the start of the meeting, accepting real-time voice input from participants and converting it to text. The input is the participants' voice statements, and the output is the statements converted into text data. A generative AI model is used to achieve high-quality text conversion.
[0326] Step 4:
[0327] The dynamic expression sends a search request to the server for relevant information based on the prompt text provided by the user. It receives the prompt text as input and performs matching to search for relevant information. The output is a list of the necessary documents and information.
[0328] Step 5:
[0329] The server searches for relevant documents in its historical database and returns them to the terminal. It receives a matched prompt as input and performs data retrieval processing. The output is the corresponding document data. Specifically, it executes a database query.
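The database query in Step 5 can be sketched as follows. This is an illustration only, assuming an SQLite table of past documents searched by keyword; the table name `documents`, its columns, and the sample rows are hypothetical and are not taken from the text.

```python
import sqlite3


def search_documents(conn, keyword, limit=5):
    """Return titles of past documents whose body contains the keyword, newest first."""
    rows = conn.execute(
        "SELECT title FROM documents WHERE body LIKE ? "
        "ORDER BY created DESC LIMIT ?",
        (f"%{keyword}%", limit),  # parameters are bound, not string-formatted into SQL
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (title TEXT, body TEXT, created TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [
        ("Spring campaign plan", "Video campaign storyboard and budget", "2023-03-01"),
        ("Autumn campaign report", "Results of the autumn video campaign", "2023-10-15"),
        ("Hiring guide", "Interview process overview", "2022-06-01"),
    ],
)
results = search_documents(conn, "campaign")
print(results)
```

Keyword matching with `LIKE` is the simplest possible form of the query; the matching described in Step 4 could supply a more selective search term.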
[0330] Step 6:
[0331] Users develop the discussion based on the received materials and present new ideas through the dynamic expression. The materials received on their terminals serve directly as input for discussions and proposals. The output consists of improved discussion content and new ideas. Specific interactions take place through the virtual avatar.
[0332] Step 7:
[0333] After the meeting ends, the server analyzes the recorded audio data and transcribed discussion to generate a transcript, which is then sent to the participants. It receives conversation data as input and performs text mining and speech analysis. The output is the generated transcript. Specifically, the transcript is saved to cloud storage and distributed via email.
[0334] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0335] This invention provides a system that automates the progress of meetings using virtual avatars and also includes a function to recognize the emotions of participants. The following describes how the server, terminal, and user cooperate to implement this invention.
[0336] The server plays a central role in meeting management, sending an online form to participants once the meeting date is set. Through this form, the server collects agenda items and questions from users and stores them in a database. It analyzes the collected agenda items and creates a priority-based agenda list. This list, along with the meeting details, is emailed to participants, and the event is automatically added to their calendars.
[0337] The terminal activates a virtual avatar at the start of a meeting and begins the meeting after confirming that participants have entered. During the meeting, the terminal acquires emotional data from the user's statements and facial expressions and sends it to the emotion engine. The emotion engine analyzes this data to identify the participants' emotions and provides real-time feedback to the avatar. This allows the virtual avatar to engage in appropriate dialogue according to the user's emotions. For example, if the analysis determines that the user is feeling stressed, the avatar will make relaxing remarks to facilitate the conversation.
[0338] Users can ask questions and express their opinions to virtual avatars via their devices during meetings. Based on analysis results from an emotion engine, the avatar's dialogue in response to user comments is adjusted, providing a better meeting experience for the user.
[0339] After the meeting ends, the server analyzes the audio data and text logs and generates meeting minutes based on the emotional data acquired by the emotion engine. These minutes visualize the emotional shifts that occurred during the meeting and are sent to participants via email. The minutes are also saved to cloud storage for easy access later. This system improves the efficiency of meeting management and enables flexible dialogue that takes participants' emotions into account.
[0340] As described above, the present invention provides a solution for conducting meetings more effectively and in a way that is considerate of participants by combining a virtual avatar and an emotion engine.
[0341] The following describes the processing flow.
[0342] Step 1:
[0343] The server checks the meeting schedule and automatically sends an email to prospective participants containing a link to an online form for collecting agenda items. This form includes fields where users can freely enter topics or questions they wish to propose.
[0344] Step 2:
[0345] Users receive an email, access an online form, enter their agenda items and questions, and submit it. The user's input data is sent to the server and stored in a database.
[0346] Step 3:
[0347] The server analyzes the accumulated agenda data and creates an agenda list based on priority. This list is a crucial element for ensuring a smooth meeting flow.
[0348] Step 4:
[0349] The server generates an agenda list and meeting details, then creates an invitation email and sends it to all participants. Furthermore, the server automatically adds the meeting to the calendar system.
[0350] Step 5:
[0351] At the start of the meeting, the terminal activates a virtual avatar and confirms that all participants have entered the room. The avatar then gives an opening greeting and begins the meeting based on the agenda list.
[0352] Step 6:
[0353] The terminal collects user speech and facial expression data in real time and sends it to the emotion engine. The emotion engine analyzes this data to identify the user's emotional state.
[0354] Step 7:
[0355] Based on the analysis results of the emotion engine, the virtual avatar on the terminal adjusts the content of the conversation and provides the user with appropriate feedback and information. For example, if the user appears confused, the avatar follows up with a more detailed explanation.
[0356] Step 8:
[0357] As the meeting progresses, audio data and analyzed emotion data are recorded on the server. Based on this data, the server automatically generates meeting minutes after the meeting concludes.
[0358] Step 9:
[0359] The generated meeting minutes include information visualizing the flow of participants' emotions. These minutes are emailed to participants and also saved in cloud storage, which allows users to review the meeting details and the related emotion analysis at a later date.
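The emotion flow included in the minutes in Step 9 can be sketched as follows. The text does not specify how the flow is visualized, so this example assumes a plain-text timeline built from timestamped emotion labels; the function names, the sampling interval, and the labels are hypothetical.

```python
from itertools import groupby


def emotion_segments(samples):
    """Collapse consecutive samples with the same label into (start, end, label).

    samples: list of (seconds_from_start, emotion_label), in time order.
    """
    segments = []
    for label, group in groupby(samples, key=lambda s: s[1]):
        times = [t for t, _ in group]
        segments.append((times[0], times[-1], label))
    return segments


def render_timeline(samples):
    """Render the segments as text lines suitable for appending to the minutes."""
    return "\n".join(
        f"{s // 60:02d}:{s % 60:02d}-{e // 60:02d}:{e % 60:02d}  {label}"
        for s, e, label in emotion_segments(samples)
    )


samples = [
    (0, "neutral"),
    (30, "neutral"),
    (60, "confused"),
    (90, "confused"),
    (120, "neutral"),
]
print(render_timeline(samples))
```

The same segments could instead feed a chart; text output is used here only to keep the sketch self-contained.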
[0360] (Example 2)
[0361] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0362] In modern business, there is a demand for efficient meetings and smooth communication among participants. However, traditional meeting systems struggle to organize agendas, manage meetings effectively, and facilitate flexible dialogue that takes participants' emotions into account. Furthermore, generating and distributing meeting minutes often involves a lot of manual work, which is burdensome. Therefore, there is a need for a system that automates meeting management and enables dialogue that is sensitive to participants' emotions.
[0363] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0364] In this invention, the server includes data processing means for automating the progress of a meeting, list generation means for creating a priority-based agenda list based on collected agenda items, and emotion analysis means for analyzing participants' facial expressions and voices during the meeting to identify their emotions and provide real-time feedback. This makes it possible to streamline the progress of meetings and realize appropriate dialogue that responds to the emotions of the participants.
[0365] "Data processing means" refers to technical methods for automating the progress of meetings and managing and processing related information.
[0366] "Information gathering means" refers to a method for efficiently collecting agenda items and questions from meeting participants and storing that information in a database.
[0367] "List generation means" refers to a function that analyzes collected agenda data and creates a list that is prioritized based on importance and relevance.
[0368] "Interaction means" refers to a dialogue function in a meeting where a virtual avatar communicates with participants and provides information.
[0369] "Record generation means" refers to technology that analyzes audio and text information generated during a meeting to create detailed meeting minutes.
[0370] "Report distribution means" refers to a function that quickly distributes the generated meeting minutes to participants and saves them to the cloud as needed.
[0371] "Emotion analysis means" refers to a technique that analyzes participants' facial expressions and voices during a meeting to identify their emotions and provides feedback in real time.
[0372] "Dialogue adjustment means" refers to a method that adjusts the responses of the virtual avatar and the content of dialogue based on the emotional state of the participants, thereby achieving more human-like and empathetic communication.
[0373] This invention is a system for efficiently and effectively managing corporate meetings, aiming to automate meeting progress and recognize participants' emotions. This system is primarily implemented by three parties: a server, terminals, and users.
[0374] Server role
[0375] The server is the central data processing unit for meeting management. When a meeting is scheduled, the server sends an online form to participants and collects agenda items and questions from them. The data obtained through the information gathering means is stored in a database, and the list generation means creates a priority-based agenda list. The agenda list and meeting details are then emailed to participants, and the meeting is automatically added to their calendar system.
[0376] Terminal role
[0377] The terminal functions as an interface for users to join a meeting. At the start of the meeting, the terminal activates a virtual avatar and manages the meeting while confirming participants' attendance using the interaction means. The terminal is also equipped with a camera and microphone, which capture the user's facial expressions and voice. The emotion analysis means identifies emotions in real time and provides feedback. Based on this, the terminal adjusts the virtual avatar's dialogue according to the participant's emotional state. For example, if the analysis indicates that the user is stressed, the virtual avatar can make relaxing remarks.
[0378] User role
[0379] Users can ask questions and express their opinions directly to the virtual avatar via their terminals during meetings. Responses based on emotion analysis create an optimal meeting experience for each user.
[0380] After the meeting ends, the server uses the record generation means to automatically analyze the meeting's audio and text data and generate detailed meeting minutes incorporating emotion data. It then uses the report distribution means to distribute the minutes to participants and save them to the cloud.
[0381] Examples of specific cases and prompt statements
[0382] A concrete example is a project meeting in a manufacturing company. In this meeting, the project overview and challenges are discussed in the initial stages, and the agenda is dynamically adjusted throughout the meeting based on the feelings and opinions of the team members.
[0383] Examples of prompts:
[0384] "Propose a system that analyzes participants' emotions and provides real-time feedback."
[0385] "Explain how to adjust the content of a conversation based on the results of emotion analysis."
[0386] As described above, this system combines the latest generative AI models with emotion analysis technology to revitalize communication within companies and enable efficient meeting management.
[0387] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0388] Step 1:
[0389] As soon as the meeting date is set, the server creates an online form and sends it to participants. The required inputs are the meeting date, participant contact information, and a form template. The collected participant agenda and question data are then stored in a database.
[0390] Step 2:
[0391] The server retrieves agenda and question data from the database and uses the list generation means to create a priority-based agenda list. The input is the agenda data from the database, and the output is an organized agenda list. This list serves as the basis for facilitating the smooth running of the meeting.
[0392] Step 3:
[0393] The terminal activates a virtual avatar at the start of the meeting. It requires the generated agenda list and participant information as input. Based on this information, the terminal verifies participant entry and displays any necessary initial messages.
[0394] Step 4:
[0395] During the meeting, the terminal uses its camera and microphone to capture participants' facial expressions and voices. The input consists of real-time video and audio data. This data is sent to the emotion analysis means to identify emotional states. The output is the participants' emotional states, and the avatar's responses are adjusted based on this.
[0396] Step 5:
[0397] The terminal receives emotion analysis results in real time and adjusts the virtual avatar's dialogue accordingly. For example, if the system determines that the user is confused, the avatar will provide a more detailed and easier-to-understand explanation. The input is emotion state data, and the output is the adjusted dialogue.
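The dialogue adjustment in Step 5 can be sketched as follows. This example assumes that the adjustment is made by adding a style instruction to the prompt passed to a generative AI model; the mapping `STYLE_BY_EMOTION`, the function name `build_prompt`, and the wording of the instructions are hypothetical, and the call to the model itself is omitted.

```python
# Hypothetical mapping from a detected emotion to a response-style instruction.
STYLE_BY_EMOTION = {
    "confused": "Explain step by step, in plain words, with one concrete example.",
    "stressed": "Keep the reply short and calm, and offer a brief pause.",
    "anxious": "Use reassuring language and confirm what is already settled.",
}
DEFAULT_STYLE = "Answer concisely and move to the next agenda item."


def build_prompt(emotion, question):
    """Compose a prompt whose style instruction depends on the detected emotion."""
    style = STYLE_BY_EMOTION.get(emotion, DEFAULT_STYLE)
    return (
        f"Instruction: {style}\n"
        f"Participant emotion: {emotion}\n"
        f"Question: {question}"
    )


print(build_prompt("confused", "How does the new approval flow work?"))
```

Keeping the mapping in a table makes it easy to review which emotions trigger which behavior, and unknown labels fall back to the default style.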
[0398] Step 6:
[0399] After the meeting ends, the server analyzes the audio and text data and creates meeting minutes using the record generation means. This process uses speech recognition and natural language processing technologies, with recorded audio data and text logs as input and a meeting minutes file as output.
[0400] Step 7:
[0401] Finally, the server distributes the generated meeting minutes to participants and stores them in cloud storage. The input is the completed meeting minutes, and the output is the minutes emailed to participants and stored in the cloud. This allows participants to access the meeting minutes at any time.
[0402] (Application Example 2)
[0403] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0404] Traditional meeting facilitation methods often placed an excessive burden on certain participants even when everyone attended, and made it difficult to conduct dialogue that appropriately considered the feelings and moods of all participants. Furthermore, the process of creating meeting minutes was time-consuming and inefficient. There is a need to address these challenges and provide methods for conducting meetings efficiently and with the participants' needs in mind.
[0405] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0406] In this invention, the server includes a generation means for generating virtual characters to automate the progress of a meeting, an analysis means for acquiring facial expression data of participants during the meeting and analyzing their emotions, and an adjustment means for adjusting the content of the virtual characters' dialogue based on the analyzed emotion data. This enables efficient meeting progress while taking into consideration all participants.
[0407] A "virtual character for automating the progress of a meeting" is a digital agent that autonomously handles tasks such as presenting agenda items and creating meeting minutes during a meeting.
[0408] "Acquisition method" refers to a system component equipped with the function of collecting agenda items and opinions from meeting participants.
[0409] A "priority-dependent agenda list" is a list of collected agenda items sorted according to their priority.
[0410] "An adjustment mechanism for adjusting the content of the dialogue" refers to a function that appropriately modifies the statements of virtual characters according to the emotions of the participants and the flow of the conversation.
[0411] The "analysis method" is a mechanism that decodes and analyzes participants' emotional states from their facial expressions and voice data.
[0412] A "record" is a document that describes what was said, what was discussed, what emotions were felt, what actions were taken, and so on during a meeting.
[0413] "Generation methods" refer to a group of functions that generate virtual characters or construct deliverables based on data analyzed during meetings.
[0414] The system of this invention consists of a server, terminals, and users, each working together to improve meeting efficiency and enable dialogue that takes into account the feelings of the participants.
[0415] The server serves as the central hub for meeting management. First, once the meeting date is set, the server sends an online form to participants. This form is used to collect agenda items and questions from participants. The collected information is stored in the server's database, and an agenda list is generated based on importance. This list, along with the meeting details, is sent to participants, and the meeting is automatically added to their calendars.
[0416] The terminal activates a virtual character at the start of the meeting. This virtual character begins conducting the meeting after confirming that participants have entered. During the meeting, the terminal collects participants' statements and facial expression data and sends it to the emotion engine. Based on the emotion data analyzed in real time, the virtual character provides appropriate responses according to the participants' emotions, thereby ensuring the smooth running of the meeting. For example, if the virtual character detects that a participant is feeling anxious, it will use reassuring language.
[0417] Users can ask questions and express opinions to the virtual character via their terminals. The content of the virtual character's dialogue is adjusted based on the analysis results from the emotion engine, enabling discussions that are sensitive to changes in participants' opinions and emotions.
[0418] This system uses the Python DeepFace library and OpenCV for emotion analysis. This allows emotions to be determined efficiently from image data acquired via the camera. Specifically, it can analyze emotions in real time in response to comments made during a meeting, enabling immediate detection of how participants react when a new topic is introduced.
[0419] By using a generative AI model and considering examples of prompts such as, "What elements are necessary to design a virtual system that can grasp the emotions of meeting participants in real time and engage in corresponding dialogue?", it is possible to achieve more accurate emotion analysis and dialogue functionality.
[0420] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0421] Step 1:
[0422] Once the meeting date is set, the server sends an online form to all participants. This form is used to collect agenda items and questions from participants, using participant information and the meeting date and time as input. The server saves this information to a database, preparing the basic data for the agenda list. The output is the saved data.
[0423] Step 2:
[0424] The server creates a list of agenda items based on their importance, using an algorithm to evaluate and classify them. The input is agenda data stored in a database, and the output is a list of agenda items sorted by importance. This list is distributed to participants via email and automatically added to their calendars.
[0425] Step 3:
[0426] The terminal activates the virtual character at the start of the meeting. Once it confirms that participants have entered the meeting, the virtual character begins conducting the meeting based on the agenda list. Here, participant entry data is used as input, and the virtual character is automatically initialized. The output is the start of the virtual character's activities.
[0427] Step 4:
[0428] The terminal collects participants' speech and facial expression data during meetings using its camera and microphone, and transmits this data to the emotion engine in real time. The input is camera video and audio data, and the output is emotion data. Facial expression analysis using the DeepFace library is performed as data processing.
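The handling of per-frame emotion data in Step 4 can be sketched as follows. This example does not call the DeepFace library itself. It assumes that a facial expression analyzer has already produced one dominant emotion label per video frame, and shows one possible way to smooth those labels with a sliding-window majority vote so that a single misread frame does not change the virtual character's behavior. The function name `smooth_emotions`, the window size, and the sample labels are hypothetical.

```python
from collections import Counter, deque


def smooth_emotions(frame_labels, window=3):
    """Smooth per-frame emotion labels with a sliding-window majority vote."""
    recent = deque(maxlen=window)  # holds the most recent `window` labels
    smoothed = []
    for label in frame_labels:
        recent.append(label)
        counts = Counter(recent)
        best = max(counts.values())
        # On a tie, prefer the most recent label among the tied ones.
        winner = next(l for l in reversed(recent) if counts[l] == best)
        smoothed.append(winner)
    return smoothed


# Hard-coded stand-in for the analyzer output, one label per frame.
frames = ["neutral", "neutral", "fear", "neutral", "sad", "sad", "sad"]
stable = smooth_emotions(frames)
print(stable)
```

In this sample the isolated "fear" frame is suppressed, while the sustained change to "sad" is passed through after a short delay.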
[0429] Step 5:
[0430] The server uses the analyzed emotion data to adjust the dialogue of the virtual character. The input is the emotion data from step 4, and an algorithm is applied that appropriately modifies the content of the virtual character's statements. The output is the dialogue of the virtual character, tailored to the participant's emotions.
[0431] Step 6:
[0432] After the meeting ends, the server analyzes the audio and text data and generates meeting minutes based on information from the emotion engine. At this stage, all data from the meeting is used as input, and natural language processing techniques are employed for analysis. The generated meeting minutes are then distributed to the participants as output.
[0433] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0434] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. A prompt containing instructions and inference data, such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt and outputs the inference results in a data format such as audio data or text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0435] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.
[0436] [Third Embodiment]
[0437] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.
[0438] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.
[0439] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0440] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.
[0441] The microphone 238 receives instructions from the user 20 by picking up the user's voice. The microphone 238 captures the voice of the user 20, converts it into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.
[0442] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0443] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0444] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0445] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0446] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0447] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0448] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".
[0449] This invention is an automated meeting system using virtual avatars, and its operation is realized by multiple devices. The following describes how the server, terminal, and user cooperate to implement this system.
[0450] The server acts as the central hub of the system, automatically sending online forms to participants once a meeting is scheduled. Through these forms, it collects agenda items and questions from participants, organizing and storing them in a database. The collected information is analyzed by the server, and an agenda list is created based on importance. The created agenda list and meeting details are emailed to participants, and are also added to their calendars for schedule management.
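As a non-limiting illustration of the notification described above, the following minimal sketch composes an email that carries the agenda list and meeting details, using the Python standard library. The function name `build_agenda_email` and the message layout are assumptions made for illustration; actual transmission (for example, through a mail server) is outside the sketch.

```python
from email.message import EmailMessage


def build_agenda_email(sender, recipients, meeting_title, start_time, agenda_items):
    """Compose (but do not send) an email listing the agenda in priority order."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    message["Subject"] = f"Agenda: {meeting_title}"

    lines = [f"Meeting: {meeting_title}", f"Start: {start_time}", "", "Agenda:"]
    # agenda_items is assumed to be already sorted by importance.
    lines += [f"{number}. {item}" for number, item in enumerate(agenda_items, start=1)]
    message.set_content("\n".join(lines))
    return message
```

The returned `EmailMessage` object can then be handed to whatever mail delivery mechanism the server uses.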
[0451] On the terminal, a virtual avatar is activated when a meeting begins. This avatar interacts with participants through the terminal's interface and plays a role in ensuring the smooth progress of the meeting. The virtual avatar analyzes participants' comments in real time and provides information and facilitates discussion according to the flow of the discussion. For example, when a user makes a new suggestion, the avatar can search for similar past data and display it on the terminal as reference information.
[0452] Users can ask questions and offer opinions to the virtual avatar via their terminals during meetings. The avatar analyzes the users' comments and uses them as material to facilitate the conversation effectively. This interactive format allows for smooth information exchange during meetings and ensures that reference materials are presented in a timely manner.
[0453] After the meeting ends, the server automatically analyzes the recorded audio and text data to create meeting minutes. The generated minutes are emailed to participants and also stored in a cloud storage system for easy access later. This system allows users to reduce the time spent on meeting management and work more efficiently without overlooking important information.
[0454] In this way, this system comprehensively automates all meeting processes by using virtual avatars, providing participants with a more valuable meeting experience.
[0455] The following describes the processing flow.
[0456] Step 1:
[0457] The server checks the meeting schedule and automatically sends an email to participants containing a link to an online form for collecting agenda items and questions.
[0458] Step 2:
[0459] Users access an online form, enter the agenda items or questions they wish to propose, and submit it. Each user's input is sent to the server and stored in a database.
[0460] Step 3:
[0461] The server analyzes the collected agenda items and creates an agenda list based on priority and relevance. This agenda list will be used to guide the meeting.
[0462] Step 4:
[0463] The server sends participants an email containing a generated agenda list, meeting date and time, connection link, and other detailed information. It also automatically adds the meeting to their calendars.
[0464] Step 5:
[0465] At the start of the meeting, a virtual avatar is activated on the terminal to confirm the participant's entry. The avatar begins the meeting with an opening greeting and then proceeds based on the agenda list.
[0466] Step 6:
[0467] Users ask questions and express their opinions to the virtual avatar through their terminals. The avatar analyzes these inputs, provides relevant information in real time, and controls the flow of the conversation.
[0468] Step 7:
[0469] The terminal records audio data or text logs during the meeting and sends this information to the server. This recording is used later to create meeting minutes.
[0470] Step 8:
[0471] After the meeting ends, the server analyzes the audio recordings and text logs to automatically generate meeting minutes, including each speaker's remarks, decisions made, and action items. These minutes are emailed to participants and stored in cloud storage, making them easily accessible later.
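As a non-limiting illustration of Step 8, the following minimal sketch sorts a text log into the three parts of the minutes named above (remarks, decisions, and action items). The `DECISION:` and `ACTION:` prefixes are hypothetical markers used only to keep the sketch self-contained; the disclosure itself leaves the classification to the server's analysis, which could instead use a generative AI model.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical markers; a real system would classify utterances by analysis.
DECISION_PREFIX = "DECISION:"
ACTION_PREFIX = "ACTION:"


@dataclass
class Minutes:
    remarks: List[str] = field(default_factory=list)
    decisions: List[str] = field(default_factory=list)
    action_items: List[str] = field(default_factory=list)


def build_minutes(log):
    """Build minutes from an iterable of (speaker, utterance) pairs."""
    minutes = Minutes()
    for speaker, utterance in log:
        text = utterance.strip()
        if text.startswith(DECISION_PREFIX):
            minutes.decisions.append(text[len(DECISION_PREFIX):].strip())
        elif text.startswith(ACTION_PREFIX):
            body = text[len(ACTION_PREFIX):].strip()
            minutes.action_items.append(f"{speaker}: {body}")
        else:
            minutes.remarks.append(f"{speaker}: {text}")
    return minutes
```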
[0472] (Example 1)
[0473] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0474] Conducting modern meetings requires considerable time and effort, from preparation and execution to record-keeping. Furthermore, as the number of participants increases, organizing information and ensuring smooth discussion becomes increasingly difficult. Additionally, traditional methods often make it difficult to retrieve information spoken during a meeting or refer to past examples, leading to decreased meeting efficiency. This invention aims to address these challenges and improve meeting productivity.
[0475] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0476] In this invention, the server includes means for generating a virtual representation that automates the progress of a meeting, means for adding the meeting to the participants' schedules, and means for searching past information using the virtual representation and providing similar information. This makes it possible to facilitate the progress of the meeting and to quickly provide the information necessary during the discussion.
[0477] A "virtual representation" is a digital agent that acts as a substitute for a human in a meeting, supporting the progress of the discussion.
[0478] "Means of collection" refers to the methods and functions for obtaining and organizing information from meeting participants.
[0479] "Means of creation" refers to the methods and functions for compiling a list of topics based on the collected topics.
[0480] "Means of providing" refers to the methods and functions for presenting appropriate information to participants during a meeting.
[0481] "Means of generation" refers to methods or functions for creating records based on information obtained during a meeting.
[0482] "Means of distribution" refers to the methods or functions for sending the generated records to participants.
[0483] "Means for adding meetings to participants' schedules" refers to methods or functions for automatically registering meetings in participants' schedules.
[0484] "Means of searching and providing similar information" refers to methods or functions for finding past information and indicating information relevant to the current meeting.
[0485] This invention is a system that automates the progress of meetings through a virtual representation. This system combines multiple technologies and enables efficient meeting management.
[0486] The server plays a central role in the system. Once a meeting is scheduled, it creates an online inquiry and sends it to the meeting participants. This inquiry can use a standard online form service. Information submitted by participants is stored in a database. The server uses natural language processing technology to analyze the collected information and creates a list of topics based on their importance. The created list of topics and meeting details are notified to participants via email and are automatically added to their schedules through the scheduling management system.
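As a non-limiting illustration of creating a list of topics based on importance, the following minimal sketch ranks submitted topics by how many participants proposed them. This frequency count is a deliberately simple stand-in for the natural language processing mentioned above, and the function name `rank_topics` is hypothetical.

```python
from collections import Counter


def rank_topics(submissions):
    """Return distinct topics, most frequently proposed first.

    Topics are compared after lowercasing and collapsing whitespace.
    Ties keep the order in which the topics were first submitted.
    """
    normalized = [" ".join(s.lower().split()) for s in submissions if s.strip()]
    counts = Counter(normalized)
    first_seen = {}
    for index, topic in enumerate(normalized):
        first_seen.setdefault(topic, index)
    return sorted(counts, key=lambda topic: (-counts[topic], first_seen[topic]))
```

A production system would likely also merge topics that are worded differently but mean the same thing, which this sketch does not attempt.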
[0487] The terminal activates a virtual representation at the start of the meeting and manages direct interaction with participants. The virtual representation uses speech recognition technology to receive voice input and analyzes speech during the meeting in real time. This information analysis employs natural language processing technology and database technology for retrieving past information. The virtual representation provides relevant information in response to participants' questions and supports the progress of the meeting.
[0488] Users can directly ask questions and express opinions to virtual representations through their terminals. User input is instantly analyzed by the virtual representation, and appropriate information is provided quickly. This operation enables efficient meeting progress and supports users in obtaining information from their own perspective.
[0489] As a concrete example of its use, when proposing an idea for a new strategic project at a meeting, the user fills in the necessary information in advance on an online inquiry sent from the server. Then, when the proposal is made during the meeting, the virtual representation can instantly search relevant historical data and provide reference information.
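As a non-limiting illustration of searching historical data for information similar to a proposal, the following minimal sketch scores past records against a query with bag-of-words cosine similarity. The function names and the choice of similarity measure are assumptions made for illustration; the disclosure does not specify the retrieval algorithm.

```python
import math
import re
from collections import Counter


def _vector(text):
    """Turn text into a word-count vector (lowercased alphanumeric tokens)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def find_similar(query, past_records, top_k=3):
    """Return up to top_k past records sharing vocabulary with the query."""
    query_vector = _vector(query)
    scored = [(cosine(query_vector, _vector(record)), record)
              for record in past_records]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored[:top_k]]
```

Records with no words in common with the query are dropped, so an unrelated archive entry is never shown as reference information.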
[0490] As an example of a prompt, by presenting a request to the generative AI model such as, "Please provide a virtual representation of how to effectively conduct the next strategic meeting," detailed information about the system's capabilities can be obtained from the AI.
[0491] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0492] Step 1:
[0493] The server sends online inquiries to participants after the meeting schedule has been set. It receives the meeting schedule and participant list as input, generates an online inquiry link as output, and sends it via email. Specifically, the server uses an online form service API to create the inquiry.
[0494] Step 2:
[0495] Users input agenda items and questions through the online inquiry provided by the server. As input, users enter their proposed agenda items and questions into the online form and submit it to the server. As output, the users' input data is stored in the server's database.
[0496] Step 3:
[0497] The server analyzes the information collected from participants and evaluates the importance of the agenda items. It takes the data entered by meeting participants as input and analyzes it using natural language processing techniques. As output, it creates a list of topics categorized by importance. The server uses a Python natural language processing library for data processing.
[0498] Step 4:
[0499] The server notifies participants of the created topic list and meeting details, and simultaneously adds the meeting to the calendar using the scheduling management system. It takes the topic list and meeting details as input, and outputs email notifications and calendar additions. The server performs this using the calendar API and email API.
[0500] Step 5:
[0501] On the terminal, a virtual representation is activated at the start of a meeting. The terminal receives a meeting start signal for a specific date and time as input, and activates the virtual representation's interface as output. The terminal performs this operation using virtual environment software.
[0502] Step 6:
[0503] Users make statements and ask questions through a terminal during the meeting. Input is sent to the terminal as voice or text, and output is received as answers or information presented by a virtual representation. The terminal converts the voice to text using a speech recognition API and obtains the answer through a generative AI model.
[0504] Step 7:
[0505] A virtual representation responds to participants' questions, providing relevant information by referencing past data. It accepts natural text queries as input and presents similar information retrieved from a database as output. The terminal implements this using a database search algorithm.
[0506] Step 8:
[0507] After the meeting ends, the server analyzes the audio and text data from the meeting, automatically generates meeting minutes, and distributes them to the participants. It receives recorded audio and text data as input and generates a text file of the meeting minutes as output. The server performs this process using audio analysis and text generation technologies.
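As a non-limiting illustration of the calendar addition in Step 4, the following minimal sketch produces a calendar entry in the iCalendar text format (RFC 5545), which many calendar systems can import. The use of iCalendar, the function name `build_calendar_entry`, and the `PRODID` value are assumptions made for illustration; the disclosure only states that a calendar API is used.

```python
from datetime import timedelta, timezone


def _ics_time(moment):
    """Format a timezone-aware datetime as an iCalendar UTC timestamp."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def _escape(text):
    """Escape characters that are special in iCalendar TEXT values."""
    return (text.replace("\\", "\\\\").replace(";", "\\;")
                .replace(",", "\\,").replace("\n", "\\n"))


def build_calendar_entry(uid, title, start, duration_minutes, created):
    """Return a single-event iCalendar document as a string.

    start and created are assumed to be timezone-aware datetimes.
    """
    end = start + timedelta(minutes=duration_minutes)
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//meeting-sketch//EN",  # hypothetical identifier
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{_ics_time(created)}",
        f"DTSTART:{_ics_time(start)}",
        f"DTEND:{_ics_time(end)}",
        f"SUMMARY:{_escape(title)}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    return "\r\n".join(lines) + "\r\n"
```

The resulting text could be attached to the notification email or passed to a calendar service, depending on the scheduling management system in use.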
[0508] (Application Example 1)
[0509] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0510] Meeting facilitation and information sharing are typically supported by effective communication among participants and efficient decision-making processes. However, running meetings requires considerable time and effort, and meetings related to content creation, in particular, often require quick access to past materials and information. Furthermore, it is not easy to immediately incorporate participants' opinions and facilitate effective discussion. To address these challenges, there is a need for new technological solutions that streamline meetings while simultaneously realizing creative ideas.
[0511] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0512] In this invention, the server includes a generation means for automating the progress of the meeting, a collection means for dynamically collecting agenda items from meeting participants, and a creation means for creating a content list. This makes it possible to efficiently gather participants' opinions during the content creation process and to quickly present relevant materials during the meeting.
[0513] "Dynamic representation" refers to a form of virtual avatar that automates the progress of a meeting and enables interactive dialogue with participants.
[0514] "Means of information gathering" refers to mechanisms for effectively collecting agenda items and information from participants in a meeting.
[0515] A "content list" is a list of items created based on the collected agenda and information, used to organize the discussions during a meeting.
[0516] "Means of delivery" refers to the function of providing information to participants through dialogue during a meeting using dynamic representations.
[0517] "Audio data" refers to audio information recorded during a meeting.
[0518] "Document data" refers to text-based information generated during a meeting.
[0519] A "meeting record" is a document that summarizes the contents of a meeting and functions as the meeting minutes.
[0520] "Distribution method" refers to a method for electronically transmitting the generated record text to participants.
[0521] "Search methods" refer to the process of quickly finding past information and presenting it as relevant material during a meeting.
[0522] "Support measures" refer to functions that support the development of suggestions and ideas during the content creation process.
[0523] The system that implements this application consists of a cloud-based server and user terminals. The server utilizes cloud infrastructure such as Amazon Web Services (AWS) or Google Cloud Platform, enabling the processing and analysis of large amounts of data. Furthermore, the program is built using Python to facilitate smooth meeting progress. Libraries such as spaCy and Transformers are used for natural language processing, analyzing participants' statements and meeting content in real time.
[0524] The system operates on smart devices and computers used by meeting participants, and dynamic representations facilitate the meeting through interaction with participants. Dynamic representations transcribe spoken audio into text in real time, supporting the meeting's progress while organizing its content. For specific responses and information provision, it employs a method of searching past meeting records and related materials and providing data based on that information.
[0525] As an example of this system, if a content creation team needs materials from a past successful project during a meeting, they can send a prompt message from their terminal to the server saying, "Please display materials related to this new idea." The server can then immediately search for and display the relevant materials. This creates an environment where users can conduct meetings efficiently.
[0526] The generative AI model automatically searches for and presents relevant information, allowing users to instantly obtain the information they need during meetings. Furthermore, this system is highly scalable and can easily integrate new information and historical data.
[0527] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0528] Step 1:
[0529] The server automatically sends an electronic form to participants once the meeting schedule is finalized. It receives meeting schedule information as input and generates an online form. Participants fill in their agenda items on this form, and this information is sent to the server. The server then stores the agenda items submitted by the participants as output.
[0530] Step 2:
[0531] The server stores the received agenda information in a database and analyzes it to create a content list ordered by importance. It uses the agenda information submitted by participants as input and applies database operations and analysis algorithms. The output is a content list created according to priority.
[0532] Step 3:
[0533] On the terminal, a dynamic representation is activated at the start of the meeting, accepting real-time voice input from participants and converting it to text. The input is the participants' voice statements, and the output is the statements converted into text data. A generative AI model is used to achieve high-quality text conversion.
[0534] Step 4:
[0535] The dynamic representation sends a search request to the server for relevant information based on the prompt text provided by the user. It receives the prompt text as input and performs matching to search for relevant information. The output is a list of the necessary documents and information.
[0536] Step 5:
[0537] The server searches for relevant documents in its historical database and returns them to the terminal. It receives a matched prompt as input and performs data retrieval processing. The output is the corresponding document data. Specifically, it executes a database query.
[0538] Step 6:
[0539] Users develop the discussion based on the received materials and present new ideas to the dynamic representation. They use the materials received on their terminals directly as input for discussions and proposals. The output consists of improved discussion content and new ideas. Specific interactions take place through the virtual avatar.
[0540] Step 7:
[0541] After the meeting ends, the server analyzes the recorded audio data and transcribed discussion to generate a transcript, which is then sent to the participants. It receives conversation data as input and performs text mining and speech analysis. The output is the generated transcript. Specifically, the transcript is saved to cloud storage and distributed via email.
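As a non-limiting illustration of the database query in Steps 4 and 5, the following minimal sketch stores past documents in an in-memory SQLite database and retrieves those whose body contains any of the keywords taken from a user's prompt. The table layout, the function names, and the keyword matching with `LIKE` are assumptions made for illustration; the disclosure does not specify the database or the matching method.

```python
import sqlite3


def create_archive(documents):
    """Create an in-memory archive from (title, body) pairs."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE documents (title TEXT, body TEXT)")
    connection.executemany("INSERT INTO documents VALUES (?, ?)", documents)
    return connection


def search_documents(connection, keywords):
    """Return titles of documents whose body contains any keyword.

    Keywords are bound as parameters, so user text is never spliced
    into the SQL string. Note that % and _ inside a keyword still act
    as LIKE wildcards in this sketch.
    """
    if not keywords:
        return []
    clause = " OR ".join("body LIKE ?" for _ in keywords)
    parameters = [f"%{keyword}%" for keyword in keywords]
    rows = connection.execute(
        f"SELECT title FROM documents WHERE {clause} ORDER BY title",
        parameters,
    )
    return [row[0] for row in rows]
```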
[0542] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0543] This invention provides a system that automates the progress of meetings using virtual avatars and also includes a function to recognize the emotions of participants. The following describes how the server, terminal, and user cooperate to implement this invention.
[0544] The server plays a central role in meeting management, sending an online form to participants once the meeting date is set. Through this form, the server collects agenda items and questions from users and stores them in a database. It analyzes the collected agenda items and creates a priority-based agenda list. This list, along with the meeting details, is emailed to participants, and the event is automatically added to their calendars.
[0545] The terminal activates a virtual avatar at the start of a meeting and begins the meeting after confirming that participants have entered. During the meeting, the terminal acquires emotional data from the user's statements and facial expressions and sends it to the emotion engine. The emotion engine analyzes this data to identify the participants' emotions and provides real-time feedback to the avatar. This allows the virtual avatar to engage in appropriate dialogue according to the user's emotions. For example, if the analysis determines that the user is feeling stressed, the avatar will make relaxing remarks to facilitate the conversation.
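As a non-limiting illustration of adjusting the avatar's dialogue according to the identified emotion, the following minimal sketch prepends a short remark when the emotion engine reports an emotion with sufficient confidence. The emotion labels, the wording of the remarks, the confidence threshold, and the function name `adjust_dialogue` are all hypothetical.

```python
# Hypothetical mapping from an identified emotion to an opening remark.
ADJUSTMENTS = {
    "stressed": "Let's take a short break before continuing. ",
    "confused": "Let me explain that in more detail. ",
}


def adjust_dialogue(statement, emotion, confidence, threshold=0.6):
    """Return the avatar's statement, adjusted for the user's emotion.

    The statement is left unchanged when the emotion is unknown or the
    emotion engine's confidence is below the threshold.
    """
    if confidence < threshold:
        return statement
    return ADJUSTMENTS.get(emotion, "") + statement
```

Leaving the statement untouched for low-confidence results is one possible design choice; it avoids the avatar reacting to emotions that may have been misread.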
[0546] Users can ask questions and express their opinions to the virtual avatar via their terminals during meetings. The avatar's responses to user comments are adjusted based on the analysis results from the emotion engine, providing a better meeting experience for the user.
[0547] After the meeting ends, the server analyzes the audio data and text logs and generates meeting minutes based on the emotional data acquired by the emotion engine. These minutes visualize the emotional shifts that occurred during the meeting and are sent to participants via email. The minutes are also saved to cloud storage for easy access later. This system improves the efficiency of meeting management and enables flexible dialogue that takes participants' emotions into account.
[0548] As described above, the present invention provides a solution for conducting meetings more effectively and in a way that is considerate of participants by combining a virtual avatar and an emotion engine.
[0549] The following describes the processing flow.
[0550] Step 1:
[0551] The server checks the meeting schedule and automatically sends an email to prospective participants containing a link to an online form for collecting agenda items. This form includes fields where users can freely enter topics or questions they wish to propose.
[0552] Step 2:
[0553] Users receive an email, access an online form, enter their agenda items and questions, and submit it. The user's input data is sent to the server and stored in a database.
[0554] Step 3:
[0555] The server analyzes the accumulated agenda data and creates an agenda list based on priority. This list is a crucial element for ensuring a smooth meeting flow.
[0556] Step 4:
[0557] The server generates an agenda list and meeting details, then creates an invitation email and sends it to all participants. Furthermore, the server automatically adds the meeting to the calendar system.
[0558] Step 5:
[0559] At the start of the meeting, the terminal activates a virtual avatar and confirms that all participants have entered the room. The avatar then gives an opening greeting and begins the meeting based on the agenda list.
[0560] Step 6:
[0561] The terminal collects user speech and facial expression data in real time and sends it to the emotion engine. The emotion engine analyzes this data to identify the user's emotional state.
[0562] Step 7:
[0563] Based on the analysis results of the emotion engine, the virtual avatar on the terminal adjusts the content of the conversation and provides the user with appropriate feedback and information. For example, if the user appears confused, the avatar follows up with a more detailed explanation.
[0564] Step 8:
[0565] As the meeting progresses, audio data and analyzed emotion data are recorded on the server. Based on this data, the server automatically generates meeting minutes after the meeting concludes.
[0566] Step 9:
[0567] The generated meeting minutes include information visualizing the flow of participants' emotions. These minutes are emailed to participants and also saved in cloud storage, allowing users to review the meeting details and the related emotion analysis at a later date.
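As a non-limiting illustration of visualizing the flow of emotions in the minutes, the following minimal sketch collapses timestamped emotion samples into segments and renders them as text lines. The sample format (minute, emotion label) and the function names are assumptions made for illustration.

```python
from itertools import groupby


def emotion_segments(samples):
    """Collapse consecutive samples with the same emotion into segments.

    samples is a list of (minute, emotion) pairs sorted by minute.
    Each segment is (first_minute, last_minute, emotion).
    """
    segments = []
    for emotion, group in groupby(samples, key=lambda sample: sample[1]):
        items = list(group)
        segments.append((items[0][0], items[-1][0], emotion))
    return segments


def render_timeline(samples):
    """Render the segments as lines suitable for inclusion in the minutes."""
    return [f"{start:02d}-{end:02d} min: {emotion}"
            for start, end, emotion in emotion_segments(samples)]
```

For instance, samples that read neutral, neutral, confused, confused, neutral at five-minute intervals produce three lines, making the period of confusion easy to locate when reviewing the minutes.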
[0568] (Example 2)
[0569] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0570] In modern business, there is a demand for efficient meetings and smooth communication among participants. However, traditional meeting systems struggle to organize agendas, manage meetings effectively, and facilitate flexible dialogue that takes participants' emotions into account. Furthermore, generating and distributing meeting minutes often involves a lot of manual work, which is burdensome. Therefore, there is a need for a system that automates meeting management and enables dialogue that is sensitive to participants' emotions.
[0571] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0572] In this invention, the server includes data processing means for automating the progress of a meeting, list generation means for creating a priority-based agenda list based on collected agenda items, and emotion analysis means for analyzing participants' facial expressions and voices during the meeting to identify their emotions and provide real-time feedback. This makes it possible to streamline the progress of meetings and realize appropriate dialogue that responds to the emotions of the participants.
[0573] "Data processing means" refers to technical methods for automating the progress of meetings and managing and processing related information.
[0574] "Information gathering methods" refer to methods for efficiently collecting agenda items and questions from meeting participants and storing that information in a database.
[0575] The "list generation method" is a function that analyzes collected agenda data and creates a list that is prioritized based on importance and relevance.
[0576] "Interaction means" refers to a dialogue function in a meeting where a virtual avatar communicates with participants and provides information.
[0577] "Record generation means" refers to technology that analyzes audio and text information generated during a meeting to create detailed meeting minutes.
[0578] The "report distribution method" is a function that quickly distributes the generated meeting minutes to participants and saves them to the cloud as needed.
[0579] "Emotion analysis technology" is a technique that analyzes participants' facial expressions and voices during a meeting to identify their emotions and provides feedback in real time.
[0580] A "dialogue adjustment method" is a method that adjusts the responses of virtual avatars and the content of dialogue based on the emotional state of the participants, thereby achieving more human-like and empathetic communication.
[0581] This invention is a system for efficiently and effectively managing corporate meetings, aiming to automate meeting progress and recognize participants' emotions. This system is primarily implemented by three parties: a server, terminals, and users.
[0582] Server role
[0583] The server is the central data processing unit for meeting management. When scheduling a meeting, the server sends an online form to participants and uses software to collect agenda items and questions from them. The data obtained through this information gathering method is stored in a database, and a list generation system creates a priority-based agenda list. These agenda lists and meeting details are then emailed to participants, and the meeting is automatically added to their calendar system.
[0584] Terminal role
[0585] The terminal functions as an interface for users to join a meeting. At the start of the meeting, the terminal activates a virtual avatar and manages the meeting while confirming participants' attendance using interaction tools. The terminal is also equipped with a camera and microphone, which capture the user's facial expressions and voice. Emotion analysis tools identify emotions in real time and provide feedback. Based on this, the terminal has a function to adjust the virtual avatar's dialogue according to the participant's emotional state. For example, if the analysis indicates that the user is stressed, the virtual avatar can make relaxing remarks.
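As a non-limiting illustration of identifying an emotion from both facial expressions and voice, the following minimal sketch combines two sets of per-emotion scores by weighted averaging and selects the highest-scoring label. The score dictionaries, the weighting scheme, and the function name `fuse_emotions` are hypothetical; the disclosure does not specify how the two signals are combined.

```python
def fuse_emotions(face_scores, voice_scores, face_weight=0.5):
    """Combine face and voice emotion scores and pick the top label.

    face_scores and voice_scores map emotion labels to scores in [0, 1].
    A label missing from one source counts as 0.0 for that source.
    Returns (label, fused_score).
    """
    labels = set(face_scores) | set(voice_scores)
    if not labels:
        raise ValueError("at least one emotion score is required")
    fused = {
        label: face_weight * face_scores.get(label, 0.0)
        + (1.0 - face_weight) * voice_scores.get(label, 0.0)
        for label in labels
    }
    # Sorting first makes the result deterministic when scores tie.
    best = max(sorted(fused), key=fused.get)
    return best, fused[best]
```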
[0586] User Role
[0587] Users can ask questions and express their opinions directly to virtual avatars via their devices during meetings. Because the avatar's responses draw on emotion analysis, the meeting experience is tailored to each user.
[0588] After the meeting ends, the server uses its record generation function to automatically analyze the meeting's audio and text data and generate detailed meeting minutes that incorporate the emotion data. It then uses its report distribution function to distribute the minutes to participants and save them to the cloud.
[0589] Examples of specific cases and prompt statements
[0590] A concrete example is a project meeting in a manufacturing company. In this meeting, the project overview and challenges are discussed in the initial stages, and the agenda is dynamically adjusted throughout the meeting based on the feelings and opinions of the team members.
[0591] Example of a prompt:
[0592] "Propose a system that analyzes participants' emotions and provides real-time feedback."
[0593] "Explain how to adjust the content of a conversation based on the results of emotion analysis."
[0594] As described above, this system combines the latest generative AI models with emotion analysis technology to revitalize communication within companies and enable efficient meeting management.
[0595] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0596] Step 1:
[0597] As soon as the meeting date is set, the server creates an online form and sends it to participants. The required inputs are the meeting date, participant contact information, and a form template. The collected participant agenda and question data are then stored in a database.
[0598] Step 2:
[0599] The server retrieves agenda and question data from the database and uses a list generation mechanism to create a priority-based agenda list. The input is the agenda data from the database, and the output is an organized agenda list. This list serves as the basis for facilitating the smooth running of the meeting.
[0600] Step 3:
[0601] The terminal activates a virtual avatar at the start of the meeting. It requires the generated agenda list and participant information as input. Based on this information, the terminal verifies participant entry and displays any necessary initial messages.
[0602] Step 4:
[0603] During the meeting, the terminal uses its camera and microphone to capture participants' facial expressions and voices. The input consists of real-time video and audio data. This data is sent to an emotion analysis system to identify emotional states. The output is the participants' emotional states, on which the avatar's responses are based.
[0604] Step 5:
[0605] The terminal receives emotion analysis results in real time and adjusts the virtual avatar's dialogue accordingly. For example, if the system determines that the user is confused, the avatar provides a more detailed and easier-to-understand explanation. The input is emotion state data, and the output is the adjusted dialogue.
[0606] Step 6:
[0607] After the meeting ends, the server analyzes the audio and text data and creates meeting minutes using a record generation system. This process uses speech recognition and natural language processing technologies, with recorded audio data and text logs as input and meeting minutes files as output.
[0608] Step 7:
[0609] The server ultimately distributes the generated meeting minutes to participants and stores them in cloud storage. The input is the completed meeting minutes, and the output is emailing them to participants and storing them in the cloud. This allows participants to access the meeting minutes at any time.
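Steps 1 and 2 above (collecting form submissions and producing a priority-based agenda list) can be sketched as follows. The disclosure does not state how priority is determined, so this sketch assumes that each form submission carries a participant-assigned priority from 1 (most urgent) to 5, that identical topics from different participants are merged, and that ties are broken by the number of proposers. `Submission` and `build_agenda` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    participant: str
    topic: str
    priority: int  # assumed scale: 1 = most urgent, 5 = least urgent


def build_agenda(submissions: list[Submission]) -> list[tuple[str, int, int]]:
    """Return (topic, best_priority, proposer_count) rows, most important first."""
    merged: dict[str, tuple[str, int, set[str]]] = {}
    for s in submissions:
        key = s.topic.strip().lower()  # merge topics that differ only in case/spacing
        if key in merged:
            title, prio, who = merged[key]
            merged[key] = (title, min(prio, s.priority), who | {s.participant})
        else:
            merged[key] = (s.topic.strip(), s.priority, {s.participant})
    rows = [(title, prio, len(who)) for title, prio, who in merged.values()]
    # Sort by priority, then by number of proposers (descending), then by title.
    return sorted(rows, key=lambda r: (r[1], -r[2], r[0].lower()))
```

In practice the priority could equally be produced by an analysis step on the server; only the sort key would change.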
[0610] (Application Example 2)
[0611] Next, Application Example 2 will be described. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0612] With traditional meeting facilitation methods, the burden often fell disproportionately on certain participants even when everyone attended, and it was difficult to conduct dialogue that properly took the feelings and moods of all participants into account. Furthermore, the process of creating meeting minutes was time-consuming and inefficient. There is a need to address these challenges and provide methods for conducting meetings efficiently and with the participants' needs in mind.
[0613] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0614] In this invention, the server includes a generation means for generating virtual characters to automate the progress of a meeting, an analysis means for acquiring facial expression data of participants during the meeting and analyzing their emotions, and an adjustment means for adjusting the content of the virtual characters' dialogue based on the analyzed emotion data. This enables efficient meeting progress while taking into consideration all participants.
[0615] A "virtual person for automating meeting management" is a digital agent that autonomously handles tasks such as presenting agenda items and creating meeting minutes during a meeting.
[0616] "Acquisition method" refers to a system component equipped with the function of collecting agenda items and opinions from meeting participants.
[0617] A "priority-dependent agenda list" is a list of collected agenda items sorted according to their priority.
[0618] "An adjustment mechanism for adjusting the content of the dialogue" refers to a function that appropriately modifies the statements of virtual characters according to the emotions of the participants and the flow of the conversation.
[0619] The "analysis method" is a mechanism that decodes and analyzes participants' emotional states from their facial expressions and voice data.
[0620] A "record" is a document that describes what was said, what was discussed, what emotions were felt, what actions were taken, and so on during a meeting.
[0621] "Generation methods" refer to a group of functions that generate virtual characters or construct deliverables based on data analyzed during meetings.
[0622] The system of this invention consists of a server, terminals, and users, each working together to improve meeting efficiency and enable dialogue that takes into account the feelings of the participants.
[0623] The server serves as the central hub for meeting management. First, once the meeting date is set, the server sends an online form to participants. This form is used to collect agenda items and questions from participants. The collected information is stored in the server's database, and an agenda list is generated based on importance. This list, along with the meeting details, is sent to participants and automatically added to their calendars.
[0624] The terminal activates a virtual character at the start of the meeting. This virtual character begins conducting the meeting after confirming that participants have entered. During the meeting, the terminal collects participants' statements and facial expression data and sends it to the emotion engine. Based on the emotion data analyzed in real time, the virtual character provides appropriate responses according to the participants' emotions, thereby ensuring the smooth running of the meeting. For example, if the virtual character detects that a participant is feeling anxious, it will use reassuring language.
[0625] Users can ask questions and express opinions to virtual characters via their devices. The content of the virtual characters' dialogue is adjusted based on the analysis results from the emotion engine, enabling discussions that are sensitive to changes in participants' opinions and emotions.
[0626] This system uses the Python DeepFace library and OpenCV for emotion analysis. This allows emotions to be determined efficiently from image data acquired via the camera. Specifically, emotions can be analyzed in real time as comments are made during a meeting, so that participants' reactions to a newly introduced topic can be detected immediately.
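As a minimal sketch of how such frame-by-frame analysis might look, the following Python code wraps a call to `DeepFace.analyze` and smooths the per-frame labels with a majority vote. The smoothing step, the window size, and the function names `deepface_emotion` and `smoothed_emotions` are assumptions added for illustration and are not part of the disclosure. The DeepFace import is deferred so that the smoothing logic can be exercised without the library installed.

```python
from collections import Counter, deque
from typing import Callable, Iterable, Iterator


def deepface_emotion(frame) -> str:
    """Classify one BGR frame (e.g. from cv2.VideoCapture.read()) with DeepFace."""
    from deepface import DeepFace  # lazy import; requires the deepface package

    result = DeepFace.analyze(
        img_path=frame, actions=["emotion"], enforce_detection=False
    )
    # Depending on the DeepFace version, the result is a dict or a list of dicts.
    first = result[0] if isinstance(result, list) else result
    return first["dominant_emotion"]


def smoothed_emotions(
    frames: Iterable, classify: Callable[[object], str], window: int = 5
) -> Iterator[str]:
    """Yield the majority label over the last `window` frames to damp flicker."""
    recent: deque[str] = deque(maxlen=window)
    for frame in frames:
        recent.append(classify(frame))
        yield Counter(recent).most_common(1)[0][0]
```

With a camera attached, frames read through OpenCV's `cv2.VideoCapture` could be passed to `smoothed_emotions(frames, deepface_emotion)`. The majority vote means that a single misclassified frame does not by itself change the label passed on to the dialogue adjustment.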
[0627] Giving a generative AI model a prompt such as, "What elements are necessary to design a virtual system that can grasp the emotions of meeting participants in real time and engage in corresponding dialogue?" makes it possible to achieve more accurate emotion analysis and dialogue functions.
[0628] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0629] Step 1:
[0630] Once the meeting date is set, the server sends an online form to all participants. This form is used to collect agenda items and questions from participants, using participant information and the meeting date and time as input. The server saves this information to a database, preparing the basic data for the agenda list. The output is the saved data.
[0631] Step 2:
[0632] The server creates a list of agenda items based on their importance, using an algorithm to evaluate and classify them. The input is agenda data stored in a database, and the output is a list of agenda items sorted by importance. This list is distributed to participants via email and automatically added to their calendars.
[0633] Step 3:
[0634] The terminal activates the virtual character at the start of the meeting. Once it confirms that participants have entered the meeting, the virtual character begins conducting the meeting based on the agenda list. Here, participant entry data is used as input, and the virtual character is automatically initialized. The output is the start of the virtual character's activities.
[0635] Step 4:
[0636] The terminal collects participants' speech and facial expression data during meetings using its camera and microphone, and transmits this data to the emotion engine in real time. The input is camera video and audio data, and the output is emotion data. Facial expressions are analyzed using the DeepFace library.
[0637] Step 5:
[0638] The server uses the analyzed emotion data to adjust the dialogue of the virtual character. The input is the emotion data from step 4, and an algorithm is applied that appropriately modifies the content of the virtual character's statements. The output is the dialogue of the virtual character, tailored to the participant's emotions.
[0639] Step 6:
[0640] After the meeting ends, the server analyzes the audio and text data and generates meeting minutes based on information from the emotion engine. At this stage, all data from the meeting is used as input, and natural language processing techniques are employed for analysis. The generated meeting minutes are then distributed to the participants as output.
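The assembly of minutes in Step 6 can be sketched as follows, assuming that speech recognition has already produced one text segment per statement and that the emotion engine has attached one label to each segment. The summarization that the disclosure assigns to natural language processing is outside this sketch, which covers only the deterministic layout. `Utterance` and `render_minutes` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    speaker: str
    text: str
    emotion: str  # label from the emotion engine (assumed to be one per statement)


def render_minutes(title: str, utterances: list[Utterance]) -> str:
    """Render plain-text minutes: per-speaker emotion summary, then the transcript."""
    by_speaker: dict[str, Counter] = {}
    for u in utterances:
        by_speaker.setdefault(u.speaker, Counter())[u.emotion] += 1

    lines = [f"Minutes: {title}", "", "Emotion summary:"]
    for speaker in sorted(by_speaker):
        counts = by_speaker[speaker]
        label, count = counts.most_common(1)[0]
        total = sum(counts.values())
        lines.append(f"- {speaker}: mostly {label} ({count} of {total} statements)")

    lines += ["", "Transcript:"]
    lines += [f"[{u.emotion}] {u.speaker}: {u.text}" for u in utterances]
    return "\n".join(lines)
```

The resulting text can then be handed to whatever distribution step the server uses.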
[0641] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0642] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0643] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.
[0644] [Fourth Embodiment]
[0645] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.
[0646] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.
[0647] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0648] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.
[0649] The microphone 238 receives instructions from the user 20 by picking up the user's voice. The microphone 238 captures the voice of the user 20, converts it into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.
[0650] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0651] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0652] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
[0653] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0654] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0655] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0656] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0657] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0658] This invention is an automated meeting system using virtual avatars, and its operation is realized by multiple devices. The following describes how the server, terminal, and user cooperate to implement this system.
[0659] The server acts as the central hub of the system, automatically sending online forms to participants once a meeting is scheduled. Through these forms, it collects agenda items and questions from participants, organizing and storing them in a database. The collected information is analyzed by the server, and an agenda list is created based on importance. The created agenda list and meeting details are emailed to participants, and are also added to their calendars for schedule management.
[0660] On the terminal, a virtual avatar is activated when a meeting begins. This avatar interacts with participants through the terminal's interface and plays a role in ensuring the smooth progress of the meeting. The virtual avatar analyzes participants' comments in real time and provides information and facilitates discussion according to the flow of the discussion. For example, when a user makes a new suggestion, the avatar can search for similar past data and display it on the terminal as reference information.
[0661] Users can ask questions and offer opinions to virtual avatars via their devices during meetings. The avatars analyze the users' comments and use them as material to effectively facilitate the conversation. This interactive format allows for smooth information exchange during meetings and ensures that reference materials are presented in a timely manner.
[0662] After the meeting ends, the server automatically analyzes the recorded audio and text data to create meeting minutes. The generated minutes are emailed to participants and also stored in a cloud storage system for easy access later. This system allows users to reduce the time spent on meeting management and work more efficiently without overlooking important information.
[0663] In this way, this system comprehensively automates all meeting processes by using virtual avatars, providing participants with a more valuable meeting experience.
[0664] The following describes the processing flow.
[0665] Step 1:
[0666] The server checks the meeting schedule and automatically sends an email to participants containing a link to an online form for collecting agenda items and questions.
[0667] Step 2:
[0668] Users access an online form, enter the agenda items or questions they wish to propose, and submit it. Each user's input is sent to the server and stored in a database.
[0669] Step 3:
[0670] The server analyzes the collected agenda items and creates an agenda list based on priority and relevance. This agenda list will be used to guide the meeting.
[0671] Step 4:
[0672] The server sends participants an email containing a generated agenda list, meeting date and time, connection link, and other detailed information. It also automatically adds the meeting to their calendars.
[0673] Step 5:
[0674] At the start of the meeting, a virtual avatar is activated on the terminal to confirm the participant's entry. The avatar begins the meeting with an opening greeting and then proceeds based on the agenda list.
[0675] Step 6:
[0676] Users ask questions and express their opinions to a virtual avatar through their device. The avatar analyzes these inputs, provides relevant information in real time, and controls the flow of the conversation.
[0677] Step 7:
[0678] The terminal records audio data or text logs during the meeting and sends this information to the server. This recording is used later to create meeting minutes.
[0679] Step 8:
[0680] After the meeting ends, the server analyzes the audio recordings and text logs to automatically generate meeting minutes, including each speaker's remarks, decisions made, and action items. These minutes are emailed to participants and stored in cloud storage, making them easily accessible later.
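The notification in Step 4 (an email carrying the agenda list, the connection link, and a calendar entry) can be sketched with the Python standard library alone. This is an illustrative sketch, not the disclosed implementation: the function name `build_invitation`, the `example.invalid` addresses, and the choice of an iCalendar (RFC 5545) attachment are assumptions. Actually sending the message (for example through `smtplib`) is omitted.

```python
import uuid
from datetime import datetime, timedelta, timezone
from email.message import EmailMessage


def build_invitation(sender: str, recipients: list[str], title: str,
                     start: datetime, duration_min: int,
                     agenda: list[str], join_url: str) -> EmailMessage:
    """Compose the notification email with an iCalendar (.ics) attachment."""
    if start.tzinfo is None:
        raise ValueError("start must be timezone-aware")
    fmt = "%Y%m%dT%H%M%SZ"  # UTC date-time form used by iCalendar
    start_utc = start.astimezone(timezone.utc)
    end_utc = start_utc + timedelta(minutes=duration_min)
    # Note: escaping of commas/semicolons in SUMMARY is omitted for brevity.
    ics = "\r\n".join([
        "BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//example//meeting-sketch//EN",
        "BEGIN:VEVENT",
        f"UID:{uuid.uuid4()}@example.invalid",
        f"DTSTAMP:{datetime.now(timezone.utc).strftime(fmt)}",
        f"DTSTART:{start_utc.strftime(fmt)}",
        f"DTEND:{end_utc.strftime(fmt)}",
        f"SUMMARY:{title}",
        f"URL:{join_url}",
        "END:VEVENT", "END:VCALENDAR", "",
    ])
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = f"Agenda: {title}"
    numbered = "\n".join(f"{i}. {item}" for i, item in enumerate(agenda, 1))
    msg.set_content(f"Join: {join_url}\n\nAgenda:\n{numbered}\n")
    msg.add_attachment(ics.encode("utf-8"), maintype="text",
                       subtype="calendar", filename="meeting.ics")
    return msg
```

Many calendar clients offer to add an attached `.ics` event to the recipient's calendar. A server-side calendar API, as mentioned in the description, would be an alternative route to the same result.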
[0681] (Example 1)
[0682] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0683] Conducting modern meetings requires considerable time and effort, from preparation and execution to record-keeping. Furthermore, as the number of participants increases, organizing information and ensuring smooth discussion becomes increasingly difficult. Additionally, traditional methods often make it difficult to retrieve information spoken during a meeting or refer to past examples, leading to decreased meeting efficiency. This invention aims to address these challenges and improve meeting productivity.
[0684] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0685] In this invention, the server includes means for generating a virtual representation that automates the progress of a meeting, means for adding the meeting to the participants' schedules, and means for searching past information using the virtual representation and providing similar information. This makes it possible to facilitate the progress of the meeting and to quickly provide the information necessary during the discussion.
[0686] A "virtual representation" is a digital agent that acts as a substitute for a human in a meeting, supporting the progress of the discussion.
[0687] "Means of collection" refers to the methods and functions for obtaining and organizing information from meeting participants.
[0688] "Means of creation" refers to the methods and functions for compiling a list of topics based on the collected topics.
[0689] "Means of providing" refers to the methods and functions for presenting appropriate information to participants during a meeting.
[0690] "Means of generation" refers to methods or functions for creating records based on information obtained during a meeting.
[0691] "Means of distribution" refers to the methods or functions for sending the generated records to participants.
[0692] "Means of adding meetings to participants' schedules" refers to methods or functions for automatically registering meetings in participants' schedules.
[0693] "Means of searching and providing similar information" refers to methods or functions for finding past information and indicating information relevant to the current meeting.
[0694] This invention is a system that automates the progress of meetings through a virtual representation. This system combines multiple technologies and enables efficient meeting management.
[0695] The server plays a central role in the system. Once a meeting is scheduled, it creates an online inquiry and sends it to the meeting participants. This inquiry can use a standard online form service. Information submitted by participants is stored in a database. The server uses natural language processing technology to analyze the collected information and creates a list of topics based on their importance. The created list of topics and meeting details are notified to participants via email and are automatically added to their schedules through the scheduling management system.
[0696] The terminal activates a virtual representation at the start of the meeting and manages direct interaction with participants. The virtual representation uses speech recognition technology to receive voice input and analyzes speech during the meeting in real time. This information analysis employs natural language processing technology and database technology for retrieving past information. The virtual representation provides relevant information in response to participants' questions and supports the progress of the meeting.
[0697] Users can directly ask questions and express opinions to virtual representations through their terminals. User input is instantly analyzed by the virtual representation, and appropriate information is provided quickly. This operation enables efficient meeting progress and supports users in obtaining information from their own perspective.
[0698] As a concrete example of its use, when proposing an idea for a new strategic project at a meeting, the user fills in the necessary information in advance on an online inquiry sent from the server. Then, when the proposal is made during the meeting, the virtual representation can instantly search relevant historical data and provide reference information.
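The search for similar past information can be illustrated with a small bag-of-words cosine similarity in Python. The disclosure does not specify the search algorithm, so this technique is substituted purely for illustration. The tokenizer handles only ASCII letters and digits, and the names `find_similar`, `cosine`, and the record identifiers are hypothetical.

```python
import math
import re
from collections import Counter


def _vector(text: str) -> Counter:
    """Lower-case bag of words (ASCII letters and digits only)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def find_similar(query: str, records: dict[str, str],
                 top_k: int = 3) -> list[tuple[str, float]]:
    """Return up to top_k (record_id, score) pairs with a non-zero score."""
    q = _vector(query)
    scored = [(rid, cosine(q, _vector(text))) for rid, text in records.items()]
    scored = [(rid, s) for rid, s in scored if s > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
```

A production system would more likely use a full-text index or embedding-based search, but the interface (a query in, ranked record identifiers out) would be similar.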
[0699] As an example of a prompt, presenting the generative AI model with a request such as, "Please provide a virtual representation of how to effectively conduct the next strategic meeting," yields detailed information from the AI about the system's capabilities.
[0700] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0701] Step 1:
[0702] The server sends online inquiries to participants after the meeting schedule has been set. It receives the meeting schedule and participant list as input, generates an online inquiry link as output, and sends it via email. Specifically, the server uses an online form service API to create the inquiry.
[0703] Step 2:
[0704] Users enter agenda items and questions through the online inquiry provided by the server. As input, users enter their proposed agenda items and questions into the online form and submit it to the server. As output, the users' input data is stored in the server's database.
[0705] Step 3:
[0706] The server analyzes the collected participant information and evaluates the importance of the agenda items. It takes the data submitted by meeting participants as input and analyzes it using natural language processing techniques. As output, it creates a list of topics categorized by importance. The server uses a Python natural language processing library for this data processing.
[0707] Step 4:
[0708] The server notifies participants of the created topic list and meeting details, and simultaneously adds the meeting to the calendar using the scheduling management system. It takes the topic list and meeting details as input, and outputs email notifications and calendar additions. The server performs this using the calendar API and email API.
[0709] Step 5:
[0710] On the terminal, a virtual representation is activated at the start of a meeting. It receives a meeting start signal for a specific date and time as input, and the virtual representation's interface is activated as output. The terminal performs this operation using software from the virtual environment.
[0711] Step 6:
[0712] Users make statements and ask questions through a terminal during the meeting. Input is sent to the terminal as voice or text, and output is received as answers or information presented by a virtual representation. The terminal converts the voice to text using a speech recognition API and obtains the answer through a generative AI model.
[0713] Step 7:
[0714] A virtual representation responds to participants' questions, providing relevant information by referencing past data. It accepts natural text queries as input and presents similar information retrieved from a database as output. The terminal implements this using a database search algorithm.
[0715] Step 8:
[0716] After the meeting ends, the server analyzes the audio and text data from the meeting, automatically generates meeting minutes, and distributes them to the participants. It receives recorded audio and text data as input and generates a text file of the meeting minutes as output. The server performs this process using audio analysis and text generation technologies.
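The question-answering path in Step 6 (speech recognition followed by a generative AI model) can be sketched as a small pipeline. Because the disclosure does not name a particular speech recognition API or model, both are passed in as callables. The prompt layout and the function name `answer_question` are assumptions made for illustration.

```python
from typing import Callable, Sequence


def answer_question(
    audio: bytes,
    transcribe: Callable[[bytes], str],  # stand-in for a speech recognition API
    generate: Callable[[str], str],      # stand-in for a generative AI model
    references: Sequence[str] = (),
) -> str:
    """Turn one spoken question into an answer; empty string if nothing was heard."""
    question = transcribe(audio).strip()
    if not question:
        return ""
    # Hypothetical prompt layout: instruction, retrieved references, then the question.
    lines = [
        "You are a virtual representation facilitating a meeting.",
        "Answer briefly, citing the reference material when relevant.",
        "References:",
    ]
    lines += [f"- {ref}" for ref in references] or ["- (none)"]
    lines += ["Question:", question]
    return generate("\n".join(lines))
```

Passing the two services in as parameters keeps the sketch independent of any vendor and lets the flow be tested with simple stand-in functions.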
[0717] (Application Example 1)
[0718] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0719] Meeting facilitation and information sharing are typically supported by effective communication among participants and efficient decision-making processes. However, running meetings requires considerable time and effort, and meetings related to content creation, in particular, often require quick access to past materials and information. Furthermore, it is not easy to immediately incorporate participants' opinions and facilitate effective discussion. To address these challenges, there is a need for new technological solutions that streamline meetings while simultaneously realizing creative ideas.
[0720] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0721] In this invention, the server includes a generation means for automating the progress of the meeting, a collection means for dynamically collecting agenda items from meeting participants, and a creation means for creating a content list. This makes it possible to efficiently gather participants' opinions during the content creation process and to quickly present relevant materials during the meeting.
[0722] "Dynamic representation" refers to a form of virtual avatar that automates the progress of a meeting and enables interactive dialogue with participants.
[0723] "Means of information gathering" refers to mechanisms for effectively collecting agenda items and information from participants in a meeting.
[0724] A "content list" is a list of items created based on the collected agenda and information, used to organize the discussions during a meeting.
[0725] "Means of delivery" refers to the function of providing information to participants through dialogue during a meeting using dynamic representations.
[0726] "Audio data" refers to audio information recorded during a meeting.
[0727] "Document data" refers to text-based information generated during a meeting.
[0728] A "meeting record" is a document that summarizes the contents of a meeting and functions as the meeting minutes.
[0729] "Distribution method" refers to a method for electronically transmitting the generated record text to participants.
[0730] "Search methods" refer to the process of quickly finding past information and presenting it as relevant material during a meeting.
[0731] "Support measures" refer to functions that support the development of suggestions and ideas during the content creation process.
[0732] The system that implements this application consists of a cloud-based server and user terminals. The server utilizes cloud infrastructure such as Amazon Web Services (AWS) or Google Cloud Platform, enabling the processing and analysis of large amounts of data. Furthermore, the program is built using Python to facilitate smooth meeting progress. Libraries such as spaCy and Transformers are used for natural language processing, analyzing participants' statements and meeting content in real time.
[0733] The system operates on smart devices and computers used by meeting participants, and dynamic representations facilitate the meeting through interaction with participants. Dynamic representations transcribe spoken audio into text in real time, supporting the meeting's progress while organizing its content. For specific responses and information provision, it employs a method of searching past meeting records and related materials and providing data based on that information.
[0734] As an example of this system, if a content creation team needs materials from a past successful project during a meeting, they can send a prompt message from their terminal to the server saying, "Please display materials related to this new idea." The server can then immediately search for and display the relevant materials. This creates an environment where users can conduct meetings efficiently.
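The retrieval described above can be illustrated with a minimal sketch. It assumes a plain token-overlap score rather than any particular search engine or generative model; the function names, the document titles, and the sample texts are all hypothetical and do not come from the disclosure.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search_materials(prompt, documents, top_k=3):
    """Rank documents by the number of tokens they share with the prompt.

    `documents` maps a (hypothetical) document title to its text.
    Returns up to `top_k` (title, score) pairs with a positive score.
    """
    query = Counter(tokenize(prompt))
    scored = []
    for title, body in documents.items():
        doc = Counter(tokenize(title + " " + body))
        # Overlap score: for each prompt token, the smaller of the two counts.
        score = sum(min(count, doc[token]) for token, count in query.items())
        if score > 0:
            scored.append((title, score))
    # Highest score first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Hypothetical past materials held by the server.
past_materials = {
    "Spring campaign retrospective": "video campaign idea that increased sign-ups",
    "Office relocation plan": "floor layout and moving schedule",
    "Podcast launch notes": "new audio series idea and guest list",
}
results = search_materials(
    "materials related to this new video campaign idea", past_materials
)
```

A production system would more likely use embeddings or a full-text index; the overlap score is used here only because it keeps the example self-contained.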
[0735] The generative AI model automatically searches for and presents relevant information, allowing users to instantly obtain the information they need during meetings. Furthermore, this system is highly scalable and can easily integrate new information and historical data.
[0736] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0737] Step 1:
[0738] The server automatically sends an electronic form to participants once the meeting schedule is finalized. It receives meeting schedule information as input and generates an online form. Participants fill in their agenda items on this form, and this information is sent to the server. The server then stores the agenda items submitted by the participants as output.
[0739] Step 2:
[0740] The server stores the received agenda information in a database and analyzes the content list based on importance. It uses the agenda information submitted by participants as input, employing database operations and analysis algorithms. The output is a content list created according to priority.
[0741] Step 3:
[0742] On the terminal, a dynamic representation is activated at the start of the meeting; it accepts real-time voice input from participants and converts it to text. The input is the participants' voice statements, and the output is the statements converted into text data. A generative AI model is used to achieve high-quality text conversion.
[0743] Step 4:
[0744] The dynamic representation sends a search request to the server for relevant information based on the prompt text provided by the user. It receives the prompt text as input and performs matching to search for relevant information. The output is a list of the necessary documents and information.
[0745] Step 5:
[0746] The server searches for relevant documents in its historical database and returns them to the terminal. It receives a matched prompt as input and performs data retrieval processing. The output is the corresponding document data. Specifically, it executes a database query.
[0747] Step 6:
[0748] Users develop discussions based on received materials and present new ideas in dynamic representations. They directly utilize materials received from their terminals as input for discussions and proposals. The output consists of improved discussion content and new ideas. Specific interactions take place through virtual avatars.
[0749] Step 7:
[0750] After the meeting ends, the server analyzes the recorded audio data and transcribed discussion to generate a transcript, which is then sent to the participants. It receives conversation data as input and performs text mining and speech analysis. The output is the generated transcript. Specifically, the transcript is saved to cloud storage and distributed via email.
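Steps 1 and 2 of the flow above (collecting agenda items and producing a content list ordered by priority) can be sketched as follows. This is a minimal illustration under stated assumptions: the disclosure does not specify how importance is measured, so the sketch assumes each form submission carries a hypothetical 1-to-5 importance score, and the class and function names are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class AgendaItem:
    participant: str
    topic: str
    importance: int  # hypothetical score from 1 (low) to 5 (high) entered on the form


def build_content_list(submissions):
    """Return agenda topics ordered by importance, highest first.

    Duplicate topics (compared case-insensitively) are merged, keeping the
    highest importance and recording everyone who proposed the topic.
    """
    merged = {}
    for item in submissions:
        key = item.topic.strip().lower()
        entry = merged.setdefault(
            key, {"topic": item.topic.strip(), "importance": 0, "proposed_by": []}
        )
        entry["importance"] = max(entry["importance"], item.importance)
        entry["proposed_by"].append(item.participant)
    # sorted() is stable, so equally important topics keep submission order.
    return sorted(merged.values(), key=lambda entry: -entry["importance"])


# Hypothetical form submissions received by the server.
submissions = [
    AgendaItem("Sato", "Budget review", 3),
    AgendaItem("Suzuki", "Release schedule", 5),
    AgendaItem("Tanaka", "budget review", 4),
]
content_list = build_content_list(submissions)
```

Merging duplicates before sorting keeps the list short when several participants raise the same topic, which is one plausible reading of "organizing" the collected agenda.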
[0751] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0752] This invention provides a system that automates the progress of meetings using virtual avatars and also includes a function to recognize the emotions of participants. The following describes how the server, terminal, and user cooperate to implement this invention.
[0753] The server plays a central role in meeting management, sending an online form to participants once the meeting date is set. Through this form, the server collects agenda items and questions from users and stores them in a database. It analyzes the collected agenda items and creates a priority-based agenda list. This list, along with the meeting details, is emailed to participants, and the event is automatically added to their calendars.
[0754] The terminal activates a virtual avatar at the start of a meeting and begins the meeting after confirming that participants have entered. During the meeting, the terminal acquires emotional data from the user's statements and facial expressions and sends it to the emotion engine. The emotion engine analyzes this data to identify the participants' emotions and provides real-time feedback to the avatar. This allows the virtual avatar to engage in appropriate dialogue according to the user's emotions. For example, if the analysis determines that the user is feeling stressed, the avatar will make relaxing remarks to facilitate the conversation.
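The feedback loop described above, in which the avatar changes what it says according to the estimated emotion, can be sketched as a simple rule table. The emotion labels, the remarks, and the function name are assumptions made for illustration; the disclosure does not define a concrete mapping.

```python
# Hypothetical mapping from an estimated emotion label to an avatar remark.
EMOTION_REMARKS = {
    "stressed": "Let's take a short pause. There is no rush on this item.",
    "confused": "Let me explain that point again in more detail.",
}


def adjust_reply(base_reply, emotion):
    """Prepend an emotion-appropriate remark to the avatar's planned reply.

    Emotions without a rule leave the reply unchanged.
    """
    remark = EMOTION_REMARKS.get(emotion)
    if remark is None:
        return base_reply
    return f"{remark} {base_reply}"


relaxed = adjust_reply("Next, we will look at agenda item 2.", "stressed")
unchanged = adjust_reply("Next, we will look at agenda item 2.", "neutral")
```

In practice the adjusted wording would more likely be produced by passing the emotion label to a generative model as part of the prompt; the lookup table only shows where the emotion estimate enters the dialogue.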
[0755] Users can ask questions and express their opinions to virtual avatars via their devices during meetings. Based on analysis results from an emotion engine, the avatar's dialogue in response to user comments is adjusted, providing a better meeting experience for the user.
[0756] After the meeting ends, the server analyzes the audio data and text logs and generates meeting minutes based on the emotional data acquired by the emotion engine. These minutes visualize the emotional shifts that occurred during the meeting and are sent to participants via email. The minutes are also saved to cloud storage for easy access later. This system improves the efficiency of meeting management and enables flexible dialogue that takes participants' emotions into account.
[0757] As described above, the present invention provides a solution for conducting meetings more effectively and in a way that is considerate of participants by combining a virtual avatar and an emotion engine.
[0758] The following describes the processing flow.
[0759] Step 1:
[0760] The server checks the meeting schedule and automatically sends an email to prospective participants containing a link to an online form for collecting agenda items. This form includes fields where users can freely enter topics or questions they wish to propose.
[0761] Step 2:
[0762] Users receive an email, access an online form, enter their agenda items and questions, and submit it. The user's input data is sent to the server and stored in a database.
[0763] Step 3:
[0764] The server analyzes the accumulated agenda data and creates an agenda list based on priority. This list is a crucial element for ensuring a smooth meeting flow.
[0765] Step 4:
[0766] The server generates an agenda list and meeting details, then creates an invitation email and sends it to all participants. Furthermore, the server automatically adds the meeting to the calendar system.
[0767] Step 5:
[0768] At the start of the meeting, the terminal activates a virtual avatar and confirms that all participants have entered the room. The avatar then gives an opening greeting and begins the meeting based on the agenda list.
[0769] Step 6:
[0770] The device collects user speech and facial expression data in real time and sends it to the emotion engine. The emotion engine analyzes this data to identify the user's emotional state.
[0771] Step 7:
[0772] Based on the analysis results of the emotion engine, the virtual avatar on the device adjusts the content of the conversation and provides the user with appropriate feedback and information. For example, if the user shows an emotion of confusion, the avatar will follow up by adding a detailed explanation.
[0773] Step 8:
[0774] As the meeting progresses, audio data and analyzed sentiment data are recorded on the server. Based on this data, the server automatically generates meeting minutes after the meeting concludes.
[0775] Step 9:
[0776] The generated meeting minutes include information visualizing the flow of participants' emotions. These minutes are emailed to participants and also saved in cloud storage. This saving allows users to review the meeting details and related sentiment analysis at a later date.
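Steps 8 and 9 above describe minutes that visualize how participants' emotions changed. A minimal text-only sketch of that idea follows. It assumes the server has logged one (agenda item, emotion label) pair per observation; this log structure, the labels, and the function names are hypothetical.

```python
from collections import Counter


def summarize_emotions(log):
    """Count emotion labels per agenda item.

    `log` is a list of (agenda_item, emotion_label) tuples recorded during
    the meeting. Agenda items keep the order in which they first appear.
    """
    per_item = {}
    for agenda_item, emotion in log:
        per_item.setdefault(agenda_item, Counter())[emotion] += 1
    return per_item


def render_minutes(title, log):
    """Render plain-text minutes with one bar of '#' marks per emotion."""
    lines = [f"Minutes: {title}"]
    for agenda_item, counts in summarize_emotions(log).items():
        lines.append(f"- {agenda_item}")
        for emotion, count in counts.most_common():
            lines.append(f"    {emotion:<10} {'#' * count} ({count})")
    return "\n".join(lines)


# Hypothetical emotion log collected during a meeting.
log = [
    ("Budget review", "neutral"),
    ("Budget review", "confused"),
    ("Budget review", "confused"),
    ("Release schedule", "happy"),
]
minutes = render_minutes("Weekly sync", log)
```

The resulting string could be attached to the distribution email or stored in cloud storage as described in Step 9.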
[0777] (Example 2)
[0778] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0779] In modern business, there is a demand for efficient meetings and smooth communication among participants. However, traditional meeting systems struggle to organize agendas, manage meetings effectively, and facilitate flexible dialogue that takes participants' emotions into account. Furthermore, generating and distributing meeting minutes often involves a lot of manual work, which is burdensome. Therefore, there is a need for a system that automates meeting management and enables dialogue that is sensitive to participants' emotions.
[0780] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0781] In this invention, the server includes data processing means for automating the progress of a meeting, list generation means for creating a priority-based agenda list based on collected agenda items, and emotion analysis means for analyzing participants' facial expressions and voices during the meeting to identify their emotions and provide real-time feedback. This makes it possible to streamline the progress of meetings and realize appropriate dialogue that responds to the emotions of the participants.
[0782] "Data processing means" refers to technical methods for automating the progress of meetings and managing and processing related information.
[0783] "Information gathering methods" refer to methods for efficiently collecting agenda items and questions from meeting participants and storing that information in a database.
[0784] The "list generation method" is a function that analyzes collected agenda data and creates a list that is prioritized based on importance and relevance.
[0785] "Interaction means" refers to a dialogue function in a meeting where a virtual avatar communicates with participants and provides information.
[0786] "Record generation means" refers to technology that analyzes audio and text information generated during a meeting to create detailed meeting minutes.
[0787] The "report distribution method" is a function that quickly distributes the generated meeting minutes to participants and saves them to the cloud as needed.
[0788] "Emotion analysis technology" is a technique that analyzes participants' facial expressions and voices during a meeting to identify their emotions and provides feedback in real time.
[0789] A "dialogue adjustment method" is a method that adjusts the responses of virtual avatars and the content of dialogue based on the emotional state of the participants, thereby achieving more human-like and empathetic communication.
[0790] This invention is a system for efficiently and effectively managing corporate meetings, aiming to automate meeting progress and recognize participants' emotions. This system is primarily implemented by three parties: a server, terminals, and users.
[0791] Server role
[0792] The server is the central data processing unit for meeting management. When scheduling a meeting, the server sends an online form to participants and uses software to collect agenda items and questions from them. The data obtained through this information gathering method is stored in a database, and a list generation system creates a priority-based agenda list. These agenda lists and meeting details are then emailed to participants, and the meeting is automatically added to their calendar system.
[0793] Terminal role
[0794] The terminal functions as an interface for users to join a meeting. At the start of the meeting, the terminal activates a virtual avatar and manages the meeting while confirming participants' attendance using interaction tools. The terminal is also equipped with a camera and microphone, which capture the user's facial expressions and voice. Emotion analysis tools identify emotions in real time and provide feedback. Based on this, the terminal has a function to adjust the virtual avatar's dialogue according to the participant's emotional state. For example, if the analysis indicates that the user is stressed, the virtual avatar can make relaxing remarks.
[0795] User role
[0796] Users can ask questions and express their opinions directly to virtual avatars via their devices during meetings. Sentiment analysis-based responses create an optimal meeting experience for each user.
[0797] After the meeting ends, the server uses recording generation capabilities to automatically analyze the meeting's audio and text data, generate detailed meeting minutes incorporating sentiment data, and distribute them to participants using reporting and distribution capabilities, as well as save them to the cloud.
[0798] Examples of specific cases and prompt statements
[0799] A concrete example is a project meeting in a manufacturing company. In this meeting, the project overview and challenges are discussed in the initial stages, and the agenda is dynamically adjusted throughout the meeting based on the feelings and opinions of the team members.
[0800] Examples of prompts:
[0801] "Propose a system that analyzes participants' emotions and provides real-time feedback."
[0802] "Explain how to adjust the content of a conversation based on the results of emotion analysis."
[0803] As described above, this system combines the latest generative AI models with emotion analysis technology to revitalize communication within companies and enable efficient meeting management.
[0804] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0805] Step 1:
[0806] As soon as the meeting date is set, the server creates an online form and sends it to participants. The required inputs are the meeting date, participant contact information, and a form template. The collected participant agenda and question data are then stored in a database.
[0807] Step 2:
[0808] The server retrieves agenda and question data from the database and uses a list generation mechanism to create a priority-based agenda list. The input is the agenda data from the database, and the output is an organized agenda list. This list serves as the basis for facilitating the smooth running of the meeting.
[0809] Step 3:
[0810] The terminal activates a virtual avatar at the start of the meeting. It requires the generated agenda list and participant information as input. Based on this information, the terminal verifies participant entry and displays any necessary initial messages.
[0811] Step 4:
[0812] During the meeting, the device uses its camera and microphone to capture participants' facial expressions and voices. The input consists of real-time video and audio data. This data is sent to an emotion analysis system to identify emotional states. The output is the participants' emotional states, and the avatar's responses are adjusted based on this.
[0813] Step 5:
[0814] The device receives emotion analysis results in real time and adjusts the virtual avatar's dialogue accordingly. For example, if the system determines that the user is confused, the avatar will provide a more detailed and easier-to-understand explanation. The input is emotion state data, and the output is the adjusted dialogue.
[0815] Step 6:
[0816] After the meeting ends, the server analyzes the audio and text data and creates meeting minutes using a record generation system. This process uses speech recognition and natural language processing technologies, with recorded audio data and text logs as input and meeting minutes files as output.
[0817] Step 7:
[0818] The server ultimately distributes the generated meeting minutes to participants and stores them in cloud storage. The input is the completed meeting minutes, and the output is emailing them to participants and storing them in the cloud. This allows participants to access the meeting minutes at any time.
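The distribution in Step 7 can be sketched with Python's standard `email` package. The sketch only builds the message and does not send it or upload anything; the addresses, the attachment file name, and the function name are hypothetical, and actual delivery (for example via `smtplib`) and cloud storage are left out.

```python
from email.message import EmailMessage


def build_minutes_email(sender, recipients, meeting_title, minutes_text):
    """Build (but do not send) an email carrying the minutes as an attachment."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    message["Subject"] = f"Minutes: {meeting_title}"
    message.set_content("The meeting minutes are attached.")
    # Passing a str creates a text/plain attachment encoded as UTF-8.
    message.add_attachment(minutes_text, filename="minutes.txt")
    return message


# Hypothetical addresses and minutes text.
email_message = build_minutes_email(
    "server@example.com",
    ["sato@example.com", "suzuki@example.com"],
    "Weekly sync",
    "Item 1: approved",
)
```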
[0819] (Application Example 2)
[0820] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0821] Traditional meeting facilitation methods often placed an excessive burden on certain participants even when everyone attended, and made it difficult to conduct dialogue that appropriately considered the feelings and moods of all participants. Furthermore, the process of creating meeting minutes was time-consuming and inefficient. There is a need to address these challenges and provide methods for conducting meetings efficiently and with the participants' needs in mind.
[0822] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0823] In this invention, the server includes a generation means for generating virtual characters to automate the progress of a meeting, an analysis means for acquiring facial expression data of participants during the meeting and analyzing their emotions, and an adjustment means for adjusting the content of the virtual characters' dialogue based on the analyzed emotion data. This enables efficient meeting progress while taking into consideration all participants.
[0824] A "virtual person for automating meeting management" is a digital agent that autonomously handles tasks such as presenting agenda items and creating meeting minutes during a meeting.
[0825] "Acquisition method" refers to a system component equipped with the function of collecting agenda items and opinions from meeting participants.
[0826] A "priority-dependent agenda list" is a list of collected agenda items sorted according to their priority.
[0827] "An adjustment mechanism for adjusting the content of the dialogue" refers to a function that appropriately modifies the statements of virtual characters according to the emotions of the participants and the flow of the conversation.
[0828] The "analysis method" is a mechanism that decodes and analyzes participants' emotional states from their facial expressions and voice data.
[0829] A "record" is a document that describes what was said, what was discussed, what emotions were felt, what actions were taken, and so on, during a meeting.
[0830] "Generation methods" refer to a group of functions that generate virtual characters or construct deliverables based on data analyzed during meetings.
[0831] The system of this invention consists of a server, terminals, and users, each working together to improve meeting efficiency and enable dialogue that takes into account the feelings of the participants.
[0832] The server is the central hub for meeting management. First, once the meeting date is set, the server sends an online form to participants. This form is used to collect agenda items and questions from participants. The collected information is stored in the server's database, and an agenda list is generated based on importance. This list, along with the meeting details, is sent to participants and automatically added to their calendars.
[0833] The terminal activates a virtual character at the start of the meeting. This virtual character begins conducting the meeting after confirming that participants have entered. During the meeting, the terminal collects participants' statements and facial expression data and sends it to the emotion engine. Based on the emotion data analyzed in real time, the virtual character provides appropriate responses according to the participants' emotions, thereby ensuring the smooth running of the meeting. For example, if the virtual character detects that a participant is feeling anxious, it will use reassuring language.
[0834] Users can ask questions and express opinions to virtual characters via their devices. The content of the virtual characters' dialogue is adjusted based on the analysis results from the emotion engine, enabling discussions that are sensitive to changes in participants' opinions and emotions.
[0835] This system uses the Python DeepFace library and OpenCV for emotion analysis. This allows for efficient determination of emotions from image data acquired via camera. Specifically, it can analyze emotions in real time in response to comments made during a meeting, enabling immediate detection of how participants will react when a new topic is introduced.
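A minimal sketch of per-frame emotion estimation with DeepFace follows. It assumes the `deepface` package is installed and that frames arrive as NumPy arrays, for example from OpenCV's `cv2.VideoCapture.read`. The wrapper function and its injectable `analyze` parameter are hypothetical conveniences, added so the logic can be exercised without a camera or the library; they are not part of DeepFace or of the disclosure.

```python
def dominant_emotion(frame, analyze=None):
    """Return the dominant emotion label for one video frame, or None.

    `frame` is an image as a NumPy array (for example a frame returned by
    OpenCV's `cv2.VideoCapture.read`). `analyze` defaults to
    `DeepFace.analyze` and can be replaced by a stub for testing.
    """
    if analyze is None:
        # Imported lazily so this module loads even without deepface installed.
        from deepface import DeepFace

        analyze = DeepFace.analyze
    # enforce_detection=False avoids an exception when no face is visible.
    result = analyze(img_path=frame, actions=["emotion"], enforce_detection=False)
    # Recent DeepFace versions return a list with one dict per detected face;
    # older versions returned a single dict. Both shapes are handled here.
    faces = result if isinstance(result, list) else [result]
    if not faces:
        return None
    return faces[0]["dominant_emotion"]
```

Only the first detected face is used here. A meeting with several people in one camera view would need to iterate over all entries and associate each face with a participant, which this sketch does not attempt.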
[0836] By using a generative AI model and considering examples of prompts such as, "What elements are necessary to design a virtual system that can grasp the emotions of meeting participants in real time and engage in corresponding dialogue?", it is possible to achieve more accurate emotion analysis and dialogue functionality.
[0837] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0838] Step 1:
[0839] Once the meeting date is set, the server sends an online form to all participants. This form is used to collect agenda items and questions from participants, using participant information and the meeting date and time as input. The server saves this information to a database, preparing the basic data for the agenda list. The output is the saved data.
[0840] Step 2:
[0841] The server creates a list of agenda items based on their importance, using an algorithm to evaluate and classify them. The input is agenda data stored in a database, and the output is a list of agenda items sorted by importance. This list is distributed to participants via email and automatically added to their calendars.
[0842] Step 3:
[0843] The terminal activates the virtual character at the start of the meeting. Once it confirms that participants have entered the meeting, the virtual character begins conducting the meeting based on the agenda list. Here, participant entry data is used as input, and the virtual character is automatically initialized. The output is the start of the virtual character's activities.
[0844] Step 4:
[0845] The device collects participants' speech and facial expression data during meetings using its camera and microphone, and transmits this data to the emotion engine in real time. The input is camera video and audio data, and the output is emotion data. Facial expression analysis using the DeepFace library is performed as data processing.
[0846] Step 5:
[0847] The server uses the analyzed emotion data to adjust the dialogue of the virtual character. The input is the emotion data from step 4, and an algorithm is applied that appropriately modifies the content of the virtual character's statements. The output is the dialogue of the virtual character, tailored to the participant's emotions.
[0848] Step 6:
[0849] After the meeting ends, the server analyzes the audio and text data and generates meeting minutes based on information from the emotion engine. At this stage, all data from the meeting is used as input, and natural language processing techniques are employed for analysis. The generated meeting minutes are then distributed to the participants as output.
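Between Steps 4 and 5 above, per-frame emotion labels can fluctuate quickly, so a practical system may smooth them before adjusting the dialogue. The disclosure does not describe such smoothing; the sketch below is an assumption that uses a majority vote over a sliding window, and the class name and window size are hypothetical.

```python
from collections import Counter, deque


class EmotionSmoother:
    """Majority vote over the most recent emotion labels for one participant."""

    def __init__(self, window=5):
        # deque with maxlen drops the oldest label automatically.
        self.recent = deque(maxlen=window)

    def update(self, label):
        """Add the newest label and return the current majority label."""
        self.recent.append(label)
        return Counter(self.recent).most_common(1)[0][0]


# Hypothetical stream of per-frame labels from the emotion engine.
smoother = EmotionSmoother(window=3)
labels = ["neutral", "neutral", "anxious", "anxious", "anxious"]
smoothed = [smoother.update(label) for label in labels]
```

With a window of three, a single "anxious" frame does not change the avatar's behavior, but two consecutive ones do, which trades a short delay for fewer abrupt changes in tone.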
[0850] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing unit 12. In the data processing unit 12, the specific processing unit 290 acquires the audio data.
[0851] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0852] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.
[0853] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.
[0854] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
[0855] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.
[0856] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the Emotion Map 400 an emotion lies, the more visible (expressed in actions) it becomes.
[0857] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
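The relationship between balances and pleasure or discomfort can be illustrated numerically. The disclosure gives no formula, so the following is only a sketch under an assumed linear scoring rule: each balance contributes +1 at its ideal value, 0 at a hypothetical tolerance distance, and is clipped at -1 beyond twice that distance. All names and sample values are invented for the example.

```python
def comfort_score(readings, ideals, tolerances):
    """Return a score in [-1, 1]: positive near the ideal, negative far from it.

    All three arguments are dicts keyed by a balance name (for example
    "battery" or "tilt"). Each deviation is divided by its tolerance, the
    assumed distance from the ideal at which the balance feels neutral.
    """
    scores = []
    for name, value in readings.items():
        deviation = abs(value - ideals[name]) / tolerances[name]
        # deviation 0 -> +1 (pleasure), 1 -> 0 (neutral), >= 2 -> -1 (discomfort)
        scores.append(max(-1.0, 1.0 - deviation))
    return sum(scores) / len(scores)


# Hypothetical robot balances: battery as a 0-1 fraction, tilt in degrees.
ideals = {"battery": 1.0, "tilt": 0.0}
tolerances = {"battery": 0.5, "tilt": 10.0}
near_ideal = comfort_score({"battery": 0.9, "tilt": 0.0}, ideals, tolerances)
far_from_ideal = comfort_score({"battery": 0.1, "tilt": 20.0}, ideals, tolerances)
```

Here `near_ideal` is positive and `far_from_ideal` is negative, matching the qualitative statement that approaching the ideal yields pleasure and deviating from it yields discomfort.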
[0858] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."
[0859] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
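The final determination step can be sketched as follows, assuming the neural network has already produced one value per emotion. The disclosure does not state how the emotion is chosen from these values; selecting the highest value is an assumption, and the sample numbers below are invented to mirror the Figure 10 example in which neighboring emotions have similar values.

```python
def determine_emotion(emotion_values, top_k=3):
    """Pick the emotion with the highest value and list the top candidates.

    `emotion_values` maps an emotion name to the value output for it.
    Returns (dominant_emotion, [(emotion, value), ...]) with at most
    `top_k` candidates, highest value first.
    """
    ranked = sorted(emotion_values.items(), key=lambda item: item[1], reverse=True)
    return ranked[0][0], ranked[:top_k]


# Hypothetical emotion values; not taken from the disclosure.
values = {"reassured": 0.82, "calm": 0.79, "confident": 0.77, "anxious": 0.05}
dominant, candidates = determine_emotion(values)
```

Returning the runners-up alongside the dominant emotion lets downstream processing see when several nearby emotions are almost equally likely.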
[0860] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.
[0861] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.
[0862] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.
[0863] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.
[0864] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.
[0865] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.
[0866] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.
[0867] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.
[0868] More specifically, electrical circuits that combine circuit elements such as semiconductor devices can be used as the hardware structure of these various processors. The specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps may be deleted, new steps may be added, or the processing order may be rearranged, as long as this does not depart from the gist of the disclosure.
[0869] The descriptions and illustrations presented above are detailed explanations of the parts related to the technology of this disclosure and are merely examples of that technology. For example, the above descriptions of the structure, function, operation, and effect are descriptions of examples of the structure, function, operation, and effect of the parts related to the technology of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements may be added, or elements may be replaced in the descriptions and illustrations presented above, as long as this does not depart from the gist of the technology of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the parts related to the technology of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable implementation of the technology have been omitted from the descriptions and illustrations presented above.
[0870] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.
[0871] The following is further disclosed regarding the embodiments described above.
[0872] (Claim 1)
[0873] A system comprising: a generation means for generating a virtual avatar that automates the progress of a meeting,
[0874] a collection means by which the virtual avatar collects agenda items from meeting participants,
[0875] a creation means for creating an agenda list based on the collected agenda items,
[0876] a facilitation means by which the virtual avatar facilitates the meeting and provides information through dialogue with participants,
[0877] a generation means for generating meeting minutes by analyzing audio or text data during the meeting, and
[0878] a distribution means for delivering the generated meeting minutes to participants.
[0879] (Claim 2)
[0880] The system according to claim 1, which sends an online form to participants before a meeting to collect agenda items.
[0881] (Claim 3)
[0882] The system according to claim 1, in which a virtual avatar answers questions from participants during a meeting and retrieves and provides relevant information.
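The flow recited in the claims above (collect agenda items, build an agenda list, generate minutes, distribute them) can be sketched as follows. This is only an illustrative outline under stated assumptions: the Meeting class and all function names are hypothetical, form responses are modeled as a plain dictionary, and the minutes are produced by simple text assembly rather than by the analysis of audio or text data that the disclosure contemplates.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Meeting:
    """Hypothetical container for one meeting's state."""

    participants: list[str]
    agenda: list[str] = field(default_factory=list)
    transcript: list[tuple[str, str]] = field(default_factory=list)  # (speaker, text)


def collect_agenda_items(form_responses: dict[str, list[str]]) -> list[str]:
    """Merge per-participant responses, dropping duplicates but keeping order."""
    seen: set[str] = set()
    items: list[str] = []
    for topics in form_responses.values():
        for topic in topics:
            key = topic.strip().lower()
            if key and key not in seen:
                seen.add(key)
                items.append(topic.strip())
    return items


def generate_minutes(meeting: Meeting) -> str:
    """Assemble plain-text minutes (a placeholder for model-based analysis)."""
    lines = ["Agenda:"]
    lines += [f"{i}. {item}" for i, item in enumerate(meeting.agenda, start=1)]
    lines.append("Discussion:")
    lines += [f"- {speaker}: {text}" for speaker, text in meeting.transcript]
    return "\n".join(lines)


def distribute(minutes: str, participants: list[str]) -> dict[str, str]:
    """Return an outbox mapping each participant to the minutes text."""
    return {name: minutes for name in participants}


responses = {"alice": ["Budget", "Hiring"], "bob": ["budget ", "Roadmap"]}
meeting = Meeting(participants=list(responses))
meeting.agenda = collect_agenda_items(responses)
meeting.transcript.append(("alice", "Budget is approved."))
outbox = distribute(generate_minutes(meeting), meeting.participants)
```

In this sketch the duplicate topic submitted by the second participant is merged with the first, so each agenda item appears once in the list that the avatar would work through.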
[0883] "Example 1"
[0884] (Claim 1)
[0885] A means of generating a virtual representation that automates the progress of a meeting,
[0886] The aforementioned virtual representation provides means for collecting topics from meeting participants,
[0887] A method for creating a list of topics based on collected topics,
[0888] The aforementioned virtual representation facilitates the meeting and provides information through dialogue with participants,
[0889] A means for analyzing audio or text information during a meeting to generate a record,
[0890] A means of distributing the generated records to participants,
[0891] A way to add meetings to participants' schedules,
[0892] A means of searching for past information using a virtual representation and providing similar information,
[0893] A system that includes this.
[0894] (Claim 2)
[0895] The system according to claim 1, which sends online inquiries to participants before a meeting to collect topics.
[0896] (Claim 3)
[0897] The system according to claim 1, in which a virtual representation responds to participants' questions during a meeting and searches for and provides relevant information.
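The means of searching past information and providing similar information could, for instance, be approximated as below. The disclosure does not specify a retrieval technique. This sketch swaps in a simple token-overlap (Jaccard) similarity, and the function names, record identifiers, and sample records are all hypothetical.

```python
from __future__ import annotations

import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def search_past_records(query: str, records: dict[str, str], top_k: int = 2) -> list[str]:
    """Return ids of the top_k past records most similar to the query."""
    q = tokenize(query)
    scored = [(jaccard(q, tokenize(text)), rid) for rid, text in records.items()]
    # Discard records with no overlap, then sort by score (ties by id).
    scored = [(score, rid) for score, rid in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [rid for _, rid in scored[:top_k]]


past = {
    "2024-01": "quarterly budget review and hiring plan",
    "2024-02": "office relocation logistics",
    "2024-03": "budget cuts for marketing",
}
hits = search_past_records("budget review", past)
```

A production system would more plausibly use embeddings or a full-text index. Token overlap is used here only because it needs no external dependencies.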
[0898] "Application Example 1"
[0899] (Claim 1)
[0900] A generation means for generating dynamic representations that automate the progress of a meeting,
[0901] The aforementioned dynamic representation is a means for collecting agenda items from meeting participants,
[0902] A method for creating a content list based on the collected agenda items,
[0903] The aforementioned dynamic representation serves as a means of providing information through dialogue with participants and facilitating the meeting,
[0904] A generation means for analyzing audio data or document data during a meeting to generate a recorded text,
[0905] A distribution method for delivering the generated record text to participants,
[0906] A search method that retrieves past information and presents relevant materials during a meeting,
[0907] A system that includes support tools to assist with suggestions during the content creation process.
[0908] (Claim 2)
[0909] The system according to claim 1, which sends electronic input to participants before a meeting and collects the agenda.
[0910] (Claim 3)
[0911] The system according to claim 1, in which a dynamic representation responds to participants' questions during a meeting, and searches for and provides relevant information.
[0912] "Example 2 of combining an emotion engine"
[0913] (Claim 1)
[0914] A data processing means that automates the progress of meetings,
[0915] The aforementioned data processing means includes an information gathering means for collecting agenda items from meeting participants,
[0916] A list generation method that creates a list of agenda items based on priority, based on the collected agenda items,
[0917] The aforementioned data processing means facilitates the meeting and provides information through dialogue with participants,
[0918] A record generation means that generates meeting minutes by analyzing audio or text information during a meeting,
[0919] A reporting distribution method that distributes the generated meeting minutes to participants,
[0920] An emotion analysis tool that analyzes participants' facial expressions and voices during a meeting to identify their emotions and provides real-time feedback on the results,
[0921] A system including a dialogue adjustment mechanism that adjusts the content of a virtual avatar's dialogue according to the emotional state of the participants.
[0922] (Claim 2)
[0923] The system according to claim 1, wherein an online form is sent to participants using a communication means before the meeting to collect agenda items.
[0924] (Claim 3)
[0925] The system according to claim 1, wherein a virtual avatar responds to participants' questions during a meeting, searches for and provides relevant information, and adjusts its responses based on the participants' emotional state.
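Two elements of the claims above, the priority-based agenda list and the adjustment of the avatar's dialogue according to participants' emotional state, can be illustrated with the following sketch. It assumes that emotion labels have already been produced by some upstream analysis of facial expressions and voices, which is not implemented here. The AgendaItem class, the priority scale, the label names, and the TONE_BY_EMOTION mapping are all hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AgendaItem:
    title: str
    priority: int  # hypothetical scale: a larger value means more urgent


def build_agenda(items: list[AgendaItem]) -> list[AgendaItem]:
    """Order items by descending priority."""
    # sorted() is stable, so items with equal priority keep submission order.
    return sorted(items, key=lambda item: -item.priority)


# Hypothetical mapping from a detected emotion label to an opening phrase.
TONE_BY_EMOTION = {
    "confused": "Let me restate that more simply: ",
    "frustrated": "I understand the concern. ",
    "neutral": "",
}


def adjust_dialogue(message: str, emotions: list[str]) -> str:
    """Prefix the avatar's message according to the most common emotion label."""
    if not emotions:
        return message
    dominant, _count = Counter(emotions).most_common(1)[0]
    # Unknown labels fall back to no prefix.
    return TONE_BY_EMOTION.get(dominant, "") + message


agenda = build_agenda(
    [AgendaItem("Roadmap", 1), AgendaItem("Budget", 3), AgendaItem("Hiring", 3)]
)
reply = adjust_dialogue(
    "The budget was approved last week.", ["confused", "neutral", "confused"]
)
```

Taking the most common label across participants is one simple aggregation choice. A per-participant adjustment would be an equally plausible reading of the claims.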
[0926] "Application example 2 when combining with an emotional engine"
[0927] (Claim 1)
[0928] A generation method for generating virtual characters to automate the progress of meetings,
[0929] The aforementioned virtual person has means for collecting agenda items from meeting participants,
[0930] A method for creating an agenda list based on importance, using the collected agenda items as a basis,
[0931] The aforementioned virtual person facilitates the meeting and provides information through dialogue with participants,
[0932] An analytical method that acquires facial expression data from participants during a meeting and analyzes their emotions,
[0933] An adjustment means for adjusting the content of a virtual character's dialogue based on analyzed emotional data,
[0934] A generation means that analyzes audio or text data during a meeting to generate a record,
[0935] A system that includes a means of distributing the generated records to participants.
[0936] (Claim 2)
[0937] The system according to claim 1, which sends an online form to participants before a meeting to collect agenda items.
[0938] (Claim 3)
[0939] The system according to claim 1, in which a virtual person responds to participants' statements during a meeting and searches for and provides relevant information.
[Explanation of Symbols]
[0940]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot
Claims
1. A generation means for generating dynamic representations that automate the progress of a meeting,
The aforementioned dynamic representation is a means for collecting agenda items from meeting participants,
A method for creating a content list based on the collected agenda items,
The aforementioned dynamic representation serves as a means of providing information through dialogue with participants and facilitating the meeting,
A generation means for analyzing audio data or document data during a meeting to generate a recorded text,
A distribution method for delivering the generated record text to participants,
A search method that retrieves past information and presents relevant materials during a meeting,
A system that includes support tools to assist with suggestions during the content creation process.
2. The system according to claim 1, which sends electronic input to participants before a meeting and collects agenda items.
3. The system according to claim 1, in which a dynamic representation responds to participants' questions during a meeting, and searches for and provides relevant information.