system

The system automates meeting management by analyzing agenda importance, using virtual avatars for dialogue, and generating meeting minutes, addressing inefficiencies in meeting operations and improving participant engagement.

JP2026096477A · Pending · Publication Date: 2026-06-15 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: SOFTBANK GROUP CORP
Filing Date: 2024-12-03
Publication Date: 2026-06-15

Application Information

Patent Timeline

03 Dec 2024: Application

15 Jun 2026: Publication (JP2026096477A)

IPC: G06Q10/02; G06Q10/00

AI Tagging

Application Domain

Reservations

AI Technical Summary

Technical Problem

In modern business environments, running meetings takes significant time and labor, especially for busy management staff. Collecting meeting topics, conducting the meeting, and creating the minutes are hard to do efficiently, and smooth discussion depends on information being provided quickly and accurately.

Method used

A system that analyzes agenda items for importance, uses a virtual avatar for natural dialogue, performs real-time speech recognition, converts speech to text, extracts key points, and automatically generates meeting minutes, thereby automating the meeting management process.

Benefits of technology

This system significantly reduces the time and effort required for meeting management, allows participants to focus on core discussions, and supports efficient, organized meetings with accurate information sharing.

✦ Generated by Eureka AI based on patent content.

Abstract

[Problem] To provide a system. [Solution] A system including: means for analyzing the agenda items received from participants and automatically ordering them based on their importance; means for conducting the meeting using a virtual avatar and engaging in natural conversation with participants; means for performing real-time speech recognition on speech during the meeting, converting it into text data, and saving it; and means for extracting key points from the text data after the meeting, automatically generating meeting minutes, and providing them to participants.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor and includes steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a character of the chatbot, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] In the modern business environment, meeting operation requires a great deal of time and labor. In particular, tasks such as collecting meeting topics, conducting the meeting, and creating meeting minutes are difficult to perform efficiently and are a heavy burden, especially for busy management staff and businesspeople. Furthermore, for participants to exchange opinions smoothly during a meeting, information must be provided quickly and accurately, which can be difficult to achieve. Against this background, there is an urgent need for a system that improves the efficiency of meeting operation, reduces the burden on participants, and promotes meaningful discussion.

Means for Solving the Problems

[0005] This invention provides a means for analyzing agenda items received from participants and automatically creating an order based on their importance. Furthermore, it enables the use of a virtual avatar equipped with generative AI to facilitate meetings and engage in natural dialogue with participants. During the meeting, it includes means for real-time speech recognition of speech, conversion into text data, and storage. After the meeting, it extracts key points from the text data, automatically generates meeting minutes, and provides them to participants. By automating this entire process, it improves the efficiency of meeting management and reduces the burden on participants.

[0006] "Participants" refer to people who attend a meeting and provide opinions and information on the agenda.

[0007] An "agenda" refers to a specific item or topic that should be taken up and discussed in a meeting.

[0008] "Importance" refers to the relative priority level that indicates how much an agenda item has an impact on the organization's objectives and requirements.

[0009] "Order" refers to the sequence in which agenda items are presented, reflecting their priority in the progress of a meeting.

[0010] "Generative AI" refers to artificial intelligence that has the ability to understand and generate human language, and includes technologies for dialogue and text generation.

[0011] A "virtual avatar" refers to a digital character created by a computer that functions as an interface for interacting with the user.

[0012] "Speech recognition" refers to the technology that uses computers to analyze human speech and convert it into text data.

[0013] "Text data" refers to data that is converted into written information using speech recognition technology during a meeting.

[0014] "Meeting minutes" refers to a document that records the content discussed and decisions made at a meeting, and compiles them in a format that can be referenced later.

Brief Description of the Drawings

[0015]
[Figure 1] This is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] This is a conceptual diagram showing an example of the main functions of the data processing device and smart device according to the first embodiment.
[Figure 3] This is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] This is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] This is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] This is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] This is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] This is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] This shows an emotion map where multiple emotions are mapped.
[Figure 10] This shows an emotion map where multiple emotions are mapped.
[Figure 11] This is a sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] This is a sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] This is a sequence diagram showing the processing flow of the data processing system in Example 2, which incorporates an emotion engine.
[Figure 14] This is a sequence diagram showing the processing flow of the data processing system in Application Example 2, which incorporates an emotion engine.

Embodiments for Carrying Out the Invention

[0016] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0017] First, the terms used in the following description will be explained.

[0018] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.

[0019] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and is used as a work memory by the processor.

[0020] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0021] In the following embodiments, a communication interface (I/F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0022] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when three or more items are linked by "and/or."

[0023] [First Embodiment]

[0024] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0025] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0026] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0027] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0028] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0029] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0030] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0031] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0032] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0033] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0034] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0035] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0036] To implement the present invention, a meeting management system is developed, and its main functions are implemented according to the following procedure.

[0037] First, the server collects agenda items from each user participating in the meeting. This process can be done via web forms or email. The server stores the received agenda items in a database and analyzes the content of each agenda item using natural language processing technology. Based on the analyzed data, it evaluates importance and lists the agenda items according to priority. For example, if the agenda items are "Sales Report" and "New Product Development," they will be evaluated for business priority and sorted accordingly.

[0038] Next, the terminal receives the organized agenda list sent from the server and prepares to conduct the meeting through a virtual avatar. In the meeting, the virtual avatar greets the participants in natural language and presents the first agenda item.

[0039] During a meeting, when a user speaks, the terminal converts their speech into text data in real time using speech recognition technology and sends it to the server. The server analyzes this text data, searches for relevant information as needed, and provides materials to aid the user's understanding.

[0040] Once the meeting concludes, the server automatically generates meeting minutes by extracting key discussion points from the saved text data. These minutes are formatted in a format such as PDF and distributed via email to all meeting participants. This facilitates smooth follow-up after the meeting.

[0041] The introduction of this system significantly reduces the time and effort required to manage meetings, allowing participants to focus on more core discussions. For example, using this system in weekly project meetings allows for discussions to proceed in the optimal order based on the importance of the agenda, enabling quick identification of key decisions.

[0042] The following describes the processing flow.

[0043] Step 1:

[0044] The user accesses a web form and enters the meeting agenda. Once the agenda is submitted, the data is sent to the server.

[0045] Step 2:

[0046] The server receives agenda items submitted by users and stores them in a database. Then, it analyzes the content of the agenda items using natural language processing techniques and extracts keywords.

[0047] Step 3:

[0048] The server evaluates the importance of the extracted keywords and creates a list of multiple agenda items sorted according to priority. This list will be used to guide the next meeting.

[0049] Step 4:

[0050] The terminal receives the prioritized agenda list sent from the server and prepares to present it through the virtual avatar's interface.

[0051] Step 5:

[0052] The avatar on the terminal greets participants at the start of the meeting and presents the first agenda item from the organized list. Participants speak in turn, and the avatar then moves on to the next agenda item.

[0053] Step 6:

[0054] Each time the user speaks, the terminal uses speech recognition technology to convert the speech into text data in real time and sends it to the server.

[0055] Step 7:

[0056] The server analyzes the text data and searches for information related to the content of the statements. This relevant information is returned to the terminal in real time and presented to participants through the avatar.

[0057] Step 8:

[0058] Once the meeting ends, the server uses all the text data saved during the meeting to extract the key points of the discussion and automatically creates meeting minutes.

[0059] Step 9:

[0060] The server formats the generated meeting minutes as an electronic document and sends it via email to all users who participated in the meeting. This allows participants to review the content discussed at the meeting later.
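
Steps 2 and 3 above (keyword extraction followed by importance-based ordering) could be sketched roughly as follows. The source does not say how importance is computed, so the keyword weights, the `KEYWORD_WEIGHTS` table, and the function names here are all hypothetical; this is a minimal illustration using only the Python standard library, not the method of the disclosure.

```python
import re
from collections import Counter

# Hypothetical keyword weights (an assumption for illustration only).
KEYWORD_WEIGHTS = {
    "sales": 3.0,
    "budget": 3.0,
    "product": 2.0,
    "development": 1.5,
    "report": 1.0,
}


def extract_keywords(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def importance(text: str) -> float:
    """Sum the weights of known keywords; unknown words contribute nothing."""
    counts = Counter(extract_keywords(text))
    return sum(KEYWORD_WEIGHTS.get(word, 0.0) * n for word, n in counts.items())


def rank_agenda(items: list[str]) -> list[tuple[str, float]]:
    """Return (item, score) pairs sorted from most to least important."""
    scored = [(item, importance(item)) for item in items]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = rank_agenda(["New Product Development", "Sales Report", "Office Party"])
for item, score in ranked:
    print(f"{score:4.1f}  {item}")
```

In practice the scoring function would likely be replaced by a trained model or a generative AI call; the sort step stays the same.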

[0061] (Example 1)

[0062] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0063] In modern meeting management, setting agendas, managing the progress, recording conversations, and organizing information are cumbersome, hindering participants from focusing on important topics. In particular, prioritizing agenda items and quickly recording and analyzing discussions during meetings requires significant effort, highlighting the need for efficient meeting management.

[0064] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0065] In this invention, the server includes means for analyzing agenda items received from participants and automatically creating an order of agenda items based on their importance; means for conducting the meeting using virtual characters and engaging in natural dialogue with participants; and means for real-time speech recognition of speeches made during the meeting, converting them into text information, and saving it. This streamlines the operation of the meeting and allows participants to focus more on core discussions.

[0066] "Participants" refer to individuals or groups who propose agenda items and participate in discussions at a meeting.

[0067] An "agenda" refers to the topics or items that should be discussed during a meeting, and prioritizing them is crucial.

[0068] "Importance" refers to a criterion that indicates the relative value or priority of an agenda item or piece of information.

[0069] A "virtual character" refers to a computer-generated visual or auditory entity used to conduct meetings or interact with participants.

[0070] "Natural dialogue" refers to natural conversations between humans that are mimicked using technology.

[0071] "Real-time" refers to events or processes being processed or analyzed simultaneously with their execution.

[0072] "Speech recognition" refers to the technology or process of converting speech into text or data formats.

[0073] "Text information" refers to information expressed in character or data format.

[0074] A "report" refers to a document that summarizes and documents the discussions and results of a meeting.

[0075] "Communication equipment" refers to a machine or system used to send and receive information using digital or analog signals.

[0076] "Related information" refers to additional data or knowledge necessary to complement or support the discussions during the meeting.

[0077] To implement this invention, it is necessary to develop an information processing system to streamline meeting management. The system mainly includes a server, terminals, and a user interface. Users participating in the meeting submit agenda items via web forms or email. The server then automatically receives the agenda items and stores them in a database.

[0078] The server analyzes the agenda items using natural language processing techniques (for example, libraries such as NLTK and spaCy) and evaluates their importance. Once the evaluation is complete, the server lists the agenda items according to priority and sends this information to the terminal.

[0079] The terminal displays the agenda list received from the server through a virtual character, supporting the progress of the meeting. The virtual character uses technologies such as 3D animation and speech synthesis to welcome participants and enable natural dialogue.

[0080] When a user speaks during a meeting, the terminal uses speech recognition technology (such as Google® Cloud Speech-to-Text) to convert the speech into text in real time and sends that data to the server. The server analyzes the received text data and provides relevant information in real time.
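
As a rough illustration of how a server might look up relevant information for a transcribed utterance, the sketch below ranks a small in-memory document set by word overlap (Jaccard similarity) with the utterance. The document store `DOCS`, the stopword list, and the function names are assumptions made for this example; the source does not describe the search method.

```python
import re

# Small illustrative stopword list (an assumption, not from the source).
STOPWORDS = {"the", "is", "for", "a", "of", "what", "this", "by"}


def tokens(text: str) -> set[str]:
    """Return the set of lowercase alphabetic words, minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def find_related(utterance: str, documents: dict[str, str], top_k: int = 2) -> list[str]:
    """Rank document titles by Jaccard overlap with the utterance.

    Documents with no words in common are dropped.
    """
    query = tokens(utterance)
    scored = []
    for title, body in documents.items():
        doc = tokens(title + " " + body)
        union = query | doc
        score = len(query & doc) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, title))
    # Highest score first; ties are broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:top_k]]


# Hypothetical document store.
DOCS = {
    "Q3 sales statistics": "Quarterly sales figures by region",
    "New product strategy 2023": "Historical strategy notes for new product launches",
    "Cafeteria menu": "Lunch options for this week",
}

related = find_related("What is the next step for the new product strategy?", DOCS)
print(related)
```

A production system would more plausibly use a search index or embedding-based retrieval; the overlap score is used here only because it is easy to follow.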

[0081] After the meeting concludes, the server extracts key points from the saved text data and automatically generates a report. The report is formatted in a format such as PDF and distributed electronically to all participants via communication devices.

[0082] This system significantly reduces the time and effort required to run meetings, allowing participants to focus on more substantive discussions. For example, prompts for the generative AI model could include phrases like "Summarize the main points of the meeting" or "What should be the first topic addressed in the next meeting?"
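
A prompt such as "Summarize the main points of the meeting" would normally be combined with the agenda and the saved transcript before being sent to a generative AI model. The following is a minimal sketch of that assembly step only; the function name, the prompt layout, and the `max_chars` truncation are assumptions, and no model is actually called.

```python
def build_summary_prompt(
    agenda: list[str],
    transcript: list[tuple[str, str]],
    max_chars: int = 4000,
) -> str:
    """Assemble a summarization prompt from the agenda and (speaker, text) pairs."""
    agenda_block = "\n".join(f"{i}. {item}" for i, item in enumerate(agenda, start=1))
    transcript_block = "\n".join(f"{speaker}: {text}" for speaker, text in transcript)
    # If the transcript is too long, keep only the most recent part (hypothetical policy).
    if len(transcript_block) > max_chars:
        transcript_block = transcript_block[-max_chars:]
    return (
        "Summarize the main points of the meeting.\n\n"
        f"Agenda:\n{agenda_block}\n\n"
        f"Transcript:\n{transcript_block}\n"
    )


prompt = build_summary_prompt(
    ["Sales Report", "New Product Development"],
    [("Alice", "Sales rose this quarter."), ("Bob", "What is the next step?")],
)
print(prompt)
```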

[0083] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0084] Step 1:

[0085] Users submit meeting agendas via web form or email. The server receives these agendas and stores them in a database. Based on this input data, it formats them as text data as needed. For example, if a user enters "New Product Strategy Meeting" in the web form and clicks the submit button, that data is recorded on the server.

[0086] Step 2:

[0087] The server analyzes the stored agenda items using natural language processing technology. This process extracts keywords from the agenda text received as input data and scores their importance. The output is a set of pairs, each consisting of an agenda item and its importance score. For example, "New Product Strategy Meeting" would be classified as "important."

[0088] Step 3:

[0089] The server sorts the agenda items in order of importance based on the analysis results. The agenda items and their scores obtained in step 2 are used as input data. The sorted list is sent to the terminal as output. In other words, this step generates a list of agenda items in order of importance.

[0090] Step 4:

[0091] The terminal prepares to display the organized agenda list received from the server through a virtual character. It reads the agenda list received as input and outputs it to the virtual character's interface. Specifically, the virtual character appears on the screen and presents the agenda items for the "New Product Strategy Meeting" in order of importance.

[0092] Step 5:

[0093] When a user speaks during a meeting, the terminal uses speech recognition technology to convert the audio into text in real time. It receives audio as input and generates text data as output. For example, if the user says "What is the next step?", the text "What is the next step?" will be displayed.

[0094] Step 6:

[0095] The server receives text data sent from the terminal and searches for relevant information as needed. It analyzes the text data received as input and provides the user with relevant documents and database information as output. Specifically, it searches for historical data and statistical information related to the "new product strategy" and presents it to the user.

[0096] Step 7:

[0097] After the meeting ends, the server automatically generates a report by extracting key points from the saved text data. It uses the saved conversation content as input data and provides summarized information as output. Specifically, the report is created in a format such as, "The main conclusions of the new product strategy meeting were..."

[0098] Step 8:

[0099] The server organizes the generated report in a format such as PDF and distributes it electronically to all participants via communication devices. The generated report is the input, and the output is its delivery by email or a similar channel. Specifically, a notification is sent stating, "The report from the new product strategy meeting has been sent via email."
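
The key-point extraction in Step 7 is not specified in detail. One simple stand-in is frequency-based extractive summarization: score each sentence by the average frequency of its words across the transcript and keep the top few. The sketch below assumes that approach; the stopword list and function names are hypothetical, and a generative AI model could be substituted for the scoring.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the source).
STOPWORDS = {"the", "a", "an", "in", "we", "should", "was", "is", "to", "of", "and", "needs"}


def content_words(text: str) -> list[str]:
    """Lowercase alphabetic words with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def split_sentences(text: str) -> list[str]:
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_key_points(transcript: str, max_points: int = 2) -> list[str]:
    """Pick the sentences whose words are most frequent across the transcript."""
    sentences = split_sentences(transcript)
    freq = Counter(content_words(transcript))

    def score(sentence: str) -> float:
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(top[:max_points])]


transcript = (
    "We should launch the new product in spring. "
    "Lunch was good. "
    "The new product launch needs a bigger budget."
)
key_points = extract_key_points(transcript)
print("Main conclusions:")
for point in key_points:
    print("-", point)
```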

[0100] (Application Example 1)

[0101] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0102] In community meetings, it is difficult for participants to efficiently discuss issues and make decisions smoothly. In particular, it is difficult to organize diverse opinions from participants in real time and to appropriately share important information. This can lead to prolonged meeting times and the oversight of important discussion points.

[0103] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0104] This invention includes a server that analyzes issues received from participants and automatically creates an order of issues based on their importance; a server that facilitates discussions using virtual characters and engages in natural conversations with participants; a server that performs real-time speech recognition of utterances during discussions, converts them into text data, and stores them; a server that allows local residents to share issues using smartphones, prioritize important decision-making matters, and support community meetings; and a server that extracts key points from the text data after the discussion and automatically generates minutes to provide to participants. This enables efficient organization of important issues in community meetings and facilitates quick and accurate decision-making.

[0105] "Participants" refer to the individual members who take part in a meeting or discussion.

[0106] An "issue" is a topic or theme discussed in a meeting or debate, or a problem that needs to be solved.

[0107] "Means for automatically creating sequences" refers to functions or processes that rearrange tasks in an appropriate order based on their importance.

[0108] A "virtual character" is a character created on a computer to assist in the progress of meetings and discussions.

[0109] "Means of facilitating discussions and conducting natural conversations" refers to techniques and methods that use virtual characters to facilitate the smooth progress of meetings and to engage in dialogue with participants.

[0110] "Speech recognition" is a technology that analyzes speech in real time and converts the speech information into text data.

[0111] "Methods for converting and saving as text data" refers to the process of converting information acquired through speech recognition into text format and saving it.

[0112] A "smartphone" is a multi-functional mobile phone, a small device capable of running applications.

[0113] "Important decision-making matters" are the points or conclusions that should be prioritized for discussion in a meeting.

[0114] "Means of supporting residents' meetings" refer to mechanisms and technologies that facilitate meetings and support participants' decision-making.

[0115] "Methods for automatically generating and providing meeting minutes to participants" refers to an automated process for summarizing the key points discussed after a meeting and presenting them to participants in an easily understandable format.

[0116] The server functions as the core of the local residents' meeting system, handling the backend for receiving and analyzing issues submitted by participants. Specifically, the Node.js-based server collects issue information submitted via web forms and email and stores it in MongoDB. The stored data is then analyzed by Python scripts using natural language processing libraries (such as NLTK and spaCy) to evaluate the importance of the issues and automatically rank them.

[0117] The terminal, specifically a smartphone, is equipped with a virtual character application that assists in the discussion. This application, developed with React Native, enables natural interaction with participants. During the meeting, participants' speech is collected via the smartphone's microphone and converted into text data in real time using the Google Cloud Speech-to-Text API. This text data is stored and analyzed on the server side.

[0118] Users share issues via their smartphones, and their comments during meetings are recorded as data. After the meeting ends, the server uses natural language processing technology to extract key points of the discussion and generate meeting minutes. These minutes are formatted as PDFs and sent to participants via email, enabling efficient information sharing.
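
The email distribution of the PDF minutes could be sketched with Python's standard `email.message.EmailMessage` class, as below. The addresses, subject, and the placeholder bytes standing in for a real PDF are all hypothetical; the sketch only builds the message and does not send it.

```python
from email.message import EmailMessage


def build_minutes_email(
    sender: str,
    recipients: list[str],
    subject: str,
    body: str,
    pdf_bytes: bytes,
    filename: str = "minutes.pdf",
) -> EmailMessage:
    """Build an email carrying the meeting minutes as a PDF attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    # Attach the PDF; bytes content requires an explicit MIME type.
    msg.add_attachment(pdf_bytes, maintype="application", subtype="pdf", filename=filename)
    return msg


# Placeholder bytes only; a real system would pass the generated PDF file here.
fake_pdf = b"%PDF-1.4\n% placeholder, not a valid document\n"
message = build_minutes_email(
    "server@example.com",
    ["resident-a@example.com", "resident-b@example.com"],
    "Meeting minutes",
    "Please find the minutes attached.",
    fake_pdf,
)
attachments = list(message.iter_attachments())
print(len(attachments), attachments[0].get_filename())
```

Sending the message would typically go through `smtplib.SMTP(...).send_message(message)` against a mail server, which is omitted here because the server details are not given in the source.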

[0119] Specific example

[0120] For example, in a disaster response meeting, if attendees discuss "flood control measures" and "strengthening disaster prevention training," the system will prioritize these issues. Then, a virtual character will present each issue and support the discussion by analyzing participants' comments in real time.

[0121] A concrete example of a prompt for the generative AI model would be, "Please summarize the results of the discussion on which issues should be prioritized in the next regional disaster prevention plan." This can be used to summarize the key points of a residents' meeting.

[0122] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0123] Step 1:

[0124] The server collects the issues submitted by participants via web forms or email. The input is the raw issue data submitted by users. The server stores this data in MongoDB. The output is the issue data stored in the database.

[0125] Step 2:

[0126] The server executes a Python script on the stored issue data and analyzes the issues using natural language processing libraries (such as NLTK and spaCy). The input is the text data in the database, which is analyzed to evaluate the importance of each issue. The output is a list of issues ranked by importance.

[0127] Step 3:

[0128] The terminal receives the ranked issue list and uses the virtual character application developed with React Native to prepare for the discussion. The input is the ranked issue list sent from the server, and the output is the order in which the issues are presented to the participants.

[0129] Step 4:

[0130] Users speak during discussions via their smartphones. The device uses the Google Cloud Speech-to-Text API to convert speech into text in real time. Input is the user's speech, and output is text data. The converted text data is immediately sent to the server.

[0131] Step 5:

[0132] The server analyzes the received text data, collects relevant information in real time, and provides it to the terminal as needed. Real-time text data is used as input, and the output is fed back to the user as relevant materials and information.

[0133] Step 6:

[0134] After the discussion based on user comments has concluded, the server re-analyzes the entire meeting's text data and automatically extracts key discussion points. The input is the accumulated text data, and the output is the extracted points.

[0135] Step 7:

[0136] The server automatically generates meeting minutes from the extracted key points and formats them as a PDF, which serves as the final meeting report. The output is the generated PDF file, which is distributed to participants via email.
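Step 5 of the flow above, in which the server returns relevant materials for each statement, might be sketched as a simple keyword lookup. The material titles, their keyword sets, and the function name are hypothetical examples; the disclosure does not describe how relevance is computed.

```python
import re

# Hypothetical reference library: material title -> keywords that make it relevant.
MATERIALS = {
    "River hazard map 2024": {"flood", "river", "levee"},
    "Evacuation drill checklist": {"drill", "training", "evacuation"},
    "Budget summary": {"budget", "cost"},
}

def find_related_materials(utterance, materials=MATERIALS, min_overlap=1):
    """Return titles whose keywords overlap the utterance, strongest match first."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    hits = []
    for title, keywords in materials.items():
        overlap = len(words & keywords)
        if overlap >= min_overlap:
            hits.append((overlap, title))
    # Sort by descending overlap, then alphabetically for a stable order.
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [title for _, title in hits]

related = find_related_materials("How high is the flood risk along the river?")
```

A statement with no matching keywords yields an empty list, in which case the terminal would simply present nothing.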

[0137] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0138] To implement the present invention, it is necessary to incorporate an emotion engine into the meeting management system. This system covers a series of processes: agenda analysis, meeting facilitation using virtual avatars, conversion of meeting statements to text by speech recognition, and post-meeting minutes generation.

[0139] First, the server receives agenda items from participants and uses natural language processing to determine their importance. This process extracts keywords from the agenda items and generates a sorted list based on them. Next, virtual avatars connected to terminals prepare to conduct the meeting using this data.

[0140] During the meeting, the terminal sequentially presents agenda items to participants via virtual avatars. This is where the emotion engine comes into play, analyzing the user's statements and facial expressions in real time to recognize their emotions. For example, if a user appears dissatisfied, the terminal passes this information to the avatar, which immediately adjusts the meeting's progress and the information presented.

[0141] Furthermore, statements made during the meeting are transcribed by speech recognition and stored as text data on the server. After the meeting, the server analyzes the stored text data, extracts particularly important statements and decisions, and automatically creates meeting minutes. These minutes are then provided to participants via email or other communication methods.

[0142] By introducing an emotion engine, the system can respond flexibly to the meeting situation, stimulating discussions and facilitating smooth progress based on participants' emotions. For example, if some users are feeling confused, their avatars can provide additional explanations and follow up to help them understand. This is expected to significantly improve meeting efficiency and participant satisfaction.
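The way an avatar might adjust the meeting based on estimated emotions can be sketched as a small rule table. The emotion labels, action names, confidence threshold, and the `EmotionEstimate` type are all hypothetical; the disclosure leaves the emotion engine's output format and the adjustment policy open.

```python
from dataclasses import dataclass

@dataclass
class EmotionEstimate:
    label: str         # e.g. "confused", "dissatisfied", "neutral" (assumed labels)
    confidence: float  # 0.0 to 1.0

# Hypothetical mapping from a dominant emotion to an avatar action.
ACTIONS = {
    "confused": "add_explanation",
    "dissatisfied": "invite_objections",
    "bored": "move_to_next_item",
}

def choose_avatar_action(estimates, threshold=0.6):
    """Return the action for the most common confident emotion, or "continue"."""
    counts = {}
    for e in estimates:
        # Ignore low-confidence estimates and emotions with no defined action.
        if e.confidence >= threshold and e.label in ACTIONS:
            counts[e.label] = counts.get(e.label, 0) + 1
    if not counts:
        return "continue"
    dominant = max(counts, key=counts.get)
    return ACTIONS[dominant]

action = choose_avatar_action([
    EmotionEstimate("confused", 0.8),
    EmotionEstimate("confused", 0.7),
    EmotionEstimate("bored", 0.9),
    EmotionEstimate("neutral", 0.9),
])
```

Keeping the policy in a table rather than in branching code makes it straightforward to tune per meeting type.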

[0143] The following describes the processing flow.

[0144] Step 1:

[0145] Users enter the meeting agenda via a dedicated web form and submit it to the server.

[0146] Step 2:

[0147] The server receives the input agenda items and analyzes them using natural language processing technology. Based on the analysis, it evaluates their importance, creates an agenda list in the optimal order, and saves it.

[0148] Step 3:

[0149] The terminal receives a prioritized agenda list sent from the server and sets it up in the virtual avatar's interface. The emotion engine is also prepared at this point.

[0150] Step 4:

[0151] The virtual avatar on the terminal starts the meeting, greets the participants, and then presents the first agenda item. The avatar is responsible for facilitating the meeting and controlling the transition to the next agenda item.

[0152] Step 5:

[0153] When a user speaks, the device converts the speech into text data using speech recognition technology and sends it to the server. Additionally, an emotion engine analyzes the user's facial expressions and tone of voice to evaluate the user's emotional state in real time.

[0154] Step 6:

[0155] The server analyzes the received text and sentiment data and provides relevant information to the terminal as needed. This information is presented to the user in real time through a virtual avatar during the meeting.

[0156] Step 7:

[0157] The emotion engine allows the avatar to adjust the meeting flow and information presentation based on the user's emotions. For example, if the user is having difficulty understanding something, it will add more detailed explanations.

[0158] Step 8:

[0159] After the meeting, the server automatically extracts key points from the discussion using all collected data and generates meeting minutes. These minutes include annotations derived from the emotion analysis, such as points where emotions changed significantly.

[0160] Step 9:

[0161] The server supports effective follow-up by distributing the generated meeting minutes to participants via email or other communication methods. This allows each participant to easily understand the content of the meeting.
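The annotation of significant emotional changes mentioned in Step 8 could be approximated as follows. The sketch assumes that the emotion engine yields a timeline of (seconds, score) pairs with scores between -1 and 1, and that a jump of at least 0.5 between consecutive samples counts as significant; both the data format and the threshold are illustrative assumptions.

```python
def sentiment_shifts(timeline, threshold=0.5):
    """Return (time, previous, current) for each consecutive jump of at least threshold."""
    shifts = []
    for (_, prev), (t, cur) in zip(timeline, timeline[1:]):
        if abs(cur - prev) >= threshold:
            shifts.append((t, prev, cur))
    return shifts

def annotate(shifts):
    """Format each shift as a one-line note for the minutes."""
    notes = []
    for t, prev, cur in shifts:
        direction = "rose" if cur > prev else "fell"
        notes.append(
            f"[{t // 60:02d}:{t % 60:02d}] sentiment {direction} from {prev:+.1f} to {cur:+.1f}"
        )
    return notes

# Hypothetical timeline: (seconds from start, sentiment score).
timeline = [(0, 0.2), (60, 0.3), (120, -0.4), (180, -0.3), (240, 0.4)]
shifts = sentiment_shifts(timeline)
notes = annotate(shifts)
```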

[0162] (Example 2)

[0163] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0164] In today's world, meetings involving multiple participants, including those in remote locations, demand effective and smooth communication. However, it is difficult to immediately understand and respond to the emotions and nuances of participants' statements. Furthermore, quickly extracting key points from a vast amount of discussion and creating accurate meeting minutes is a time-consuming and laborious challenge.

[0165] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0166] In this invention, the server includes means for analyzing information received from participants and automatically creating an order of information based on its importance; means for conducting a meeting using a virtual dialogue entity and engaging in natural dialogue with users; and means for analyzing users' facial expressions to recognize their emotional state in real time and reflect this in the progress of the meeting. This makes it possible to quickly reflect the diverse emotions and statements of participants and efficiently extract and record important matters.

[0167] "Information received from participants" refers to the content and data provided by those attending the meeting, which become the subject of the meeting's agenda and discussion.

[0168] "Means for automatically creating information order based on importance" refers to a method that analyzes received information, evaluates the priority of its content, and automatically determines the order in which it is processed or presented.

[0169] A "virtual dialogue entity" is a computer-generated avatar or agent that facilitates discussions in a meeting and interacts naturally with participants.

[0170] "Means of natural dialogue with users" refers to communication technologies that enable participants and virtual dialogue entities to communicate as smoothly and appropriately as humans do with one another.

[0171] "Methods for analyzing a user's facial expressions and recognizing their emotional state in real time" refers to methods that use cameras and sensors to capture a user's face and movements, and then use that information to instantly determine their current emotions.

[0172] "Means of influencing the progress of the meeting" refers to a system that utilizes the results of real-time analysis of emotions and statements to appropriately adjust the flow and content of the meeting.

[0173] To implement this invention, an integrated system for meeting management is used. Its specific configuration and process are described below.

[0174] The server receives information sent from participants and analyzes it. For natural language processing, it uses libraries such as NLTK or spaCy. This allows it to extract keywords from the received information, evaluate their importance, and automatically create an order suitable for the meeting's progress.

[0175] Next, the terminal facilitates the meeting through a virtual dialogue entity. This virtual dialogue entity is created using a real-time 3D platform such as Unity or Unreal Engine. The terminal controls the virtual dialogue entity based on data provided by the server, enabling natural conversation with the user. The user's emotional state can be analyzed in real time from voice and camera input using OpenCV and common facial recognition APIs.

[0176] During meetings, to allow users to easily participate, terminals utilize speech recognition services such as Google Cloud Speech-to-Text and Amazon Transcribe to convert spoken content into text in real time. The text data is immediately recorded on the server.

[0177] Furthermore, after the meeting concludes, the server analyzes the recorded text data and uses the Python pandas library to extract important statements and decisions. The resulting report is then distributed to participants via email.

[0178] As a concrete example, a possible prompt for a generative AI model might be, "Please highlight and explain the key points of the next agenda item." This prompt allows the system to generate actions that effectively support the progress of the meeting.

[0179] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0180] Step 1:

[0181] The server receives information from participants, accepting submissions via email or web forms as input. This information is processed using a natural language processing library to extract keywords. Specifically, the information is tokenized and part-of-speech tagged, and frequency analysis is used to list important words and phrases. The output is the information ranked by importance.

[0182] Step 2:

[0183] The terminal activates a virtual dialogue entity and prepares for the meeting based on the ordered information received from the server. The input is the order information provided by the server. The visuals and actions of the virtual dialogue entity are set up using a real-time 3D platform such as Unity, so that it is ready to be presented to the user. The output is the generated meeting scenario.

[0184] Step 3:

[0185] The terminal presents information to the user through a virtual dialogue entity during the meeting. During the meeting, the terminal receives and uses the user's voice and video input. Sentiment analysis is performed using OpenCV or general facial recognition APIs, recognizing the user's emotional state in real time from their facial expressions. The analysis results are reflected in the virtual entity's actions. Outputs include adjusting the ongoing agenda and providing additional explanations.

[0186] Step 4:

[0187] The server converts the audio from the meeting into text using a speech recognition system such as Google Cloud Speech-to-Text. The input is a real-time audio signal recorded during the meeting. The speech recognition service outputs the audio as text data, which is recorded immediately. This ensures that the meeting content is saved in text format.

[0188] Step 5:

[0189] After the meeting ends, the server processes the accumulated text data and extracts important statements and decisions. The input is the text data saved in Step 4. The data is structured and analyzed using the Python pandas library to create a list of key items. The output is a structured report, which is distributed via email.

[0190] This series of processes enables users to conduct meetings efficiently and share information accurately.
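Step 5 of the flow above can be illustrated with a minimal sketch that groups decision-like statements by speaker. It uses plain Python structures rather than pandas so that it stays self-contained. The cue phrases, the (speaker, text) transcript format, and the function name are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical cue phrases that suggest a statement records a decision.
DECISION_CUES = re.compile(
    r"\b(we will|agreed|decided|let's|action item)\b", re.IGNORECASE
)

def build_report(transcript):
    """Group decision-like statements by speaker.

    transcript is a list of (speaker, text) pairs in spoken order.
    """
    report = defaultdict(list)
    for speaker, text in transcript:
        if DECISION_CUES.search(text):
            report[speaker].append(text)
    return dict(report)

transcript = [
    ("Aoki", "We will publish the schedule on Friday."),
    ("Baba", "I am not sure about the venue."),
    ("Aoki", "Agreed, the venue stays the same."),
]
report = build_report(transcript)
```

With pandas, the same grouping could be expressed by loading the transcript into a DataFrame, filtering rows on the cue pattern, and grouping by the speaker column.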

[0191] (Application Example 2)

[0192] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0193] In recent years, smooth communication has become difficult in family conversations and family meetings due to differences in emotions and opinions among participants. In such situations, discussions often stall, particularly due to emotional misunderstandings, and the efficiency of the conversation decreases. To solve this problem, there is a need for a system that can grasp the emotions of participants in real time and respond appropriately.

[0194] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0195] In this invention, the server includes means for analyzing agenda items received from participants and automatically creating an order of agenda items based on their importance; means for conducting the meeting using virtual avatars and engaging in natural dialogue with participants; and means for analyzing participants' facial expressions and voice tone in real time and recognizing their emotions. This makes it possible to adjust the discussion according to the participants' emotions, resulting in smoother and more satisfying communication.

[0196] A "server" is a device that receives and processes data via a network and has the function of providing information to multiple users.

[0197] "Participants" refer to individual members who take part in a meeting or discussion and offer comments or opinions on the agenda.

[0198] An "agenda" refers to a specific topic or issue that should be addressed in a meeting or discussion, and it is prioritized for discussion based on its importance.

[0199] A "virtual avatar" is a human-like character created using digital technology, which interacts with the user and serves the role of presenting information.

[0200] "Speech recognition" refers to the technology that analyzes speech data to identify syllables and words and converts them into text data.

[0201] "Text data" refers to information in which speech or written text is encoded as characters, and is in a digital format that can be stored and searched.

[0202] "Emotion recognition" is a technology that identifies a person's emotional state from facial expressions, tone of voice, and other factors, and is used to adjust responses and feedback.

[0203] "Meeting minutes" are a record that includes summaries of statements and decisions made during a meeting, and are provided to participants in a format that can be referenced at a later date.

[0204] "Communication means" refers to technical devices and methods for sending and receiving data, and aims to improve the efficiency of information transmission.

[0205] This invention aims to build a meeting management system that incorporates emotion recognition to facilitate smooth communication within the home. Using a home robot as hardware, and leveraging emotion engines and voice recognition technology, it supports conversations and family meetings within the home.

[0206] The server analyzes the agenda items received from participants based on data transmitted from home robots and automatically sets their order based on importance. This process uses a natural language processing library to extract keywords from the agenda items and determine their priority.

[0207] The terminal, a home robot, is equipped with a virtual avatar that facilitates natural conversation. The robot recognizes participants' voices in real time and converts them into text data using the Google Cloud Speech-to-Text API. This text data is sent to a server where key points are further extracted.

[0208] Furthermore, the system uses an emotion recognition API like Affectiva to identify the emotional state that users exhibit during family meetings, and the home robot then provides feedback and additional information based on that data. For example, if a toddler shows signs of boredom while the family is discussing holiday plans, the robot can ask, "Shall I suggest a new activity?"

[0209] For example, when a family is discussing whether or not to welcome a new pet into their home, the robot could suggest, "Shall I look up some information about pet care?" An example of a prompt to the generative AI model in this case would be, "If the family is hesitant about getting a pet, how would you present information that could help resolve the issue?"

[0210] This system is expected to reduce misunderstandings in conversations and enable communication that satisfies all participants.

[0211] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0212] Step 1:

[0213] The server receives agenda data from a home robot. The received data is analyzed using a natural language processing library to extract keywords. This determines the importance of the agenda items and generates an ordered list. This list is then sent to the next step.

[0214] Step 2:

[0215] A home robot, acting as a terminal, prepares to conduct a meeting using a virtual avatar based on an agenda list received from a server. The avatar presents the agenda to participants and initiates a natural conversation. It receives voice input from the user, and that voice data is sent to the next step.

[0216] Step 3:

[0217] When a user speaks, the device sends the audio data in real time to the Google Cloud Speech-to-Text API, where it is converted into text data. This text data is stored on the server. The converted text is then used in the next feedback step.

[0218] Step 4:

[0219] The device analyzes the user's facial expressions and voice tone through emotion recognition APIs such as Affectiva. This analysis determines the user's emotional state. The results are immediately transmitted to the virtual avatar, which then provides conversational feedback and information tailored to the user.

[0220] Step 5:

[0221] Once the meeting concludes, the server analyzes the saved text data and extracts important statements and decisions. The extracted content is automatically generated as meeting minutes. These minutes are then distributed in the next step.

[0222] Step 6:

[0223] The terminal distributes the generated meeting minutes to participants via email or other communication methods. This allows users to review the meeting content and prepare for future meetings.
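Step 4 of the flow above combines facial expressions and voice tone. One simple way to fuse the two signals is a weighted average of per-emotion scores, sketched below. The score dictionaries, the weight of 0.6 for the face signal, and the function name are assumptions; an emotion recognition API such as Affectiva would supply its own output format.

```python
def fuse_emotions(face_scores, voice_scores, face_weight=0.6):
    """Combine per-emotion scores from two modalities by weighted average.

    Emotions missing from one modality are treated as scoring 0.0 there.
    Returns the top emotion label and the full fused score dictionary.
    """
    labels = set(face_scores) | set(voice_scores)
    fused = {
        label: face_weight * face_scores.get(label, 0.0)
        + (1.0 - face_weight) * voice_scores.get(label, 0.0)
        for label in labels
    }
    best = max(fused, key=fused.get)
    return best, fused

# Hypothetical scores for one family member during the meeting.
face = {"bored": 0.7, "happy": 0.3}
voice = {"bored": 0.4, "happy": 0.5, "angry": 0.1}
best, fused = fuse_emotions(face, voice)
```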

[0224] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0225] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.
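How a prompt and text inference data might be combined before being passed to a generative model is sketched below. The function name, the "Transcript:" layout, and the character budget are hypothetical, and the call to the model itself is deliberately omitted because the disclosure does not specify an API.

```python
def build_prompt(instruction, transcript_lines, max_chars=2000):
    """Join an instruction with the most recent transcript lines.

    Lines are taken from the end of the transcript until the assumed
    character budget is used up, so the newest statements are kept.
    """
    kept = []
    used = 0
    for line in reversed(transcript_lines):
        if used + len(line) + 1 > max_chars:
            break
        kept.append(line)
        used += len(line) + 1  # +1 for the newline separator
    kept.reverse()
    return instruction + "\n\nTranscript:\n" + "\n".join(kept)

prompt = build_prompt(
    "Please summarize the results of the discussion on which issues should be "
    "prioritized in the next regional disaster prevention plan.",
    ["A: hello", "B: we agreed on Friday"],
    max_chars=25,
)
```

With the small budget used here, only the most recent line fits, which shows the trimming behavior.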

[0226] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0227] [Second Embodiment]

[0228] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0229] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0230] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0231] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0232] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0233] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0234] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0235] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0236] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0237] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0238] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0239] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0240] To implement the present invention, a meeting management system is developed, and its main functions are implemented according to the following procedure.

[0241] First, the server collects agenda items from each user participating in the meeting. This process can be done via web forms or email. The server stores the received agenda items in a database and analyzes the content of each agenda item using natural language processing technology. Based on the analyzed data, it evaluates importance and lists the agenda items according to priority. For example, if the agenda items are "Sales Report" and "New Product Development," they will be evaluated for business priority and sorted accordingly.

[0242] Next, the terminal receives the organized agenda list sent from the server and prepares to conduct the meeting through a virtual avatar. In the meeting, the virtual avatar greets the participants in natural language and presents the first agenda item.

[0243] During a meeting, when a user speaks, the terminal converts their speech into text data in real time using speech recognition technology and sends it to the server. The server analyzes this text data, searches for relevant information as needed, and provides materials to aid the user's understanding.

[0244] Once the meeting concludes, the server automatically generates meeting minutes by extracting key discussion points from the saved text data. These minutes are formatted in a format such as PDF and distributed via email to all meeting participants. This facilitates smooth follow-up after the meeting.

[0245] The introduction of this system significantly reduces the time and effort required to manage meetings, allowing participants to focus on more core discussions. For example, using this system in weekly project meetings allows for discussions to proceed in the optimal order based on the importance of the agenda, enabling quick identification of key decisions.

[0246] The following describes the processing flow.

[0247] Step 1:

[0248] The user accesses a web form and enters the meeting agenda. Once the agenda is submitted, the data is sent to the server.

[0249] Step 2:

[0250] The server receives agenda items submitted by users and stores them in a database. Then, it analyzes the content of the agenda items using natural language processing techniques and extracts keywords.

[0251] Step 3:

[0252] The server evaluates the importance of the extracted keywords and creates a list of multiple agenda items sorted according to priority. This list will be used to guide the next meeting.

[0253] Step 4:

[0254] The terminal receives a prioritized agenda list sent from the server and prepares to set it up as the interface for the virtual avatar.

[0255] Step 5:

[0256] At the start of the meeting, the avatar on the terminal greets participants and presents the first agenda item from the prioritized list. Participants speak in turn, and the avatar then moves on to the next agenda item.

[0257] Step 6:

[0258] Each time the user speaks, the device uses speech recognition technology to convert the speech into text data in real time and sends it to the server.

[0259] Step 7:

[0260] The server analyzes text data and searches for information related to the content of the statements. This relevant information is returned to the terminal in real time and presented to participants through their avatars.

[0261] Step 8:

[0262] Once the meeting ends, the server uses all the text data saved during the meeting to extract the key points of the discussion and automatically creates meeting minutes.

[0263] Step 9:

[0264] The server formats the generated meeting minutes as an electronic document and sends it via email to all users who participated in the meeting. This allows participants to review the content discussed at the meeting later.
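Step 9 of the flow above can be sketched with the standard-library `email` package. The function name, addresses, and placeholder PDF bytes are hypothetical; rendering the minutes to PDF and actually sending the message (for example with `smtplib`) are outside the scope of this sketch.

```python
from email.message import EmailMessage

def build_minutes_email(sender, recipients, pdf_bytes, meeting_title):
    """Build an email with the minutes attached as a PDF; sending is left to the caller."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = f"Minutes: {meeting_title}"
    msg.set_content("The minutes of the meeting are attached.")
    msg.add_attachment(
        pdf_bytes,
        maintype="application",
        subtype="pdf",
        filename="minutes.pdf",
    )
    return msg

# Placeholder bytes stand in for a PDF produced by a separate rendering step.
msg = build_minutes_email(
    "server@example.com",
    ["a@example.com", "b@example.com"],
    b"%PDF-1.4 placeholder",
    "Weekly project meeting",
)
attachments = list(msg.iter_attachments())
```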

[0265] (Example 1)

[0266] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0267] In modern meeting management, setting agendas, managing the progress, recording conversations, and organizing information are cumbersome, hindering participants from focusing on important topics. In particular, prioritizing agenda items and quickly recording and analyzing discussions during meetings requires significant effort, highlighting the need for efficient meeting management.

[0268] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0269] In this invention, the server includes means for analyzing agenda items received from participants and automatically creating an order of agenda items based on their importance; means for conducting the meeting using virtual characters and engaging in natural dialogue with participants; and means for real-time speech recognition of speeches made during the meeting, converting them into text information, and saving it. This streamlines the operation of the meeting and allows participants to focus more on core discussions.

[0270] "Participants" refer to individuals or groups who propose agenda items and participate in discussions at a meeting.

[0271] An "agenda" refers to the topics or items that should be discussed during a meeting, and prioritizing them is crucial.

[0272] "Importance" refers to a criterion that indicates the relative value or priority of an agenda item or piece of information.

[0273] A "virtual character" refers to a computer-generated visual or auditory entity used to conduct meetings or interact with participants.

[0274] "Natural dialogue" refers to technology-mediated conversation that mimics natural conversation between humans.

[0275] "Real-time" refers to processing or analyzing events as they occur.

[0276] "Speech recognition" refers to the technology or process of converting speech into text or data formats.

[0277] "Text information" refers to information expressed in character or data format.

[0278] A "report" refers to a document that summarizes and documents the discussions and results of a meeting.

[0279] "Communication equipment" refers to a machine or system used to send and receive information using digital or analog signals.

[0280] "Related information" refers to additional data or knowledge necessary to complement or support the discussions during the meeting.

[0281] To implement this invention, it is necessary to develop an information processing system to streamline meeting management. The system mainly includes a server, terminals, and a user interface. Users participating in the meeting submit agenda items via web forms or email. The server then automatically receives the agenda items and stores them in a database.

[0282] The server analyzes the agenda items using natural language processing techniques (for example, libraries such as NLTK and spaCy) and evaluates their importance. Once the evaluation is complete, the server lists the agenda items according to priority and sends this list to the terminal.

[0283] The terminal presents the list of agenda items received from the server through virtual characters to support the progress of the meeting. The virtual characters use, for example, 3D animation and speech synthesis technologies to welcome participants and carry on natural conversations.
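
As a rough illustration of the importance evaluation and ordering described above, the following Python sketch scores each agenda item with a small keyword-weight table and sorts the items by score. The weight table, the function names, and the scoring rule are assumptions made for illustration only; the description states that natural language processing libraries such as NLTK or spaCy may be used but does not specify how importance is computed.

```python
import re

# Hypothetical keyword weights; the description does not define a scoring scheme.
KEYWORD_WEIGHTS = {"strategy": 3.0, "budget": 2.5, "deadline": 2.0, "product": 1.5}


def score_agenda_item(text, weights=KEYWORD_WEIGHTS):
    """Sum the weights of the known keywords that appear in the agenda text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(weights.get(token, 0.0) for token in tokens)


def rank_agenda(items, weights=KEYWORD_WEIGHTS):
    """Return (item, score) pairs, highest score first.

    sorted() is stable, so items with equal scores keep their submission order.
    """
    scored = [(item, score_agenda_item(item, weights)) for item in items]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


agenda = ["Office party planning", "New Product Strategy Meeting", "Budget review"]
ranked = rank_agenda(agenda)
for item, score in ranked:
    print(f"{score:4.1f}  {item}")
```

In a fuller system the fixed weight table could be replaced by scores derived from a language model or from a library pipeline; the sorting step would stay the same.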

[0284] When the user speaks during the meeting, the terminal uses speech recognition technology (such as Google Cloud Speech-to-Text) to convert the speech into text in real time and send the data to the server. The server has the function of analyzing the received text data and providing relevant information in real time.

[0285] After the meeting ends, the server extracts important points using the saved text data and automatically generates a report. The report is formatted in a form such as PDF and electronically distributed to all participants through the communication device.

[0286] By using this system, the time and labor required for meeting operation are significantly reduced, and participants can concentrate on more essential discussions. As specific examples, the prompt sentences for the generative AI model can be "Please summarize the main results of the meeting" or "What are the issues to be taken up first in the next meeting?"

[0287] The flow of the specific process in Example 1 will be described using Figure 11.

[0288] Step 1:

[0289] The user submits the agenda items for the meeting via a web form or email. The server receives each item and saves it in the database, adjusting the text format of the input data as needed. For example, when the user enters "New Product Strategy Meeting" in the web form and presses the send button, the data is recorded on the server.

[0290] Step 2:

[0291] The server analyzes the stored agenda items using natural language processing technology. This process extracts keywords from the agenda text received as input data and scores their importance. The output is a set of pairs, each consisting of an agenda item and its importance score. For example, "New Product Strategy Meeting" would be classified as "important."

[0292] Step 3:

[0293] The server sorts the agenda items in order of importance based on the analysis results. The agenda items and their scores obtained in step 2 are used as input data. The sorted list is sent to the terminal as output. Specifically, the server generates a list of agenda items in order of importance.

[0294] Step 4:

[0295] The terminal prepares to display the organized agenda list received from the server through a virtual character. It reads the agenda list received as input and outputs it to the virtual character's interface. Specifically, the virtual character appears on the screen and presents the agenda items for the "New Product Strategy Meeting" in order of importance.

[0296] Step 5:

[0297] When a user speaks during a meeting, the device uses speech recognition technology to convert the audio into text in real time. It receives audio as input and generates output as text data. For example, if the user says "What is the next step?", the text "What is the next step?" will be displayed.

[0298] Step 6:

[0299] The server receives text data sent from the terminal and searches for relevant information as needed. It analyzes the text data received as input and provides the user with relevant documents and database information as output. Specifically, it searches for historical data and statistical information related to the "new product strategy" and presents it to the user.

[0300] Step 7:

[0301] After the meeting, based on the saved text data, the server extracts important points and automatically generates a report. Using the conversation content saved as input data, it provides summary information as output. Specifically, a report is created in the form of "The main conclusions in the new product strategy meeting are...".

[0302] Step 8:

[0303] The server formats the generated report in a format such as PDF and electronically distributes it to all participants through the communication device. Taking the generated report as input, the server outputs it via email or the like. Specifically, a notification such as "The report of the new product strategy meeting has been sent by email" is given.
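
The key-point extraction in step 7 could be sketched as follows. This is a minimal frequency-based extractive summary in Python, not the method of the description, which leaves the extraction technique open and also mentions prompting a generative AI model. The stopword list, the function names, and the report layout are assumptions; PDF formatting and email delivery from step 8 are omitted.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the description).
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "for", "in", "on"}


def _words(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def key_points(transcript, top_n=2):
    """Pick the top_n sentences whose words are most frequent in the transcript."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", transcript) if s.strip()]
    freq = Counter(_words(transcript))
    scores = [sum(freq[w] for w in _words(s)) for s in sentences]
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_n]
    # Return the selected sentences in their original speaking order.
    return [sentences[i] for i in sorted(best)]


def build_report(title, transcript, top_n=2):
    """Assemble a plain-text report body from the extracted key points."""
    lines = [f"Report: {title}", "Main points:"]
    lines += [f"- {point}" for point in key_points(transcript, top_n)]
    return "\n".join(lines)


transcript = ("The launch date is set for March. Lunch was good. "
              "The launch budget needs approval before the launch date.")
report = build_report("New Product Strategy Meeting", transcript)
print(report)
```

Summed word frequency favors long sentences; dividing each score by sentence length is one common adjustment if that bias matters.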

[0304] (Application Example 1)

[0305] Next, Application Example 1 will be described. In the following description, the data processing device 12 is referred to as the "server", and the smart glasses 214 are referred to as the "terminal".

[0306] In a local residents' meeting, it is difficult for participants to efficiently discuss issues and make decisions smoothly. In particular, it is difficult to organize various opinions from participants in real time and appropriately share important information. As a result, there is a possibility of prolonging the meeting time and overlooking important discussion points.

[0307] The specific processing by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means respectively.

[0308] This invention includes a server that analyzes issues received from participants and automatically creates an order of issues based on their importance; a server that facilitates discussions using virtual characters and engages in natural conversations with participants; a server that performs real-time speech recognition of utterances during discussions, converts them into text data, and stores them; a server that allows local residents to share issues using smartphones, prioritize important decision-making matters, and support community meetings; and a server that extracts key points from the text data after the discussion and automatically generates minutes to provide to participants. This enables efficient organization of important issues in community meetings and facilitates quick and accurate decision-making.

[0309] "Participants" refer to the individual members who take part in a meeting or discussion.

[0310] An "issue" is a topic or theme discussed in a meeting or debate, or a problem that needs to be solved.

[0311] "Means for automatically creating sequences" refers to functions or processes that rearrange issues in an appropriate order based on their importance.

[0312] A "virtual character" is a character created on a computer to assist in the progress of meetings and discussions.

[0313] "Means of facilitating discussions and conducting natural conversations" refers to techniques and methods that use virtual characters to facilitate the smooth progress of meetings and to engage in dialogue with participants.

[0314] "Speech recognition" is a technology that analyzes speech in real time and converts the speech information into text data.

[0315] "Methods for converting and saving as text data" refers to the process of converting information acquired through speech recognition into text format and saving it.

[0316] A "smartphone" is a multi-functional mobile phone, a small device capable of running applications.

[0317] "Important decision-making matters" are the points or conclusions that should be prioritized for discussion in a meeting.

[0318] "Means of supporting residents' meetings" refer to mechanisms and technologies that facilitate meetings and support participants' decision-making.

[0319] "Methods for automatically generating and providing meeting minutes to participants" refers to an automated process for summarizing the key points discussed after a meeting and presenting them to participants in an easily understandable format.

[0320] The server functions as the core of the local residents' meeting system, handling the backend for receiving and analyzing issues submitted by participants. Specifically, the Node.js-based server collects issue information submitted via web forms and email and stores it in MongoDB. The stored data is then analyzed by Python scripts using natural language processing libraries (such as NLTK and spaCy) to evaluate the importance of the issues and automatically rank them.

[0321] The device, specifically a smartphone, will be equipped with a virtual character application to assist in the discussion. This application, developed with React Native, enables natural interaction with participants. During the meeting, participants' speech is collected via the smartphone's microphone and converted into text data in real time using the Google Cloud Speech-to-Text API. This text data is stored and analyzed on the server side.

[0322] Users share issues via their smartphones, and their comments during meetings are recorded as data. After the meeting ends, the server uses natural language processing technology to extract key points of the discussion and generate meeting minutes. These minutes are formatted as PDFs and sent to participants via email, enabling efficient information sharing.

[0323] Specific example

[0324] For example, in a disaster response meeting, if attendees discuss "flood control measures" and "strengthening disaster prevention training," the system will prioritize these issues. Then, a virtual character will present each issue and support the discussion by analyzing participants' comments in real time.

[0325] A concrete example of a prompt for the generative AI model would be, "Please summarize the results of the discussion on which issues should be prioritized in the next regional disaster prevention plan." This can be used to summarize the key points of a residents' meeting.
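
One way such a prompt might be combined with the recorded statements is sketched below in Python. The function name, the speaker-tagged input format, and the character limit are assumptions for illustration; the description does not specify how the prompt and the transcript are passed to the generative AI model, and no model is called here.

```python
INSTRUCTION = ("Please summarize the results of the discussion on which issues "
               "should be prioritized in the next regional disaster prevention plan.")


def build_summary_prompt(instruction, utterances, max_chars=2000):
    """Join an instruction and speaker-tagged utterances into one prompt string.

    utterances is a list of (speaker, text) pairs. If the transcript exceeds
    max_chars (a hypothetical limit), the oldest lines are dropped first.
    """
    lines = [f"{speaker}: {text}" for speaker, text in utterances]
    while len(lines) > 1 and len("\n".join(lines)) > max_chars:
        lines.pop(0)
    return instruction + "\n\nMeeting transcript:\n" + "\n".join(lines)


utterances = [
    ("A", "Flood control measures come first."),
    ("B", "Disaster prevention training needs more funding."),
]
prompt = build_summary_prompt(INSTRUCTION, utterances)
print(prompt)
```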

[0326] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0327] Step 1:

[0328] The server collects issues received from participants via web forms or email. The raw issue data submitted by users is sent to the server as input. The server stores this data in MongoDB. The output of this step is the issue data stored in the database.

[0329] Step 2:

[0330] The server executes a Python script on the stored issue data and analyzes the issues using natural language processing libraries (such as NLTK and spaCy). The input is the text data in the database, which is analyzed to evaluate the importance of each issue. The output is a list of issues ranked by importance.

[0331] Step 3:

[0332] The terminal receives the ordering information for the issues and uses a virtual character application developed with React Native to prepare for the discussion. Here, the input is the ordering information sent from the server, and the output is the sequence of issues to be presented to the participants.

[0333] Step 4:

[0334] Users speak during discussions via their smartphones. The device uses the Google Cloud Speech-to-Text API to convert speech into text in real time. Input is the user's speech, and output is text data. The converted text data is immediately sent to the server.

[0335] Step 5:

[0336] The server analyzes the received text data, collects relevant information in real time, and provides it to the terminal as needed. Real-time text data is used as input, and the output is fed back to the user as relevant materials and information.

[0337] Step 6:

[0338] After the discussion based on user comments has concluded, the server re-analyzes the entire meeting's text data and automatically extracts key discussion points. The input is the accumulated text data, and the output is the extracted points.

[0339] Step 7:

[0340] The server automatically generates meeting minutes based on the extracted key points and formats them as a PDF. This allows them to function as the final meeting report. As output, the generated PDF file is distributed to participants via email.
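
The lookup of relevant information in step 5 above could be sketched as a simple word-overlap search, shown here in Python. The document store, the stopword list, the overlap threshold, and the function names are all assumptions for illustration; the description does not state how relevant materials are selected.

```python
import re

# Illustrative stopword list (an assumption).
STOPWORDS = {"we", "should", "the", "and", "a", "of", "for", "to"}


def tokenize(text):
    """Return the set of lowercase words in text, without stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def find_related(utterance, documents, min_overlap=2):
    """Return titles of documents sharing at least min_overlap words with the utterance.

    documents maps a title to its body text. Results are ordered by overlap
    (largest first), then alphabetically by title.
    """
    query = tokenize(utterance)
    hits = []
    for title, body in documents.items():
        overlap = len(query & tokenize(title + " " + body))
        if overlap >= min_overlap:
            hits.append((overlap, title))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [title for _, title in hits]


# Hypothetical stored materials.
documents = {
    "Flood control plan 2023": "River levee reinforcement and flood drainage pumps.",
    "Summer festival budget": "Stalls, fireworks and volunteer shifts.",
    "Disaster training schedule": "Evacuation training dates for each district.",
}
related = find_related("We should review flood control and the levee budget.", documents)
print(related)
```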

[0341] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0342] To implement the present invention, it is necessary to incorporate an emotion engine into the meeting management system. This system includes a series of processes from agenda analysis to meeting progress using virtual avatars, speech recognition-based text conversion of meeting statements, and post-meeting minute generation.

[0343] First, the server receives agenda items from participants and uses natural language processing to determine their importance. This process extracts keywords from the agenda items and generates a sorted list based on them. Next, virtual avatars connected to terminals prepare to conduct the meeting using this data.

[0344] During the meeting, the terminal sequentially presents agenda items to participants via virtual avatars. This is where the emotion engine comes into play, analyzing the user's statements and facial expressions in real time to recognize their emotions. For example, if a user appears dissatisfied, the terminal transmits this information to the avatar, immediately adjusting the meeting's progress and the information presented.

[0345] Furthermore, speeches during the meeting are recognized by speech recognition and stored as text data on the server. After the meeting, the server analyzes the stored text data, extracting particularly important statements and decisions, and automatically creating meeting minutes. These minutes are then provided to participants via email or other communication methods.

[0346] By introducing an emotion engine, the system can respond flexibly to the meeting situation, stimulating discussions and facilitating smooth progress based on participants' emotions. For example, if some users are feeling confused, the avatar can provide additional explanations and follow up to help them understand. This is expected to significantly improve meeting efficiency and participant satisfaction.
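
The adjustment of the avatar's behavior according to the estimated emotion could look roughly like the following Python sketch. The emotion labels, the action names, the confidence threshold, and the data structure are hypothetical; the description only gives examples such as adding explanations when a user appears confused, and does not define the output of the emotion engine.

```python
from dataclasses import dataclass


@dataclass
class EmotionReading:
    """Hypothetical output of the emotion engine for one participant."""
    label: str         # for example "confused", "dissatisfied", "neutral"
    confidence: float  # 0.0 to 1.0


# Hypothetical mapping from an estimated emotion to a facilitation action.
ACTIONS = {
    "confused": "add_explanation",
    "dissatisfied": "invite_feedback",
    "bored": "move_to_next_item",
}


def choose_action(reading, threshold=0.6):
    """Pick the avatar's next action; fall back to 'continue' when unsure."""
    if reading.confidence < threshold:
        return "continue"
    return ACTIONS.get(reading.label, "continue")


print(choose_action(EmotionReading("confused", 0.8)))
print(choose_action(EmotionReading("confused", 0.3)))
```

Requiring a minimum confidence before changing the meeting flow is a design choice of this sketch, intended to avoid reacting to noisy estimates.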

[0347] The following describes the processing flow.

[0348] Step 1:

[0349] Users enter the meeting agenda via a dedicated web form and submit it to the server.

[0350] Step 2:

[0351] The server receives the input agenda items and analyzes them using natural language processing technology. Based on the analysis, it evaluates their importance, creates an agenda list in the optimal order, and saves it.

[0352] Step 3:

[0353] The terminal receives a prioritized agenda list sent from the server and sets it up in the virtual avatar's interface. The emotion engine is also prepared at this point.

[0354] Step 4:

[0355] The virtual avatar on the terminal starts the meeting, greets the participants, and then presents the first agenda item. The avatar is responsible for facilitating the meeting and controlling the transition to the next agenda item.

[0356] Step 5:

[0357] When a user speaks, the device converts the speech into text data using speech recognition technology and sends it to the server. Additionally, an emotion engine analyzes the user's facial expressions and tone of voice to evaluate the user's emotional state in real time.

[0358] Step 6:

[0359] The server analyzes the received text and sentiment data and provides relevant information to the terminal as needed. This information is presented to the user in real time through a virtual avatar during the meeting.

[0360] Step 7:

[0361] The emotion engine allows the avatar to adjust the meeting flow and information presentation based on the user's emotions. For example, if the user is having difficulty understanding something, it will add more detailed explanations.

[0362] Step 8:

[0363] After the meeting, the server automatically extracts key points from the discussion using all collected data and generates meeting minutes. These minutes include annotations based on sentiment analysis and significant changes in sentiment.

[0364] Step 9:

[0365] The server supports effective follow-up by distributing the generated meeting minutes to participants via email or other communication methods. This allows each participant to easily understand the content of the meeting.
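
The annotation of significant changes in sentiment mentioned in step 8 could be sketched as below. The numeric sentiment scale from -1 to 1, the change threshold, and the annotation wording are assumptions; the description does not specify how sentiment is represented or what counts as a significant change.

```python
def annotate_minutes(entries, min_delta=0.5):
    """Format minute lines and flag large sentiment changes between utterances.

    entries is a list of (speaker, text, score) tuples, where score is a
    hypothetical sentiment value in [-1, 1] supplied by the emotion engine.
    """
    lines = []
    previous = None
    for speaker, text, score in entries:
        line = f"{speaker}: {text}"
        if previous is not None and abs(score - previous) >= min_delta:
            direction = "rose" if score > previous else "fell"
            line += f" [sentiment {direction} by {abs(score - previous):.1f}]"
        lines.append(line)
        previous = score
    return lines


entries = [
    ("A", "The plan looks fine.", 0.4),
    ("B", "I do not agree with the cost.", -0.4),
    ("A", "Let us revisit the cost.", -0.2),
]
minutes = annotate_minutes(entries)
print("\n".join(minutes))
```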

[0366] (Example 2)

[0367] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0368] In today's world, meetings involving multiple participants, including those in remote locations, demand effective and smooth communication. However, it is difficult to immediately understand and respond to the emotions and nuances of participants' statements. Furthermore, quickly extracting key points from a vast amount of discussion and creating accurate meeting minutes is a time-consuming and laborious challenge.

[0369] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0370] In this invention, the server includes means for analyzing information received from participants and automatically creating an order of information based on its importance; means for conducting a meeting using a virtual dialogue entity and engaging in natural dialogue with users; and means for analyzing users' facial expressions to recognize their emotional state in real time and reflect this in the progress of the meeting. This makes it possible to quickly reflect the diverse emotions and statements of participants and efficiently extract and record important matters.

[0371] "Information received from participants" refers to the content and data provided by those attending the meeting, which become the subject of the meeting's agenda and discussion.

[0372] "Means for automatically creating information order based on importance" refers to a method that analyzes received information, evaluates the priority of its content, and automatically determines the order in which it is processed or presented.

[0373] A "virtual dialogue entity" is a computer-generated avatar or agent that facilitates discussions in a meeting and interacts naturally with participants.

[0374] "Means of natural dialogue with users" refers to communication technologies that enable participants and virtual dialogue entities to communicate smoothly and appropriately, just as humans do with one another.

[0375] "Methods for analyzing a user's facial expressions and recognizing their emotional state in real time" refers to methods that use cameras and sensors to capture a user's face and movements, and then use that information to instantly determine their current emotions.

[0376] "Means of influencing the progress of the meeting" refers to a system that utilizes the results of real-time analysis of emotions and statements to appropriately adjust the flow and content of the meeting.

[0377] To implement this invention, an integrated system for meeting management is used. Its specific configuration and process are described below.

[0378] The server receives information sent from participants and analyzes it. For natural language processing, it uses libraries such as NLTK or spaCy. This allows it to extract keywords from the received information, evaluate their importance, and automatically create an order suitable for the meeting's progress.

[0379] Next, the terminal facilitates the meeting through a virtual dialogue entity. This virtual dialogue entity is created using a real-time 3D platform such as Unity or Unreal Engine. The terminal controls the virtual dialogue entity based on data provided by the server, enabling natural conversation with the user. The user's emotional state can be analyzed in real time from the user's voice and camera input, using OpenCV and common facial recognition APIs.

[0380] During meetings, to allow users to easily participate, terminals utilize speech recognition services such as Google Cloud Speech-to-Text and Amazon Transcribe to convert spoken content into text in real time. The text data is immediately recorded on the server.

[0381] Furthermore, after the meeting concludes, the server analyzes the recorded text data and uses the Python Pandas library to extract important statements and decisions. The resulting report is then distributed to participants via email.

[0382] As a concrete example, a possible prompt for a generative AI model might be, "Please highlight and explain the key points of the next agenda item." This prompt allows the system to generate actions that effectively support the progress of the meeting.

[0383] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0384] Step 1:

[0385] The server receives information from participants. It accepts information submitted via email or web forms as input. This information is processed using a natural language processing library to extract keywords. Specifically, the information is tokenized, part-of-speech tagged, and frequency analysis is used to list important words and phrases. This analysis results in an output that ranks the information based on its importance.

[0386] Step 2:

[0387] The terminal activates the virtual dialogue entity and prepares for the meeting based on the ordered information received from the server. The input is the order information provided by the server. The visuals and actions of the virtual dialogue entity are set up using a real-time 3D platform such as Unity, so that it is ready for presentation to the user. The output is the generated meeting scenario.

[0388] Step 3:

[0389] The terminal presents information to the user through a virtual dialogue entity during the meeting. During the meeting, the terminal receives and uses the user's voice and video input. Sentiment analysis is performed using OpenCV or general facial recognition APIs, recognizing the user's emotional state in real time from their facial expressions. The analysis results are reflected in the virtual entity's actions. Outputs include adjusting the ongoing agenda and providing additional explanations.

[0390] Step 4:

[0391] The server converts the audio from the meeting into text using a speech recognition system such as Google Cloud Speech-to-Text. The input is a real-time audio signal recorded during the meeting. The speech recognition service outputs the audio as text data, which is recorded immediately. This ensures that the meeting content is saved in text format.

[0392] Step 5:

[0393] After the meeting ends, the server processes the accumulated text data and extracts important statements and decisions. The input is the text data saved in step 4. The data is structured and analyzed using the Python Pandas library to create a list of key items. The output is a structured report, which is distributed via email.

[0394] This series of processes enables users to conduct meetings efficiently and share information accurately.
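
The tokenization and frequency analysis described in step 1 could be sketched as follows. To stay self-contained, this Python example uses only the standard library rather than NLTK or spaCy and omits part-of-speech tagging; the stopword list, the function names, and the rule of scoring each submission by the corpus-wide frequency of its words are assumptions for illustration.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption).
STOPWORDS = {"for", "the", "a", "of", "and", "to"}


def tokenize(text):
    """Split text into lowercase word tokens and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def keyword_counts(submissions):
    """Count how often each token appears across all submissions."""
    return Counter(t for s in submissions for t in tokenize(s))


def rank_submissions(submissions):
    """Order submissions by the summed corpus frequency of their distinct tokens."""
    freq = keyword_counts(submissions)
    scored = [(sum(freq[t] for t in set(tokenize(s))), s) for s in submissions]
    # Stable sort: submissions with equal scores keep their original order.
    return [s for _, s in sorted(scored, key=lambda pair: pair[0], reverse=True)]


submissions = [
    "Cafeteria menu",
    "Remote work policy update",
    "Policy for remote meeting tools",
]
ranking = rank_submissions(submissions)
print(ranking)
```

Topics raised by several participants share vocabulary and therefore score higher under this rule, which is only one possible reading of "importance".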

[0395] (Application Example 2)

[0396] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart glasses 214 as the "terminal".

[0397] In recent years, smooth communication has become difficult in family conversations and family meetings due to differences in emotions and opinions among participants. In such situations, discussions often stall, particularly due to emotional misunderstandings, and the efficiency of the conversation decreases. To solve this problem, there is a need for a system that can grasp the emotions of participants in real time and respond appropriately.

[0398] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0399] In this invention, the server includes means for analyzing agenda items received from participants and automatically creating an order of agenda items based on their importance; means for conducting the meeting using virtual avatars and engaging in natural dialogue with participants; and means for analyzing participants' facial expressions and voice tone in real time and recognizing their emotions. This makes it possible to adjust the discussion according to the participants' emotions, resulting in smoother and more satisfying communication.

[0400] A "server" is a device that receives and processes data via a network and has the function of providing information to multiple users.

[0401] "Participants" refer to individual members who take part in a meeting or discussion and offer comments or opinions on the agenda.

[0402] An "agenda" refers to a specific topic or issue that should be addressed in a meeting or discussion, and it is prioritized for discussion based on its importance.

[0403] A "virtual avatar" is a human-like character created using digital technology, which interacts with the user and serves the role of presenting information.

[0404] "Speech recognition" refers to the technology that analyzes speech data to identify syllables and words and converts them into text data.

[0405] "Text data" refers to information in which speech or written text is encoded as characters, and is in a digital format that can be stored and searched.

[0406] "Emotion recognition" is a technology that identifies a person's emotional state from facial expressions, tone of voice, and other factors, and is used to adjust responses and feedback.

[0407] "Meeting minutes" are a record that includes summaries of statements and decisions made during a meeting, and are provided to participants in a format that can be referenced at a later date.

[0408] "Communication means" refers to technical devices and methods for sending and receiving data, and aims to improve the efficiency of information transmission.

[0409] This invention aims to build a meeting management system that incorporates emotion recognition to facilitate smooth communication within the home. Using a home robot as hardware, and leveraging emotion engines and voice recognition technology, it supports conversations and family meetings within the home.

[0410] The server analyzes the agenda items received from participants based on data transmitted from home robots and automatically sets their order based on importance. This process uses a natural language processing library to extract keywords from the agenda items and determine their priority.

[0411] The terminal, a home robot, is equipped with a virtual avatar that facilitates natural conversation. The robot recognizes participants' voices in real time and converts them into text data using the Google Cloud Speech-to-Text API. This text data is sent to a server where key points are further extracted.

[0412] Furthermore, the system uses an emotion recognition API like Affectiva to identify the emotional state that users exhibit during family meetings, and the home robot then provides feedback and additional information based on that data. For example, if a toddler shows signs of boredom while the family is discussing holiday plans, the robot can ask, "Shall I suggest a new activity?"

[0413] For example, when a family is discussing whether or not to welcome a new pet into their home, the robot could suggest, "Shall I look up some information about pet care?" An example of a prompt to the generative AI model in this case would be, "If the family is hesitant about getting a pet, how would you present information that could help resolve the issue?"

[0414] This system is expected to reduce misunderstandings in conversations and enable communication that satisfies all participants.

[0415] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0416] Step 1:

[0417] The server receives agenda data from a home robot. The received data is analyzed using a natural language processing library to extract keywords. This determines the importance of the agenda items and generates an ordered list. This list is then sent to the next step.

[0418] Step 2:

[0419] A home robot, acting as a terminal, prepares to conduct a meeting using a virtual avatar based on an agenda list received from a server. The avatar presents the agenda to participants and initiates a natural conversation. It receives voice input from the user, and that voice data is sent to the next step.

[0420] Step 3:

[0421] When a user speaks, the device sends the audio data in real time to the Google Cloud Speech-to-Text API, where it is converted into text data. This text data is stored on the server. The converted text is then used in the next feedback step.

[0422] Step 4:

[0423] The device analyzes the user's facial expressions and voice tone through emotion recognition APIs such as Affectiva. This analysis determines the user's emotional state. The results are immediately transmitted to the virtual avatar, which then provides conversational feedback and information tailored to the user.

[0424] Step 5:

[0425] Once the meeting concludes, the server analyzes the saved text data and extracts important statements and decisions. The extracted content is automatically generated as meeting minutes. These minutes are then distributed in the next step.

[0426] Step 6:

[0427] The terminal distributes the generated meeting minutes to participants via email or other communication methods. This allows users to review the meeting content and prepare for future meetings.
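
The extraction of decisions in step 5 could be approximated with simple cue phrases, as in the Python sketch below. The cue list and the function name are assumptions for illustration; the description does not specify how important statements and decisions are identified, and a generative AI model or a trained classifier could be used instead.

```python
# Hypothetical phrases that often signal a decision in conversation.
DECISION_CUES = ("we will", "we decided", "let's", "agreed")


def extract_decisions(utterances):
    """Return speaker-tagged utterances that contain a decision cue phrase.

    utterances is a list of (speaker, text) pairs taken from the saved text data.
    """
    decisions = []
    for speaker, text in utterances:
        lowered = text.lower()
        if any(cue in lowered for cue in DECISION_CUES):
            decisions.append(f"{speaker}: {text}")
    return decisions


utterances = [
    ("Mom", "Who would walk the dog every day?"),
    ("Dad", "We decided to visit the shelter on Saturday."),
    ("Kid", "I like cats too."),
]
decisions = extract_decisions(utterances)
print(decisions)
```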

[0428] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0429] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0430] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0431] [Third Embodiment]

[0432] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0433] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0434] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0435] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0436] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0437] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0438] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0439] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0440] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0441] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0442] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0443] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0444] To implement the present invention, a meeting management system is developed, and its main functions are implemented according to the following procedure.

[0445] First, the server collects agenda items from each user participating in the meeting. This process can be done via web forms or email. The server stores the received agenda items in a database and analyzes the content of each agenda item using natural language processing technology. Based on the analyzed data, it evaluates importance and lists the agenda items according to priority. For example, if the agenda items are "Sales Report" and "New Product Development," they will be evaluated for business priority and sorted accordingly.
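The prioritization described above can be sketched as follows. This is a minimal illustration, not the method of the disclosure: the keyword weights, the function names `score_item` and `rank_agenda`, and the idea of summing per-keyword weights are all assumptions made for the example, and a real system would derive importance from natural language processing rather than a fixed table.

```python
import re

# Hypothetical keyword weights (assumption); a real system would derive
# importance from NLP analysis rather than a hand-written table.
KEYWORD_WEIGHTS = {"sales": 3.0, "product": 2.0, "development": 2.5, "report": 1.0}


def score_item(title, weights=KEYWORD_WEIGHTS):
    """Sum the weights of the known keywords that appear in an agenda title."""
    words = re.findall(r"[a-z]+", title.lower())
    return sum(weights.get(word, 0.0) for word in words)


def rank_agenda(titles, weights=KEYWORD_WEIGHTS):
    """Return (title, score) pairs sorted from most to least important."""
    scored = [(title, score_item(title, weights)) for title in titles]
    # sorted() is stable, so items with equal scores keep their submission order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = rank_agenda(["Sales Report", "New Product Development", "Office Party"])
for title, score in ranked:
    print(f"{score:4.1f}  {title}")
```

With these assumed weights, "New Product Development" is listed ahead of "Sales Report", and an item with no known keywords falls to the end of the list.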

[0446] Next, the terminal receives the organized agenda list sent from the server and prepares to conduct the meeting through a virtual avatar. In the meeting, the virtual avatar greets the participants in natural language and presents the first agenda item.

[0447] During a meeting, when a user speaks, the terminal converts their speech into text data in real time using speech recognition technology and sends it to the server. The server analyzes this text data, searches for relevant information as needed, and provides materials to aid the user's understanding.

[0448] Once the meeting concludes, the server automatically generates meeting minutes by extracting key discussion points from the saved text data. These minutes are formatted in a format such as PDF and distributed via email to all meeting participants. This facilitates smooth follow-up after the meeting.
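One simple way to extract key discussion points from saved text is frequency-based extractive summarization, sketched below. The source does not specify the extraction method, so the scoring rule, the stopword list, and the names `split_sentences` and `key_points` are assumptions for illustration only.

```python
import re
from collections import Counter

# Small illustrative stopword list (assumption).
STOPWORDS = {"the", "a", "an", "to", "of", "and", "is", "we", "in", "for", "on", "will", "it"}


def split_sentences(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def key_points(transcript, top_n=2):
    """Return the top_n sentences by average word frequency, in original order."""
    sentences = split_sentences(transcript)
    freq = Counter(content_words(transcript))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]


transcript = "The launch date is fixed. The launch budget needs approval. Thanks everyone."
print(key_points(transcript, top_n=2))
```

The selected sentences could then be passed to whatever formatter produces the PDF minutes.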

[0449] Introducing this system significantly reduces the time and effort required to manage meetings, allowing participants to focus on the core discussions. For example, using this system in weekly project meetings allows discussions to proceed in the optimal order based on the importance of each agenda item, enabling key decisions to be identified quickly.

[0450] The following describes the processing flow.

[0451] Step 1:

[0452] The user accesses a web form and enters the meeting agenda. Once the agenda is submitted, the data is sent to the server.

[0453] Step 2:

[0454] The server receives agenda items submitted by users and stores them in a database. Then, it analyzes the content of the agenda items using natural language processing techniques and extracts keywords.

[0455] Step 3:

[0456] The server evaluates the importance of the extracted keywords and creates a list of multiple agenda items sorted according to priority. This list will be used to guide the next meeting.

[0457] Step 4:

[0458] The terminal receives a prioritized agenda list sent from the server and prepares to set it up as the interface for the virtual avatar.

[0459] Step 5:

[0460] The avatar on the terminal greets participants at the start of the meeting and presents the first agenda item from the prioritized list. Participants speak in turn, and the avatar then moves on to the next agenda item.

[0461] Step 6:

[0462] Each time the user speaks, the terminal uses speech recognition technology to convert the speech into text data in real time and sends it to the server.

[0463] Step 7:

[0464] The server analyzes the text data and searches for information related to the content of the statements. This relevant information is returned to the terminal in real time and presented to participants through the avatar.

[0465] Step 8:

[0466] Once the meeting ends, the server uses all the text data saved during the meeting to extract the key points of the discussion and automatically creates meeting minutes.

[0467] Step 9:

[0468] The server formats the generated meeting minutes as an electronic document and sends it via email to all users who participated in the meeting. This allows participants to review the content discussed at the meeting later.
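The avatar's role in Steps 4 and 5 above (greeting participants, presenting the first item, then advancing through the list) can be sketched as a small state holder. The class name `AvatarFacilitator`, its methods, and the wording of the spoken lines are hypothetical; rendering and speech synthesis are out of scope here.

```python
class AvatarFacilitator:
    """Hypothetical sketch of the avatar's agenda-driven facilitation logic."""

    def __init__(self, agenda):
        self.agenda = list(agenda)  # prioritized list received from the server
        self.index = 0              # position of the current agenda item

    def open_meeting(self):
        """Greet participants and present the first agenda item."""
        self.index = 0
        if not self.agenda:
            return "Welcome, everyone. There are no agenda items today."
        return f"Welcome, everyone. The first agenda item is: {self.agenda[0]}."

    def next_item(self):
        """Advance to the next agenda item, or close the meeting at the end."""
        if self.index + 1 >= len(self.agenda):
            self.index = len(self.agenda)
            return "That was the last agenda item. The meeting is now closed."
        self.index += 1
        return f"The next agenda item is: {self.agenda[self.index]}."


facilitator = AvatarFacilitator(["Sales Report", "New Product Development"])
print(facilitator.open_meeting())
print(facilitator.next_item())
print(facilitator.next_item())
```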

[0469] (Example 1)

[0470] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0471] In modern meeting management, setting agendas, managing the progress, recording conversations, and organizing information are cumbersome, hindering participants from focusing on important topics. In particular, prioritizing agenda items and quickly recording and analyzing discussions during meetings requires significant effort, highlighting the need for efficient meeting management.

[0472] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0473] In this invention, the server includes means for analyzing agenda items received from participants and automatically creating an order of agenda items based on their importance; means for conducting the meeting using a virtual character and engaging in natural dialogue with participants; and means for performing real-time speech recognition on statements made during the meeting, converting them into text information, and saving it. This streamlines the operation of the meeting and allows participants to focus more on core discussions.

[0474] "Participants" refer to individuals or groups who propose agenda items and participate in discussions at a meeting.

[0475] An "agenda" refers to the topics or items that should be discussed during a meeting, and prioritizing them is crucial.

[0476] "Importance" refers to a criterion that indicates the relative value or priority of an agenda item or piece of information.

[0477] A "virtual character" refers to a computer-generated visual or auditory entity used to conduct meetings or interact with participants.

[0478] "Natural dialogue" refers to natural conversations between humans that are mimicked using technology.

[0479] "Real-time" refers to processing or analyzing events as they occur.

[0480] "Speech recognition" refers to the technology or process of converting speech into text or data formats.

[0481] "Text information" refers to information expressed in character or data format.

[0482] A "report" refers to a document that summarizes and documents the discussions and results of a meeting.

[0483] "Communication equipment" refers to a machine or system used to send and receive information using digital or analog signals.

[0484] "Related information" refers to additional data or knowledge necessary to complement or support the discussions during the meeting.

[0485] To implement this invention, it is necessary to develop an information processing system to streamline meeting management. The system mainly includes a server, terminals, and a user interface. Users participating in the meeting submit agenda items via web forms or email. The server then automatically receives the agenda items and stores them in a database.

[0486] The server analyzes the agenda items using natural language processing techniques (for example, libraries such as NLTK and spaCy) and evaluates their importance. Once the evaluation is complete, the server lists the agenda items according to priority and sends this information to the terminal.

[0487] The terminal displays the agenda list received from the server through a virtual character, supporting the progress of the meeting. The virtual character uses technologies such as 3D animation and speech synthesis to welcome participants and enable natural dialogue.

[0488] When a user speaks during a meeting, the terminal uses speech recognition technology (such as Google Cloud Speech-to-Text) to convert the speech into text in real time and sends that data to the server. The server then analyzes the received text data and provides relevant information in real time.

[0489] After the meeting concludes, the server extracts key points from the saved text data and automatically generates a report. The report is formatted in a format such as PDF and distributed electronically to all participants via communication devices.

[0490] This system significantly reduces the time and effort required to run meetings, allowing participants to focus on more substantive discussions. For example, prompts for the generative AI model could include phrases like "Summarize the main points of the meeting" or "What should be the first topic addressed in the next meeting?"
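A prompt of this kind is typically combined with the saved statements before being passed to the generative AI model. The following sketch shows one possible way to assemble such an input; the function name `build_prompt`, the "Speaker: text" line format, and the layout of the prompt are assumptions, and the call to the model itself is deliberately left out.

```python
def build_prompt(instruction, transcript_lines):
    """Combine an instruction with transcript lines into a single prompt string.

    The layout (instruction, blank line, bulleted transcript) is an
    illustrative assumption, not a format required by any particular model.
    """
    body = "\n".join(f"- {line}" for line in transcript_lines)
    return f"{instruction}\n\nMeeting transcript:\n{body}"


prompt = build_prompt(
    "Summarize the main points of the meeting",
    ["Alice: The launch date is fixed.", "Bob: The budget still needs approval."],
)
print(prompt)
```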

[0491] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0492] Step 1:

[0493] Users submit meeting agendas via web form or email. The server receives these agendas and stores them in a database. Based on this input data, it formats them as text data as needed. For example, if a user enters "New Product Strategy Meeting" in the web form and clicks the submit button, that data is recorded on the server.

[0494] Step 2:

[0495] The server analyzes the stored agenda items using natural language processing technology. This process extracts keywords from the agenda text received as input data and scores their importance. The output is a set of pairs, each consisting of an agenda item and its importance score. For example, "New Product Strategy Meeting" would be classified as "important."

[0496] Step 3:

[0497] The server sorts the agenda items in order of importance based on the analysis results. The agenda items and their scores obtained in step 2 are used as input data. The output is the sorted list, that is, a list of agenda items in order of importance, which is sent to the terminal.

[0498] Step 4:

[0499] The terminal prepares to display the organized agenda list received from the server through a virtual character. It reads the agenda list received as input and outputs it to the virtual character's interface. Specifically, the virtual character appears on the screen and presents the agenda items for the "New Product Strategy Meeting" in order of importance.

[0500] Step 5:

[0501] When a user speaks during a meeting, the terminal uses speech recognition technology to convert the audio into text in real time. The input is the audio, and the output is text data. For example, if the user says "What is the next step?", the text "What is the next step?" will be displayed.

[0502] Step 6:

[0503] The server receives text data sent from the terminal and searches for relevant information as needed. It analyzes the text data received as input and provides the user with relevant documents and database information as output. Specifically, it searches for historical data and statistical information related to the "new product strategy" and presents it to the user.

[0504] Step 7:

[0505] After the meeting ends, the server automatically generates a report by extracting key points from the saved text data. It uses the saved conversation content as input data and provides summarized information as output. Specifically, the report is created in a format such as, "The main conclusions of the new product strategy meeting were..."

[0506] Step 8:

[0507] The server organizes the generated report in a format such as PDF and distributes it electronically to all participants via communication devices. The input is the generated report, and the output is its delivery via email or a similar channel. Specifically, a notification is sent stating, "The report from the new product strategy meeting has been sent via email."
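The distribution in Step 8 can be sketched with Python's standard `email` package. This only builds the message and does not send it; the addresses, subject, file name, and the placeholder bytes standing in for a real PDF are all assumptions for the example.

```python
from email.message import EmailMessage


def build_report_email(sender, recipients, subject, body, pdf_bytes, filename="report.pdf"):
    """Build an email message carrying the report as a PDF attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    # Attach the report; the message becomes multipart/mixed at this point.
    msg.add_attachment(pdf_bytes, maintype="application", subtype="pdf", filename=filename)
    return msg


# Placeholder bytes only (assumption); a real system would pass the generated PDF.
placeholder_pdf = b"%PDF-1.4 placeholder"
message = build_report_email(
    "server@example.com",
    ["a@example.com", "b@example.com"],
    "Report: New Product Strategy Meeting",
    "The report from the new product strategy meeting is attached.",
    placeholder_pdf,
)
print(message["Subject"])
```

The resulting message could be handed to `smtplib.SMTP.send_message` or to any other delivery channel the deployment uses.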

[0508] (Application Example 1)

[0509] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0510] In community meetings, it is difficult for participants to efficiently discuss issues and make decisions smoothly. In particular, it is difficult to organize diverse opinions from participants in real time and to appropriately share important information. This can lead to prolonged meeting times and the oversight of important discussion points.

[0511] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0512] In this invention, the server includes means for analyzing issues received from participants and automatically creating an order of issues based on their importance; means for facilitating discussions using a virtual character and engaging in natural conversations with participants; means for performing real-time speech recognition on utterances during discussions, converting them into text data, and storing them; means for allowing local residents to share issues using smartphones, prioritizing important decision-making matters, and supporting residents' meetings; and means for extracting key points from the text data after the discussion and automatically generating minutes to provide to participants. This enables efficient organization of important issues in community meetings and facilitates quick and accurate decision-making.

[0513] "Participants" refer to the individual members who take part in a meeting or discussion.

[0514] An "issue" is a topic or theme discussed in a meeting or debate, or a problem that needs to be solved.

[0515] "Means for automatically creating sequences" refers to functions or processes that rearrange tasks in an appropriate order based on their importance.

[0516] A "virtual character" is a character created on a computer to assist in the progress of meetings and discussions.

[0517] "Means of facilitating discussions and conducting natural conversations" refers to techniques and methods that use virtual characters to facilitate the smooth progress of meetings and to engage in dialogue with participants.

[0518] "Speech recognition" is a technology that analyzes speech in real time and converts the speech information into text data.

[0519] "Methods for converting and saving as text data" refers to the process of converting information acquired through speech recognition into text format and saving it.

[0520] A "smartphone" is a multi-functional mobile phone, a small device capable of running applications.

[0521] "Important decision-making matters" are the points or conclusions that should be prioritized for discussion in a meeting.

[0522] "Means of supporting residents' meetings" refer to mechanisms and technologies that facilitate meetings and support participants' decision-making.

[0523] "Methods for automatically generating and providing meeting minutes to participants" refers to an automated process for summarizing the key points discussed after a meeting and presenting them to participants in an easily understandable format.

[0524] The server functions as the core of the local residents' meeting system, handling the backend for receiving and analyzing issues submitted by participants. Specifically, the Node.js-based server collects issue information submitted via web forms and email and stores it in MongoDB. The stored data is then analyzed by Python scripts using natural language processing libraries (such as NLTK and spaCy) to evaluate the importance of the issues and automatically rank them.

[0525] The terminal, specifically a smartphone, is equipped with a virtual character application that assists in the discussion. This application, developed with React Native, enables natural interaction with participants. During the meeting, participants' speech is collected via the smartphone's microphone and converted into text data in real time using the Google Cloud Speech-to-Text API. This text data is stored and analyzed on the server side.

[0526] Users share issues via their smartphones, and their comments during meetings are recorded as data. After the meeting ends, the server uses natural language processing technology to extract key points of the discussion and generate meeting minutes. These minutes are formatted as PDFs and sent to participants via email, enabling efficient information sharing.

[0527] Specific example

[0528] For example, in a disaster response meeting, if attendees discuss "flood control measures" and "strengthening disaster prevention training," the system will prioritize these issues. Then, a virtual character will present each issue and support the discussion by analyzing participants' comments in real time.

[0529] A concrete example of a prompt for the generative AI model would be, "Please summarize the results of the discussion on which issues should be prioritized in the next regional disaster prevention plan." This can be used to summarize the key points of a residents' meeting.

[0530] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0531] Step 1:

[0532] The server collects issues received from participants via web forms or email. The input is the raw issue data submitted by users. The server stores this data in MongoDB. The output is the issue data stored in the database.

[0533] Step 2:

[0534] The server executes a Python script on the stored issue data and analyzes the issues using natural language processing libraries (such as NLTK and spaCy). The input is the text data in the database, which is analyzed to evaluate the importance of each issue. The output is a list of issues ranked by importance.

[0535] Step 3:

[0536] The terminal receives the ranked list of issues and uses the virtual character application developed with React Native to prepare for the discussion. Here, the input is the ranking information sent from the server, and the output is the order in which the issues are presented to the participants.

[0537] Step 4:

[0538] Users speak during discussions via their smartphones. The terminal uses the Google Cloud Speech-to-Text API to convert speech into text in real time. The input is the user's speech, and the output is text data. The converted text data is immediately sent to the server.

[0539] Step 5:

[0540] The server analyzes the received text data, collects relevant information in real time, and provides it to the terminal as needed. Real-time text data is used as input, and the output is fed back to the user as relevant materials and information.

[0541] Step 6:

[0542] After the discussion based on user comments has concluded, the server re-analyzes the entire meeting's text data and automatically extracts key discussion points. The input is the accumulated text data, and the output is the extracted points.

[0543] Step 7:

[0544] The server automatically generates meeting minutes based on the extracted key points and formats them as a PDF, which serves as the final meeting report. The output is the generated PDF file, which is distributed to participants via email.
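The lookup of relevant information in Step 5 above can be sketched as a keyword-overlap search over stored documents. The source does not say how relevance is computed, so the overlap measure, the stopword list, the sample documents, and the name `find_related` are assumptions for illustration.

```python
import re

# Small illustrative stopword list (assumption).
STOPWORDS = {"the", "a", "an", "is", "of", "and", "for", "to", "in", "on"}


def tokenize(text):
    """Return the set of lowercase alphabetic tokens, minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def find_related(utterance, documents, min_overlap=1):
    """Return document titles sharing at least min_overlap words with the utterance.

    Results are ordered by descending overlap, then alphabetically by title.
    """
    query = tokenize(utterance)
    hits = []
    for title, text in documents.items():
        overlap = len(query & tokenize(title + " " + text))
        if overlap >= min_overlap:
            hits.append((title, overlap))
    return [title for title, _ in sorted(hits, key=lambda h: (-h[1], h[0]))]


# Hypothetical stored materials for a disaster response meeting.
documents = {
    "Flood control plan 2023": "levee repairs and river flood control budget",
    "Disaster drill schedule": "evacuation training dates for residents",
}
print(find_related("How much is the flood control budget?", documents))
```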

[0545] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0546] To implement the present invention, an emotion engine is incorporated into the meeting management system. This system covers a series of processes: agenda analysis, meeting facilitation using virtual avatars, speech-recognition-based conversion of meeting statements into text, and post-meeting generation of minutes.

[0547] First, the server receives agenda items from participants and uses natural language processing to determine their importance. This process extracts keywords from the agenda items and generates a sorted list based on them. Next, virtual avatars connected to terminals prepare to conduct the meeting using this data.

[0548] During the meeting, the terminal sequentially presents agenda items to participants via virtual avatars. This is where the emotion engine comes into play, analyzing the user's statements and facial expressions in real time to recognize their emotions. For example, if a user appears dissatisfied, the terminal transmits this information to the avatar, immediately adjusting the meeting's progress and the information presented.

[0549] Furthermore, statements made during the meeting are converted by speech recognition and stored as text data on the server. After the meeting, the server analyzes the stored text data, extracts particularly important statements and decisions, and automatically creates meeting minutes. These minutes are then provided to participants via email or other communication methods.

[0550] By introducing an emotion engine, the system can respond flexibly to the meeting situation, stimulating discussion and facilitating smooth progress based on participants' emotions. For example, if some users are feeling confused, the avatar can provide additional explanations and follow up to help them understand. This is expected to significantly improve meeting efficiency and participant satisfaction.
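How an estimated emotion might steer the avatar can be sketched as a simple lookup with a confidence threshold. The emotion labels, action names, threshold value, and the `EmotionEstimate` type are hypothetical; the source only states that the avatar adjusts its behavior when users appear dissatisfied or confused.

```python
from dataclasses import dataclass


@dataclass
class EmotionEstimate:
    """Hypothetical output of the emotion engine for one participant."""
    label: str         # e.g. "confused" or "dissatisfied"
    confidence: float  # between 0.0 and 1.0


# Assumed mapping from emotion label to avatar action.
ACTIONS = {
    "confused": "explain_in_more_detail",
    "dissatisfied": "invite_feedback",
}


def choose_action(estimate, threshold=0.6):
    """Pick an avatar action, ignoring estimates below the confidence threshold."""
    if estimate.confidence < threshold:
        return "continue"
    return ACTIONS.get(estimate.label, "continue")


print(choose_action(EmotionEstimate("confused", 0.9)))
print(choose_action(EmotionEstimate("confused", 0.3)))
```

Ignoring low-confidence estimates is one way to keep the avatar from interrupting the meeting on a noisy reading.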

[0551] The following describes the processing flow.

[0552] Step 1:

[0553] Users enter the meeting agenda via a dedicated web form and submit it to the server.

[0554] Step 2:

[0555] The server receives the input agenda items and analyzes them using natural language processing technology. Based on the analysis, it evaluates their importance, creates an agenda list in the optimal order, and saves it.

[0556] Step 3:

[0557] The terminal receives a prioritized agenda list sent from the server and sets it up in the virtual avatar's interface. The emotion engine is also prepared at this point.

[0558] Step 4:

[0559] The virtual avatar on the terminal starts the meeting, greets the participants, and then presents the first agenda item. The avatar is responsible for facilitating the meeting and controlling the transition to the next agenda item.

[0560] Step 5:

[0561] When a user speaks, the terminal converts the speech into text data using speech recognition technology and sends it to the server. In addition, the emotion engine analyzes the user's facial expressions and tone of voice to evaluate the user's emotional state in real time.

[0562] Step 6:

[0563] The server analyzes the received text and sentiment data and provides relevant information to the terminal as needed. This information is presented to the user in real time through a virtual avatar during the meeting.

[0564] Step 7:

[0565] The emotion engine allows the avatar to adjust the meeting flow and information presentation based on the user's emotions. For example, if the user is having difficulty understanding something, it will add more detailed explanations.

[0566] Step 8:

[0567] After the meeting, the server automatically extracts key points from the discussion using all collected data and generates meeting minutes. These minutes include annotations based on the sentiment analysis, such as significant changes in sentiment.

[0568] Step 9:

[0569] The server supports effective follow-up by distributing the generated meeting minutes to participants via email or other communication methods. This allows each participant to easily understand the content of the meeting.
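The "significant changes in sentiment" noted in Step 8 could be detected by comparing consecutive sentiment samples, as sketched below. The score range of -1.0 to 1.0, the timestamps in seconds, the threshold of 0.5, and the name `sentiment_shifts` are assumptions for illustration.

```python
def sentiment_shifts(samples, threshold=0.5):
    """Return (timestamp, delta) for each jump between consecutive samples.

    samples is a list of (timestamp_seconds, score) pairs in time order, with
    score assumed to lie in [-1.0, 1.0]. A jump is reported when the absolute
    change from the previous sample is at least the threshold.
    """
    shifts = []
    for (_, prev_score), (cur_time, cur_score) in zip(samples, samples[1:]):
        delta = cur_score - prev_score
        if abs(delta) >= threshold:
            shifts.append((cur_time, round(delta, 2)))
    return shifts


samples = [(0, 0.2), (60, 0.3), (120, -0.4), (180, -0.3)]
for timestamp, delta in sentiment_shifts(samples):
    print(f"Sentiment changed by {delta:+.2f} at {timestamp} s")
```

Each reported shift could be attached as an annotation to the statement recorded nearest that timestamp in the minutes.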

[0570] (Example 2)

[0571] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0572] In today's world, meetings involving multiple participants, including those in remote locations, demand effective and smooth communication. However, it is difficult to immediately understand and respond to the emotions and nuances of participants' statements. Furthermore, quickly extracting key points from a vast amount of discussion and creating accurate meeting minutes is a time-consuming and laborious challenge.

[0573] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0574] In this invention, the server includes means for analyzing information received from participants and automatically creating an order of information based on its importance; means for conducting a meeting using a virtual dialogue entity and engaging in natural dialogue with users; and means for analyzing users' facial expressions to recognize their emotional state in real time and reflecting this in the progress of the meeting. This makes it possible to quickly reflect the diverse emotions and statements of participants and efficiently extract and record important matters.

[0575] "Information received from participants" refers to the content and data provided by those attending the meeting, which become the subject of the meeting's agenda and discussion.

[0576] "Means for automatically creating information order based on importance" refers to a method that analyzes received information, evaluates the priority of its content, and automatically determines the order in which it is processed or presented.

[0577] A "virtual dialogue entity" is a computer-generated avatar or agent that facilitates discussions in a meeting and interacts naturally with participants.

[0578] "Means of natural dialogue with users" refers to communication technologies that enable participants and the virtual dialogue entity to communicate as smoothly and appropriately as humans do with one another.

[0579] "Methods for analyzing a user's facial expressions and recognizing their emotional state in real time" refers to methods that use cameras and sensors to capture a user's face and movements, and then use that information to instantly determine their current emotions.

[0580] "Means of reflecting this in the progress of the meeting" refers to a mechanism that uses the results of real-time analysis of emotions and statements to appropriately adjust the flow and content of the meeting.

[0581] To implement this invention, an integrated system for meeting management is used. Its specific configuration and process are described below.

[0582] The server receives information sent from participants and analyzes it. For natural language processing, it uses libraries such as NLTK or spaCy. This allows it to extract keywords from the received information, evaluate their importance, and automatically create an order suitable for the meeting's progress.
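The keyword extraction described above can be approximated with tokenization and frequency analysis. The sketch below uses only the Python standard library as a stand-in for NLTK or spaCy, omits part-of-speech tagging, and relies on an assumed stopword list; the name `top_keywords` is hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list (assumption).
STOPWORDS = {"the", "a", "an", "and", "for", "of", "to", "is", "in", "on"}


def top_keywords(text, n=3):
    """Return the n most frequent non-stopword tokens as (word, count) pairs.

    Ties are broken alphabetically so the result is deterministic.
    """
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


submission = (
    "Budget review for the new product. "
    "The product launch needs budget approval and a product demo."
)
print(top_keywords(submission))
```

Keyword counts like these could feed the importance evaluation that determines the order of presentation.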

[0583] Next, the terminal facilitates the meeting through a virtual dialogue entity. This virtual dialogue entity is created using a real-time 3D platform such as Unity or Unreal Engine. The terminal controls the virtual dialogue entity based on data provided by the server, enabling natural conversation with the user. The user's emotional state can be analyzed in real time from the user's voice and camera input, using OpenCV and common facial recognition APIs.

[0584] During meetings, to allow users to easily participate, terminals utilize speech recognition services such as Google Cloud Speech-to-Text and Amazon Transcribe to convert spoken content into text in real time. The text data is immediately recorded on the server.

[0585] Furthermore, after the meeting concludes, the server analyzes the recorded text data and uses the pandas library for Python to extract important statements and decisions. The resulting report is then distributed to participants via email.

[0586] As a concrete example, a possible prompt for a generative AI model might be, "Please highlight and explain the key points of the next agenda item." This prompt allows the system to generate actions that effectively support the progress of the meeting.

[0587] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0588] Step 1:

[0589] The server receives information from participants. The input is information submitted via email or web forms. This information is processed using a natural language processing library to extract keywords. Specifically, the information is tokenized and part-of-speech tagged, and frequency analysis is used to list important words and phrases. The output of this analysis is a ranking of the information based on its importance.

[0590] Step 2:

[0591] The terminal activates a virtual interactive entity and prepares for the meeting based on the ordered information received from the server. The input is the order information provided by the server. The virtual entity sets up the visuals and actions using a real-time 3D platform such as Unity. This prepares it for presentation to the user. The output is the generated meeting scenario.

[0592] Step 3:

[0593] The terminal presents information to the user through a virtual dialogue entity during the meeting. During the meeting, the terminal receives and uses the user's voice and video input. Sentiment analysis is performed using OpenCV or general facial recognition APIs, recognizing the user's emotional state in real time from their facial expressions. The analysis results are reflected in the virtual entity's actions. Outputs include adjusting the ongoing agenda and providing additional explanations.
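How the recognized emotional state might be reflected in the virtual entity's actions can be sketched as a simple lookup. The emotion labels, the action names, and the confidence threshold below are hypothetical; the source does not specify them, and the emotion reading itself is assumed to come from an external recognizer.

```python
from dataclasses import dataclass


@dataclass
class EmotionReading:
    """Result assumed to be produced by an external emotion recognizer."""
    label: str         # hypothetical labels, e.g. "confused", "bored", "neutral"
    confidence: float  # 0.0 to 1.0


# Hypothetical mapping from emotion label to the avatar's next action.
ACTIONS = {
    "confused": "explain_again",
    "bored": "move_to_next_item",
    "frustrated": "pause_and_ask",
}


def choose_avatar_action(reading: EmotionReading, min_confidence: float = 0.6) -> str:
    """Return the avatar's action; keep the current flow when the reading is
    uncertain or the label has no dedicated action."""
    if reading.confidence < min_confidence:
        return "continue"
    return ACTIONS.get(reading.label, "continue")
```

Falling back to "continue" for low-confidence readings keeps the avatar from interrupting the agenda on noisy recognition results.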

[0594] Step 4:

[0595] The server converts the audio from the meeting into text using a speech recognition system such as Google Cloud Speech-to-Text. The input is a real-time audio signal recorded during the meeting. The speech recognition service outputs the audio as text data, which is recorded immediately. This ensures that the meeting content is saved in text format.

[0596] Step 5:

[0597] After the meeting ends, the server processes the accumulated text data and extracts important statements and decisions. The input is the text data saved in step 4. The data is structured and analyzed using the Python Pandas library to create a list of key items. The output is a structured report, which is distributed via email.
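A minimal sketch of the extraction in Step 5 is shown below. It uses plain Python rather than Pandas so that it runs without third-party packages; the cue phrases, the record layout (`speaker` and `text` keys), and the function names are assumptions for illustration only.

```python
# Hypothetical cue phrases that mark a statement as a decision or action item.
DECISION_CUES = ("we decided", "agreed", "action item", "will be responsible")


def extract_key_statements(transcript: list[dict]) -> list[dict]:
    """Keep only transcript entries whose text contains a decision cue."""
    return [
        entry
        for entry in transcript
        if any(cue in entry["text"].lower() for cue in DECISION_CUES)
    ]


def format_report(key_statements: list[dict]) -> str:
    """Render the extracted statements as a short plain-text report."""
    lines = ["Key statements and decisions:"]
    for entry in key_statements:
        lines.append(f"- {entry['speaker']}: {entry['text']}")
    return "\n".join(lines)


transcript = [
    {"speaker": "Sato", "text": "Good morning, everyone."},
    {"speaker": "Tanaka", "text": "We decided to move the release to May."},
    {"speaker": "Sato", "text": "Action item: Tanaka updates the schedule."},
]
report = format_report(extract_key_statements(transcript))
```

In a Pandas-based version, the same filter could be expressed as a boolean mask over a text column; the cue-phrase approach itself is only one possible heuristic.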

[0598] This series of processes enables users to conduct meetings efficiently and share information accurately.

[0599] (Application Example 2)

[0600] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0601] In recent years, smooth communication has become difficult in family conversations and family meetings due to differences in emotions and opinions among participants. In such situations, discussions often stall, particularly due to emotional misunderstandings, and the efficiency of the conversation decreases. To solve this problem, there is a need for a system that can grasp the emotions of participants in real time and respond appropriately.

[0602] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0603] In this invention, the server includes means for analyzing agenda items received from participants and automatically creating an order of agenda items based on their importance; means for conducting the meeting using virtual avatars and engaging in natural dialogue with participants; and means for analyzing participants' facial expressions and voice tone in real time and recognizing their emotions. This makes it possible to adjust the discussion according to the participants' emotions, resulting in smoother and more satisfying communication.

[0604] A "server" is a device that receives and processes data via a network and has the function of providing information to multiple users.

[0605] "Participants" refer to individual members who take part in a meeting or discussion and offer comments or opinions on the agenda.

[0606] An "agenda" refers to a specific topic or issue that should be addressed in a meeting or discussion, and it is prioritized for discussion based on its importance.

[0607] A "virtual avatar" is a human-like character created using digital technology, which interacts with the user and serves the role of presenting information.

[0608] "Speech recognition" refers to the technology that analyzes speech data to identify syllables and words and converts them into text data.

[0609] "Text data" refers to information in which speech or written text is encoded as characters, and is in a digital format that can be stored and searched.

[0610] "Emotion recognition" is a technology that identifies a person's emotional state from facial expressions, tone of voice, and other factors, and is used to adjust responses and feedback.

[0611] "Meeting minutes" are a record that includes summaries of statements and decisions made during a meeting, and are provided to participants in a format that can be referenced at a later date.

[0612] "Communication means" refers to technical devices and methods for sending and receiving data, and aims to improve the efficiency of information transmission.

[0613] This invention aims to build a meeting management system that incorporates emotion recognition to facilitate smooth communication within the home. Using a home robot as hardware, and leveraging emotion engines and voice recognition technology, it supports conversations and family meetings within the home.

[0614] The server analyzes the agenda items received from participants based on data transmitted from home robots and automatically sets their order based on importance. This process uses a natural language processing library to extract keywords from the agenda items and determine their priority.

[0615] The terminal, a home robot, is equipped with a virtual avatar that facilitates natural conversation. The robot recognizes participants' voices in real time and converts them into text data using the Google Cloud Speech-to-Text API. This text data is sent to a server where key points are further extracted.

[0616] Furthermore, the system uses an emotion recognition API like Affectiva to identify the emotional state that users exhibit during family meetings, and the home robot then provides feedback and additional information based on that data. For example, if a toddler shows signs of boredom while the family is discussing holiday plans, the robot can ask, "Shall I suggest a new activity?"

[0617] For example, when a family is discussing whether or not to welcome a new pet into their home, the robot could suggest, "Shall I look up some information about pet care?" An example of a prompt to the generative AI model in this case would be, "If the family is hesitant about getting a pet, how would you present information that could help resolve the issue?"

[0618] This system is expected to reduce misunderstandings in conversations and enable communication that satisfies all participants.

[0619] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0620] Step 1:

[0621] The server receives agenda data from a home robot. The received data is analyzed using a natural language processing library to extract keywords. This determines the importance of the agenda items and generates an ordered list. This list is then sent to the next step.

[0622] Step 2:

[0623] A home robot, acting as a terminal, prepares to conduct a meeting using a virtual avatar based on an agenda list received from a server. The avatar presents the agenda to participants and initiates a natural conversation. It receives voice input from the user, and that voice data is sent to the next step.

[0624] Step 3:

[0625] When a user speaks, the device sends the audio data in real time to the Google Cloud Speech-to-Text API, where it is converted into text data. This text data is stored on the server. The converted text is then used in the next feedback step.
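Storing the converted text on the server, as described in Step 3, can be sketched with a small in-memory structure. The class and field names are hypothetical, and the call to the speech recognition service is out of scope here: the sketch assumes the recognized text has already been returned.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Utterance:
    speaker: str
    text: str
    recorded_at: datetime


@dataclass
class TranscriptStore:
    """In-memory stand-in for the server-side transcript storage."""
    utterances: list[Utterance] = field(default_factory=list)

    def append(self, speaker: str, text: str) -> Utterance:
        """Record one recognized utterance with a UTC timestamp."""
        utterance = Utterance(speaker, text.strip(), datetime.now(timezone.utc))
        self.utterances.append(utterance)
        return utterance

    def full_text(self) -> str:
        """Return the whole transcript, one 'speaker: text' line per utterance."""
        return "\n".join(f"{u.speaker}: {u.text}" for u in self.utterances)


store = TranscriptStore()
store.append("Parent", "  Where should we go for the holidays? ")
store.append("Child", "The beach!")
```

Keeping a timestamp per utterance makes it possible to align the text later with emotion readings taken at the same moment.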

[0626] Step 4:

[0627] The device analyzes the user's facial expressions and voice tone through emotion recognition APIs such as Affectiva. This analysis determines the user's emotional state. The results are immediately transmitted to the virtual avatar, which then provides conversational feedback and information tailored to the user.

[0628] Step 5:

[0629] Once the meeting concludes, the server analyzes the saved text data and extracts important statements and decisions. The extracted content is automatically generated as meeting minutes. These minutes are then distributed in the next step.

[0630] Step 6:

[0631] The terminal distributes the generated meeting minutes to participants via email or other communication methods. This allows users to review the meeting content and prepare for future meetings.

[0632] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0633] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
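The pairing of a prompt with inference data can be illustrated with a small data structure. This is only a sketch for the text-data case; the class name, the field names, and the way the parts are concatenated are assumptions, and no particular generative AI service or API is implied.

```python
from dataclasses import dataclass, field


@dataclass
class InferenceRequest:
    """A prompt (the instruction) plus the text data to run inference on."""
    prompt: str
    text_data: list[str] = field(default_factory=list)


def build_model_input(request: InferenceRequest) -> str:
    """Combine the instruction and the inference data into one input string."""
    parts = [f"Instruction: {request.prompt}"]
    for index, text in enumerate(request.text_data, start=1):
        parts.append(f"Text {index}: {text}")
    return "\n".join(parts)


request = InferenceRequest(
    prompt="Summarize the main points of the meeting",
    text_data=["We agreed to ship in May."],
)
model_input = build_model_input(request)
```

Audio and image data would need their own fields and encodings; they are left out to keep the example short.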

[0634] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0635] [Fourth Embodiment]

[0636] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0637] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0638] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0639] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0640] The microphone 238 receives instructions from the user 20 by receiving the user's voice. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0641] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0642] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0643] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0644] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0645] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0646] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0647] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0648] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0649] To implement the present invention, a meeting management system is developed, and its main functions are implemented according to the following procedure.

[0650] First, the server collects agenda items from each user participating in the meeting. This process can be done via web forms or email. The server stores the received agenda items in a database and analyzes the content of each agenda item using natural language processing technology. Based on the analyzed data, it evaluates importance and lists the agenda items according to priority. For example, if the agenda items are "Sales Report" and "New Product Development," they will be evaluated for business priority and sorted accordingly.

[0651] Next, the terminal receives the organized agenda list sent from the server and prepares to conduct the meeting through a virtual avatar. In the meeting, the virtual avatar greets the participants in natural language and presents the first agenda item.

[0652] During a meeting, when a user speaks, the terminal converts their speech into text data in real time using speech recognition technology and sends it to the server. The server analyzes this text data, searches for relevant information as needed, and provides materials to aid the user's understanding.

[0653] Once the meeting concludes, the server automatically generates meeting minutes by extracting key discussion points from the saved text data. These minutes are converted to a format such as PDF and distributed via email to all meeting participants. This facilitates smooth follow-up after the meeting.

[0654] The introduction of this system significantly reduces the time and effort required to manage meetings, allowing participants to focus on more core discussions. For example, using this system in weekly project meetings allows for discussions to proceed in the optimal order based on the importance of the agenda, enabling quick identification of key decisions.

[0655] The following describes the processing flow.

[0656] Step 1:

[0657] The user accesses a web form and enters the meeting agenda. Once the agenda is submitted, the data is sent to the server.

[0658] Step 2:

[0659] The server receives agenda items submitted by users and stores them in a database. Then, it analyzes the content of the agenda items using natural language processing techniques and extracts keywords.

[0660] Step 3:

[0661] The server evaluates the importance of the extracted keywords and creates a list of multiple agenda items sorted according to priority. This list will be used to guide the next meeting.

[0662] Step 4:

[0663] The terminal receives the prioritized agenda list sent from the server and prepares to present it through the virtual avatar's interface.

[0664] Step 5:

[0665] The terminal's avatar greets participants at the start of the meeting and presents the first agenda item from the prioritized list. Participants speak in turn, and the avatar then takes up the next agenda item accordingly.

[0666] Step 6:

[0667] Each time the user speaks, the device uses speech recognition technology to convert the speech into text data in real time and sends it to the server.

[0668] Step 7:

[0669] The server analyzes the text data and searches for information related to the content of the statements. This relevant information is returned to the terminal in real time and presented to participants through the avatar.

[0670] Step 8:

[0671] Once the meeting ends, the server uses all the text data saved during the meeting to extract the key points of the discussion and automatically creates meeting minutes.

[0672] Step 9:

[0673] The server formats the generated meeting minutes as an electronic document and sends it via email to all users who participated in the meeting. This allows participants to review the content discussed at the meeting later.
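Packaging the minutes for email distribution in Step 9 can be sketched with Python's standard `email` package. The addresses, the subject line, the file name, and the placeholder PDF bytes are assumptions for illustration; generating the actual PDF and sending the message (for example with `smtplib`) are not shown.

```python
from email.message import EmailMessage


def build_minutes_email(
    sender: str,
    recipients: list[str],
    minutes_pdf: bytes,
    meeting_title: str,
) -> EmailMessage:
    """Build an email that carries the meeting minutes as a PDF attachment."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    message["Subject"] = f"Minutes: {meeting_title}"
    message.set_content("The minutes of the meeting are attached.")
    message.add_attachment(
        minutes_pdf,
        maintype="application",
        subtype="pdf",
        filename="minutes.pdf",
    )
    return message


# Placeholder bytes stand in for a real PDF produced elsewhere.
message = build_minutes_email(
    "server@example.com",
    ["a@example.com", "b@example.com"],
    b"%PDF-1.4 placeholder",
    "Weekly project meeting",
)
```

Building the message separately from sending it makes the distribution step easy to test without a mail server.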

[0674] (Example 1)

[0675] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0676] In modern meeting management, setting agendas, managing the progress, recording conversations, and organizing information are cumbersome, hindering participants from focusing on important topics. In particular, prioritizing agenda items and quickly recording and analyzing discussions during meetings requires significant effort, highlighting the need for efficient meeting management.

[0677] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0678] In this invention, the server includes means for analyzing agenda items received from participants and automatically creating an order of agenda items based on their importance; means for conducting the meeting using virtual characters and engaging in natural dialogue with participants; and means for real-time speech recognition of speeches made during the meeting, converting them into text information, and saving it. This streamlines the operation of the meeting and allows participants to focus more on core discussions.

[0679] "Participants" refer to individuals or groups who propose agenda items and participate in discussions at a meeting.

[0680] An "agenda" refers to the topics or items that should be discussed during a meeting, and prioritizing them is crucial.

[0681] "Importance" refers to a criterion that indicates the relative value or priority of an agenda item or piece of information.

[0682] A "virtual character" refers to a computer-generated visual or auditory entity used to conduct meetings or interact with participants.

[0683] "Natural dialogue" refers to natural conversations between humans that are mimicked using technology.

[0684] "Real-time" refers to events or processes being processed or analyzed simultaneously with their execution.

[0685] "Speech recognition" refers to the technology or process of converting speech into text or data formats.

[0686] "Text information" refers to information expressed in character or data format.

[0687] A "report" refers to a document that summarizes and documents the discussions and results of a meeting.

[0688] "Communication equipment" refers to a machine or system used to send and receive information using digital or analog signals.

[0689] "Related information" refers to additional data or knowledge necessary to complement or support the discussions during the meeting.

[0690] To implement this invention, it is necessary to develop an information processing system to streamline meeting management. The system mainly includes a server, terminals, and a user interface. Users participating in the meeting submit agenda items via web forms or email. The server then automatically receives the agenda items and stores them in a database.

[0691] The server analyzes the agenda items using natural language processing techniques (such as libraries like NLTK and spaCy) and evaluates their importance. Once the evaluation is complete, the server lists the agenda items according to priority and sends this information to the terminal.

[0692] The terminal displays the agenda list received from the server through a virtual character, supporting the progress of the meeting. The virtual character uses technologies such as 3D animation and speech synthesis to welcome participants and enable natural dialogue.

[0693] When a user speaks during a meeting, the device uses speech recognition technology (such as Google Cloud Speech-to-Text) to convert the speech into text in real time and sends that data to the server. The server then analyzes the received text data and provides relevant information in real time.

[0694] After the meeting concludes, the server extracts key points from the saved text data and automatically generates a report. The report is converted to a format such as PDF and distributed electronically to all participants via communication devices.

[0695] This system significantly reduces the time and effort required to run meetings, allowing participants to focus on more substantive discussions. For example, prompts for the generative AI model could include phrases like "Summarize the main points of the meeting" or "What should be the first topic addressed in the next meeting?"

[0696] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0697] Step 1:

[0698] Users submit meeting agendas via web form or email. The server receives these agendas and stores them in a database. Based on this input data, it formats them as text data as needed. For example, if a user enters "New Product Strategy Meeting" in the web form and clicks the submit button, that data is recorded on the server.

[0699] Step 2:

[0700] The server analyzes the stored agenda items using natural language processing technology. This process extracts keywords from the agenda text received as input data and scores their importance. The output is a set of pairs, each consisting of an agenda item and its importance score. For example, "New Product Strategy Meeting" would be classified as "important."

[0701] Step 3:

[0702] The server sorts the agenda items in order of importance based on the analysis results. The input is the set of agenda items and scores obtained in step 2. The output is the sorted list, which is sent to the terminal. Specifically, a list of agenda items in order of importance is generated.
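Steps 2 and 3 (scoring agenda items, labeling them, and sorting by score) can be sketched as follows. The keyword weights, the threshold for the "important" label, and the function names are hypothetical values chosen for the example; the source does not define how scores are computed.

```python
# Hypothetical keyword weights; the source does not specify a scoring scheme.
KEYWORD_WEIGHTS = {"strategy": 3, "product": 2, "sales": 2, "report": 1}
IMPORTANT_THRESHOLD = 4  # assumed cut-off for the "important" label


def score_agenda_item(title: str) -> int:
    """Sum the weights of the known keywords that appear in the title."""
    return sum(KEYWORD_WEIGHTS.get(word, 0) for word in title.lower().split())


def label(score: int) -> str:
    """Classify an agenda item from its importance score."""
    return "important" if score >= IMPORTANT_THRESHOLD else "normal"


def sort_agenda(titles: list[str]) -> list[tuple[str, int]]:
    """Return (agenda item, score) pairs sorted from most to least important."""
    scored = [(title, score_agenda_item(title)) for title in titles]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


agenda = sort_agenda(["Sales Report", "New Product Strategy Meeting", "Office Seating"])
```

Because Python's sort is stable, items with equal scores keep their submission order, which gives a predictable tie-break.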

[0703] Step 4:

[0704] The terminal prepares to display the organized agenda list received from the server through a virtual character. It reads the agenda list received as input and outputs it to the virtual character's interface. Specifically, the virtual character appears on the screen and presents the agenda items for the "New Product Strategy Meeting" in order of importance.

[0705] Step 5:

[0706] When a user speaks during a meeting, the device uses speech recognition technology to convert the audio into text in real time. It receives audio as input and generates output as text data. For example, if the user says "What is the next step?", the text "What is the next step?" will be displayed.

[0707] Step 6:

[0708] The server receives text data sent from the terminal and searches for relevant information as needed. It analyzes the text data received as input and provides the user with relevant documents and database information as output. Specifically, it searches for historical data and statistical information related to the "new product strategy" and presents it to the user.
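The search for relevant information in Step 6 can be sketched as a keyword-overlap lookup. This is a deliberately simple stand-in for whatever retrieval method the server actually uses; the document titles and contents, the `top_k` parameter, and the function names are assumptions.

```python
import re


def tokens(text: str) -> set[str]:
    """Return the set of lowercase alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def find_related(utterance: str, documents: dict[str, str], top_k: int = 2) -> list[str]:
    """Return up to top_k document titles that share the most words with the utterance."""
    query = tokens(utterance)
    scored = []
    for title, body in documents.items():
        overlap = len(query & tokens(title + " " + body))
        if overlap:
            scored.append((title, overlap))
    # Highest overlap first; ties are broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [title for title, _ in scored[:top_k]]


# Hypothetical document store on the server.
documents = {
    "2023 product strategy review": "sales figures and strategy outcomes for past product launches",
    "Cafeteria menu": "lunch options this week",
}
related = find_related("What did the last product strategy achieve?", documents)
```

Documents with no words in common with the utterance are dropped entirely, so unrelated material is never presented to the user.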

[0709] Step 7:

[0710] After the meeting ends, the server automatically generates a report by extracting key points from the saved text data. It uses the saved conversation content as input data and provides summarized information as output. Specifically, the report is created in a format such as, "The main conclusions of the new product strategy meeting were..."

[0711] Step 8:

[0712] The server organizes the generated report in a format such as PDF and distributes it electronically to all participants via communication devices. The input is the generated report, and the output is its delivery via email or a similar channel. Specifically, a notification is sent stating, "The report from the new product strategy meeting has been sent via email."

[0713] (Application Example 1)

[0714] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0715] In community meetings, it is difficult for participants to efficiently discuss issues and make decisions smoothly. In particular, it is difficult to organize diverse opinions from participants in real time and to appropriately share important information. This can lead to prolonged meeting times and the oversight of important discussion points.

[0716] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0717] This invention includes a server that analyzes issues received from participants and automatically creates an order of issues based on their importance; a server that facilitates discussions using virtual characters and engages in natural conversations with participants; a server that performs real-time speech recognition of utterances during discussions, converts them into text data, and stores them; a server that allows local residents to share issues using smartphones, prioritize important decision-making matters, and support community meetings; and a server that extracts key points from the text data after the discussion and automatically generates minutes to provide to participants. This enables efficient organization of important issues in community meetings and facilitates quick and accurate decision-making.

[0718] "Participants" refer to the individual members who take part in a meeting or discussion.

[0719] An "issue" is a topic or theme discussed in a meeting or debate, or a problem that needs to be solved.

[0720] "Means for automatically creating sequences" refers to functions or processes that rearrange tasks in an appropriate order based on their importance.

[0721] A "virtual character" is a character created on a computer to assist in the progress of meetings and discussions.

[0722] "Means of facilitating discussions and conducting natural conversations" refers to techniques and methods that use virtual characters to facilitate the smooth progress of meetings and to engage in dialogue with participants.

[0723] "Speech recognition" is a technology that analyzes speech in real time and converts the speech information into text data.

[0724] "Methods for converting and saving as text data" refers to the process of converting information acquired through speech recognition into text format and saving it.

[0725] A "smartphone" is a multi-functional mobile phone, a small device capable of running applications.

[0726] "Important decision-making matters" are the points or conclusions that should be prioritized for discussion in a meeting.

[0727] "Means of supporting residents' meetings" refer to mechanisms and technologies that facilitate meetings and support participants' decision-making.

[0728] "Methods for automatically generating and providing meeting minutes to participants" refers to an automated process for summarizing the key points discussed after a meeting and presenting them to participants in an easily understandable format.

[0729] The server functions as the core of the local residents' meeting system, handling the backend for receiving and analyzing issues submitted by participants. Specifically, the Node.js-based server collects issue information submitted via web forms and email and stores it in MongoDB. The stored data is then analyzed using natural language processing libraries (such as NLTK and spaCy) via Python scripts to evaluate the importance of the issues and automatically rank them.

[0730] The device, specifically a smartphone, will be equipped with a virtual character application to assist in the discussion. This application, developed with React Native, enables natural interaction with participants. During the meeting, participants' speech is collected via the smartphone's microphone and converted into text data in real time using the Google Cloud Speech-to-Text API. This text data is stored and analyzed on the server side.

[0731] Users share issues via their smartphones, and their comments during meetings are recorded as data. After the meeting ends, the server uses natural language processing technology to extract key points of the discussion and generate meeting minutes. These minutes are formatted as PDFs and sent to participants via email, enabling efficient information sharing.

[0732] Specific example

[0733] For example, in a disaster response meeting, if attendees discuss "flood control measures" and "strengthening disaster prevention training," the system will prioritize these issues. Then, a virtual character will present each issue and support the discussion by analyzing participants' comments in real time.

[0734] A concrete example of a prompt for the generative AI model would be, "Please summarize the results of the discussion on which issues should be prioritized in the next regional disaster prevention plan." This can be used to summarize the key points of a residents' meeting.

[0735] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0736] Step 1:

[0737] The server collects issues received from participants via web forms or email. The input is the raw issue data submitted by users. The server stores this data in MongoDB. The output is the issue data stored in the database.

[0738] Step 2:

[0739] The server executes a Python script on the stored issue data and analyzes the issues using natural language processing libraries (such as NLTK and spaCy). The input is the text data in the database, which is analyzed to evaluate the importance of each issue. The output is a list of issues ranked by importance.

[0740] Step 3:

[0741] The terminal receives the ordered list of issues and uses a virtual character application developed with React Native to prepare for the discussion. Here, the input is the ordered list of issues sent from the server, and the output is the sequence of issues to be presented to the participants.

[0742] Step 4:

[0743] Users speak during discussions via their smartphones. The device uses the Google Cloud Speech-to-Text API to convert speech into text in real time. Input is the user's speech, and output is text data. The converted text data is immediately sent to the server.

[0744] Step 5:

[0745] The server analyzes the received text data, collects relevant information in real time, and provides it to the terminal as needed. Real-time text data is used as input, and the output is fed back to the user as relevant materials and information.

[0746] Step 6:

[0747] After the discussion based on user comments has concluded, the server re-analyzes the entire meeting's text data and automatically extracts key discussion points. The input is the accumulated text data, and the output is the extracted points.

[0748] Step 7:

[0749] The server automatically generates meeting minutes based on the extracted key points and formats them as a PDF, which serves as the final meeting report. The output is the generated PDF file, which is distributed to participants via email.
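As a non-limiting illustration, Steps 6 and 7 above could be sketched as follows. The sketch scores each statement by the meeting-wide frequency of its words and keeps the highest-scoring statements as key points. The function names, the stop-word list, and the plain-text layout are assumptions made for this illustration; the standard library is used here as a stand-in for the natural language processing and PDF formatting described above.

```python
import re
from collections import Counter

# Hypothetical stop-word list; an NLP library would normally supply one.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "on", "for",
              "is", "are", "we", "it", "that", "this", "should", "be", "with"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def extract_key_points(utterances: list[str], top_n: int = 2) -> list[str]:
    """Return the top_n utterances ranked by summed word frequency."""
    freq = Counter(w for u in utterances for w in tokenize(u))
    # Score each utterance by the meeting-wide frequency of its distinct words.
    scored = [(sum(freq[w] for w in set(tokenize(u))), i, u)
              for i, u in enumerate(utterances)]
    best = sorted(scored, key=lambda t: (-t[0], t[1]))[:top_n]
    # Restore the original speaking order so the minutes read naturally.
    return [u for _, _, u in sorted(best, key=lambda t: t[1])]


def format_minutes(title: str, key_points: list[str]) -> str:
    """Build a plain-text minutes body (PDF rendering is out of scope here)."""
    lines = [title, "=" * len(title)]
    lines += [f"{n}. {p}" for n, p in enumerate(key_points, start=1)]
    return "\n".join(lines)
```

Frequency scoring is only one possible way to evaluate importance; a generative AI model prompted as in the example above could be used in its place.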

[0750] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0751] To implement the present invention, it is necessary to incorporate an emotion engine into the meeting management system. This system includes a series of processes from agenda analysis to meeting progress using virtual avatars, speech recognition-based text conversion of meeting statements, and post-meeting minute generation.

[0752] First, the server receives agenda items from participants and uses natural language processing to determine their importance. This process extracts keywords from the agenda items and generates a sorted list based on them. Next, virtual avatars connected to terminals prepare to conduct the meeting using this data.

[0753] During the meeting, the terminal sequentially presents agenda items to participants via virtual avatars. This is where the emotion engine comes into play, analyzing the user's statements and facial expressions in real time to recognize their emotions. For example, if a user appears dissatisfied, the terminal transmits this information to the avatar, immediately adjusting the meeting's progress and the information presented.

[0754] Furthermore, speeches during the meeting are recognized by speech recognition and stored as text data on the server. After the meeting, the server analyzes the stored text data, extracting particularly important statements and decisions, and automatically creating meeting minutes. These minutes are then provided to participants via email or other communication methods.

[0755] By introducing an emotion engine, the system can respond flexibly to the meeting situation, stimulating discussions and facilitating smooth progress based on participants' emotions. For example, if some users are feeling confused, their avatars can provide additional explanations and follow up to help them understand. This is expected to significantly improve meeting efficiency and participant satisfaction.

[0756] The following describes the processing flow.

[0757] Step 1:

[0758] Users enter the meeting agenda via a dedicated web form and submit it to the server.

[0759] Step 2:

[0760] The server receives the input agenda items and analyzes them using natural language processing technology. Based on the analysis, it evaluates their importance, creates an agenda list in the optimal order, and saves it.

[0761] Step 3:

[0762] The terminal receives a prioritized agenda list sent from the server and sets it up in the virtual avatar's interface. The emotion engine is also prepared at this point.

[0763] Step 4:

[0764] The virtual avatar on the terminal starts the meeting, greets the participants, and then presents the first agenda item. The avatar is responsible for facilitating the meeting and controlling the transition to the next agenda item.

[0765] Step 5:

[0766] When a user speaks, the device converts the speech into text data using speech recognition technology and sends it to the server. Additionally, an emotion engine analyzes the user's facial expressions and tone of voice to evaluate the user's emotional state in real time.

[0767] Step 6:

[0768] The server analyzes the received text and sentiment data and provides relevant information to the terminal as needed. This information is presented to the user in real time through a virtual avatar during the meeting.

[0769] Step 7:

[0770] The emotion engine allows the avatar to adjust the meeting flow and information presentation based on the user's emotions. For example, if the user is having difficulty understanding something, it will add more detailed explanations.

[0771] Step 8:

[0772] After the meeting, the server automatically extracts key points from the discussion using all collected data and generates meeting minutes. These minutes include annotations based on sentiment analysis and significant changes in sentiment.

[0773] Step 9:

[0774] The server supports effective follow-up by distributing the generated meeting minutes to participants via email or other communication methods. This allows each participant to easily understand the content of the meeting.
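As a non-limiting illustration, the adjustment in Step 7 and the sentiment annotations in Step 8 could be sketched as follows. The emotion labels, action names, and thresholds are hypothetical values chosen for this illustration and do not correspond to the output of any particular emotion engine.

```python
from dataclasses import dataclass


@dataclass
class EmotionReading:
    participant: str
    label: str    # hypothetical label such as "confused" or "dissatisfied"
    score: float  # hypothetical confidence between 0.0 and 1.0


# Hypothetical mapping from an emotion label to an avatar action.
ACTIONS = {
    "confused": "add_detailed_explanation",
    "dissatisfied": "invite_comment",
    "bored": "move_to_next_item",
}


def choose_avatar_action(readings: list[EmotionReading],
                         threshold: float = 0.6) -> str:
    """Return the action for the strongest actionable reading, else 'continue'."""
    strong = [r for r in readings
              if r.score >= threshold and r.label in ACTIONS]
    if not strong:
        return "continue"
    top = max(strong, key=lambda r: r.score)
    return ACTIONS[top.label]


def significant_changes(scores: list[float], delta: float = 0.4) -> list[int]:
    """Return indices where the score moved by at least delta from the previous one."""
    return [i for i in range(1, len(scores))
            if abs(scores[i] - scores[i - 1]) >= delta]
```

The indices returned by `significant_changes` could be used to mark the corresponding statements in the minutes as points where the participants' sentiment shifted.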

[0775] (Example 2)

[0776] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0777] In today's world, meetings involving multiple participants, including those in remote locations, demand effective and smooth communication. However, it is difficult to immediately understand and respond to the emotions and nuances of participants' statements. Furthermore, quickly extracting key points from a vast amount of discussion and creating accurate meeting minutes is a time-consuming and laborious challenge.

[0778] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0779] In this invention, the server includes means for analyzing information received from participants and automatically creating an order of information based on its importance; means for conducting a meeting using a virtual dialogue entity and engaging in natural dialogue with users; and means for analyzing users' facial expressions to recognize their emotional state in real time and reflect this in the progress of the meeting. This makes it possible to quickly reflect the diverse emotions and statements of participants and efficiently extract and record important matters.

[0780] "Information received from participants" refers to the content and data provided by those attending the meeting, which become the subject of the meeting's agenda and discussion.

[0781] "Means for automatically creating information order based on importance" refers to a method that analyzes received information, evaluates the priority of its content, and automatically determines the order in which it is processed or presented.

[0782] A "virtual dialogue entity" is a computer-generated avatar or agent that facilitates discussions in a meeting and interacts naturally with participants.

[0783] "Means for engaging in natural dialogue with users" refers to communication technologies that enable participants and the virtual dialogue entity to communicate smoothly and appropriately, as humans would with each other.

[0784] "Means for analyzing users' facial expressions to recognize their emotional state in real time" refers to means that use cameras and sensors to capture a user's face and movements, and then use that information to instantly determine the user's current emotions.

[0785] "Means for reflecting this in the progress of the meeting" refers to a mechanism that uses the results of real-time analysis of emotions and statements to appropriately adjust the flow and content of the meeting.

[0786] To implement this invention, an integrated system for meeting management is used. Its specific configuration and process are described below.

[0787] The server receives information sent from participants and analyzes it. For natural language processing, it uses libraries such as NLTK or spaCy. This allows it to extract keywords from the received information, evaluate their importance, and automatically create an order suitable for the meeting's progress.

[0788] Next, the terminal facilitates the meeting through a virtual dialogue entity. This virtual dialogue entity is created using a real-time 3D platform such as Unity or Unreal Engine. The terminal controls the virtual dialogue entity based on data provided by the server, enabling natural conversation with the user. It is possible to analyze the user's emotional state in real time using OpenCV and common facial recognition APIs through the user's voice and camera input.

[0789] During meetings, to allow users to easily participate, terminals utilize speech recognition services such as Google Cloud Speech-to-Text and Amazon Transcribe to convert spoken content into text in real time. The text data is immediately recorded on the server.

[0790] Furthermore, after the meeting concludes, the server analyzes the recorded text data and uses the Python Pandas library to extract important statements and decisions. The resulting report is then distributed to participants via email.

[0791] As a concrete example, a possible prompt for a generative AI model might be, "Please highlight and explain the key points of the next agenda item." This prompt allows the system to generate actions that effectively support the progress of the meeting.

[0792] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0793] Step 1:

[0794] The server receives information from participants. It accepts information submitted via email or web forms as input. This information is processed using a natural language processing library to extract keywords. Specifically, the information is tokenized, part-of-speech tagged, and frequency analysis is used to list important words and phrases. This analysis results in an output that ranks the information based on its importance.

[0795] Step 2:

[0796] The terminal activates the virtual dialogue entity and prepares for the meeting based on the ordered information received from the server. The input is the order information provided by the server. The visuals and actions of the virtual dialogue entity are set up using a real-time 3D platform such as Unity, so that it is ready to be presented to the user. The output is the generated meeting scenario.

[0797] Step 3:

[0798] The terminal presents information to the user through a virtual dialogue entity during the meeting. During the meeting, the terminal receives and uses the user's voice and video input. Sentiment analysis is performed using OpenCV or general facial recognition APIs, recognizing the user's emotional state in real time from their facial expressions. The analysis results are reflected in the virtual entity's actions. Outputs include adjusting the ongoing agenda and providing additional explanations.

[0799] Step 4:

[0800] The server converts the audio from the meeting into text using a speech recognition system such as Google Cloud Speech-to-Text. The input is a real-time audio signal recorded during the meeting. The speech recognition service outputs the audio as text data, which is recorded immediately. This ensures that the meeting content is saved in text format.

[0801] Step 5:

[0802] After the meeting ends, the server processes the accumulated text data and extracts important statements and decisions. The input is the text data saved in step 4. The data is structured and analyzed using the Python Pandas library to create a list of key items. The output is a structured report, which is distributed via email.

[0803] This series of processes enables users to conduct meetings efficiently and share information accurately.
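As a non-limiting illustration, the extraction of statements and decisions in Step 5 could be sketched as follows. The record type, the cue phrases, and the report layout are assumptions made for this illustration, and plain Python data structures are used here in place of the pandas library named above.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str
    text: str


# Hypothetical cue phrases that suggest a decision was made.
DECISION_CUES = ("we will", "decided", "agreed", "action item")


def extract_decisions(transcript: list[Utterance]) -> list[Utterance]:
    """Keep utterances whose text contains at least one decision cue phrase."""
    return [u for u in transcript
            if any(cue in u.text.lower() for cue in DECISION_CUES)]


def build_report(transcript: list[Utterance]) -> dict:
    """Return a structured report: decisions plus per-speaker utterance counts."""
    counts: dict[str, int] = {}
    for u in transcript:
        counts[u.speaker] = counts.get(u.speaker, 0) + 1
    return {
        "decisions": [f"{u.speaker}: {u.text}"
                      for u in extract_decisions(transcript)],
        "utterances_per_speaker": counts,
    }
```

Cue-phrase matching is a deliberately simple heuristic; the text data saved in Step 4 could equally be passed to a generative AI model for this extraction.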

[0804] (Application Example 2)

[0805] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0806] In recent years, smooth communication has become difficult in family conversations and family meetings due to differences in emotions and opinions among participants. In such situations, discussions often stall, particularly due to emotional misunderstandings, and the efficiency of the conversation decreases. To solve this problem, there is a need for a system that can grasp the emotions of participants in real time and respond appropriately.

[0807] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0808] In this invention, the server includes means for analyzing agenda items received from participants and automatically creating an order of agenda items based on their importance; means for conducting the meeting using virtual avatars and engaging in natural dialogue with participants; and means for analyzing participants' facial expressions and voice tone in real time and recognizing their emotions. This makes it possible to adjust the discussion according to the participants' emotions, resulting in smoother and more satisfying communication.

[0809] A "server" is a device that receives and processes data via a network and has the function of providing information to multiple users.

[0810] "Participants" refer to individual members who take part in a meeting or discussion and offer comments or opinions on the agenda.

[0811] An "agenda" refers to a specific topic or issue that should be addressed in a meeting or discussion, and it is prioritized for discussion based on its importance.

[0812] A "virtual avatar" is a human-like character created using digital technology, which interacts with the user and serves the role of presenting information.

[0813] "Speech recognition" refers to the technology that analyzes speech data to identify syllables and words and converts them into text data.

[0814] "Text data" refers to information in which speech or written text is encoded as characters, and is in a digital format that can be stored and searched.

[0815] "Emotion recognition" is a technology that identifies a person's emotional state from facial expressions, tone of voice, and other factors, and is used to adjust responses and feedback.

[0816] "Meeting minutes" are a record that includes summaries of statements and decisions made during a meeting, and are provided to participants in a format that can be referenced at a later date.

[0817] "Communication means" refers to technical devices and methods for sending and receiving data, and aims to improve the efficiency of information transmission.

[0818] This invention aims to build a meeting management system that incorporates emotion recognition to facilitate smooth communication within the home. Using a home robot as hardware, and leveraging emotion engines and voice recognition technology, it supports conversations and family meetings within the home.

[0819] The server analyzes the agenda items received from participants based on data transmitted from home robots and automatically sets their order based on importance. This process uses a natural language processing library to extract keywords from the agenda items and determine their priority.

[0820] The terminal, a home robot, is equipped with a virtual avatar that facilitates natural conversation. The robot recognizes participants' voices in real time and converts them into text data using the Google Cloud Speech-to-Text API. This text data is sent to a server where key points are further extracted.

[0821] Furthermore, the system uses an emotion recognition API like Affectiva to identify the emotional state that users exhibit during family meetings, and the home robot then provides feedback and additional information based on that data. For example, if a toddler shows signs of boredom while the family is discussing holiday plans, the robot can ask, "Shall I suggest a new activity?"

[0822] For example, when a family is discussing whether or not to welcome a new pet into their home, the robot could suggest, "Shall I look up some information about pet care?" An example of a prompt to the generating AI model in this case would be, "If the family is hesitant about getting a pet, how would you present information that could help resolve the issue?"
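As a non-limiting illustration, a prompt of the kind quoted above could be assembled from the recognized emotion and the topic under discussion. The function name and the template are assumptions made for this illustration; the template simply generalizes the example prompt.

```python
def build_prompt(emotion: str, topic: str) -> str:
    """Fill a hypothetical template modeled on the example prompt in the text."""
    return (f"If the family is {emotion} about {topic}, how would you present "
            "information that could help resolve the issue?")


# Reproduces the example prompt for the pet discussion.
print(build_prompt("hesitant", "getting a pet"))
```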

[0823] This system is expected to reduce misunderstandings in conversations and enable communication that satisfies all participants.

[0824] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0825] Step 1:

[0826] The server receives agenda data from a home robot. The received data is analyzed using a natural language processing library to extract keywords. This determines the importance of the agenda items and generates an ordered list. This list is then sent to the next step.

[0827] Step 2:

[0828] A home robot, acting as a terminal, prepares to conduct a meeting using a virtual avatar based on an agenda list received from a server. The avatar presents the agenda to participants and initiates a natural conversation. It receives voice input from the user, and that voice data is sent to the next step.

[0829] Step 3:

[0830] When a user speaks, the device sends the audio data in real time to the Google Cloud Speech-to-Text API, where it is converted into text data. This text data is stored on the server. The converted text is then used in the next feedback step.

[0831] Step 4:

[0832] The device analyzes the user's facial expressions and voice tone through emotion recognition APIs such as Affectiva. This analysis determines the user's emotional state. The results are immediately transmitted to the virtual avatar, which then provides conversational feedback and information tailored to the user.

[0833] Step 5:

[0834] Once the meeting concludes, the server analyzes the saved text data and extracts important statements and decisions. The extracted content is automatically generated as meeting minutes. These minutes are then distributed in the next step.

[0835] Step 6:

[0836] The terminal distributes the generated meeting minutes to participants via email or other communication methods. This allows users to review the meeting content and prepare for future meetings.
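As a non-limiting illustration, the distribution in Step 6 could be sketched with the Python standard library as follows. The function name, addresses, and attachment file name are assumptions made for this illustration. The sketch only builds the message; no mail is sent.

```python
from email.message import EmailMessage
from typing import Optional


def build_minutes_email(sender: str, recipients: list[str], subject: str,
                        minutes_text: str,
                        pdf_bytes: Optional[bytes] = None) -> EmailMessage:
    """Build an email carrying the minutes, optionally with a PDF attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(minutes_text)
    if pdf_bytes is not None:
        # The attachment file name is a hypothetical choice.
        msg.add_attachment(pdf_bytes, maintype="application", subtype="pdf",
                           filename="minutes.pdf")
    return msg
```

The resulting message could then be handed to a mail client such as `smtplib.SMTP.send_message`, with the server address and credentials supplied by the deployment.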

[0837] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0838] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0839] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0840] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0841] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0842] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0843] The inner part of the emotion map 400 represents inner thoughts, while the outer part represents actions. Therefore, the closer an emotion lies to the outside of the emotion map 400, the more visible it becomes (that is, the more it is expressed in actions).

[0844] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
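As a non-limiting illustration, the relationship between balances and pleasure or discomfort described above could be sketched as follows. The state keys, the use of a summed absolute deviation, and the three labels are assumptions made for this illustration.

```python
def total_deviation(state: dict[str, float], ideal: dict[str, float]) -> float:
    """Sum the absolute distance of each balance from its ideal value."""
    return sum(abs(state[key] - ideal[key]) for key in ideal)


def balance_feeling(previous: dict[str, float], current: dict[str, float],
                    ideal: dict[str, float]) -> str:
    """Return 'pleasure' when the balances approach the ideal, 'discomfort' when they move away."""
    before = total_deviation(previous, ideal)
    after = total_deviation(current, ideal)
    if after < before:
        return "pleasure"
    if after > before:
        return "discomfort"
    return "neutral"
```

For a robot, for example, `ideal` might hold an upright posture angle and a full battery level, so that recharging moves the state toward the ideal and is evaluated as pleasure.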

[0845] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0846] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
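As a non-limiting illustration, the property that emotions located close together on the map have similar emotion values could be sketched as follows. The coordinates, the Gaussian falloff, and the function names are assumptions made for this illustration; they are not taken from the emotion map 400 or from the trained neural network.

```python
import math

# Hypothetical (x, y) positions of a few emotions on the map.
EMOTION_POSITIONS = {
    "reassured": (0.50, 0.10),
    "calm": (0.55, 0.05),
    "confident": (0.60, 0.15),
    "anxious": (0.50, -0.40),
}


def soft_targets(labeled: str, sigma: float = 0.2) -> dict[str, float]:
    """Build training targets that decay with map distance from the labeled emotion."""
    lx, ly = EMOTION_POSITIONS[labeled]
    return {
        name: math.exp(-((x - lx) ** 2 + (y - ly) ** 2) / (2 * sigma ** 2))
        for name, (x, y) in EMOTION_POSITIONS.items()
    }


def dominant_emotion(values: dict[str, float]) -> str:
    """Pick the emotion with the highest emotion value."""
    return max(values, key=values.get)
```

With targets of this kind, a network trained on an input labeled "calm" would also be encouraged to give high values to nearby emotions such as "reassured" and "confident", and a low value to a distant emotion such as "anxious".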

[0847] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0848] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0849] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0850] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0851] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0852] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using this memory.

[0853] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0854] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0855] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0856] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements added, or elements replaced in the descriptions and illustrations presented above, as long as this does not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0857] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0858] The following is further disclosed regarding the embodiments described above.

[0859] (Claim 1)

[0860] Means for analyzing agenda items received from participants and automatically creating an order of the agenda items based on their importance,

[0861] means for conducting a meeting using a virtual avatar and engaging in natural dialogue with participants,

[0862] means for performing real-time speech recognition on statements made during the meeting, converting them into text data, and saving the text data, and

[0863] means for extracting key points from the text data after the meeting, automatically generating meeting minutes, and providing them to participants;

[0864] a system comprising the above means.

[0865] (Claim 2)

[0866] The system according to claim 1, wherein the virtual avatar provides relevant information based on participants' statements during a meeting and supports the discussion.

[0867] (Claim 3)

[0868] The system according to claim 1, further comprising communication means for electronically distributing the generated minutes.

[0869] "Example 1"

[0870] (Claim 1)

[0871] A device that analyzes agenda items received from participants and automatically creates an order of agenda items based on their importance,

[0872] A device that uses virtual characters to facilitate meetings and enable natural dialogue with participants,

[0873] A device that performs real-time speech recognition of speeches during a meeting, converts them into text information, and saves it,

[0874] A device that extracts important information from the text information after a meeting, automatically generates a report, and provides it to participants,

[0875] A communication device for electronically distributing the report,

[0876] A system comprising the above devices.

[0878] The system according to claim 1, wherein the virtual character provides relevant information based on the participants' statements during a meeting and supports the discussion.

[0879] (Claim 3)

[0880] The system according to claim 1, further comprising a device that searches for relevant information based on the saved text information and generates informational materials to aid participants' understanding.
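
The step of extracting important information from the saved text could be approximated, purely for illustration, by a word-frequency heuristic. The sketch below is not the method of this disclosure, which does not name an extraction algorithm. The function names, the stopword list, and the max_points parameter are all assumptions.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the source).
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is",
             "we", "it", "for", "on", "that", "this"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def extract_key_points(utterances: list[str], max_points: int = 3) -> list[str]:
    """Pick the utterances whose words are most frequent across the transcript.

    Each utterance is scored by the mean transcript-wide frequency of its
    words. The top-scoring utterances are returned in their original order.
    """
    freq = Counter(w for u in utterances for w in tokenize(u))

    def score(utterance: str) -> float:
        words = tokenize(utterance)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(utterances)),
                    key=lambda i: score(utterances[i]), reverse=True)
    return [utterances[i] for i in sorted(ranked[:max_points])]


if __name__ == "__main__":
    transcript = [
        "The budget needs approval.",
        "Okay.",
        "Budget approval is due Friday.",
    ]
    for point in extract_key_points(transcript, max_points=2):
        print("-", point)
```

A production system would more plausibly use a trained summarization model; the heuristic here only shows the shape of the step, with transcript text as input and a short list of key points as output.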

[0881] "Application Example 1"

[0882] (Claim 1)

[0883] A means for analyzing tasks received from participants and automatically ordering the tasks based on their importance,

[0884] A means for conducting discussions using a virtual character and engaging in natural conversation with participants,

[0885] A means for performing real-time speech recognition on utterances during a discussion, converting them into text data, and saving the text data,

[0886] A means for enabling local residents to share issues using smartphones, prioritizing important decision-making matters, and supporting community meetings,

[0887] A means for extracting key points from the text data after the discussion has concluded, automatically generating meeting minutes, and providing the minutes to participants,

[0888] A system comprising the above means.

[0889] (Claim 2)

[0890] The system according to claim 1, wherein the virtual character provides relevant information based on the participants' statements during the discussion, thereby supporting the discussion.

[0891] (Claim 3)

[0892] The system according to claim 1, further comprising communication means for electronically distributing the generated minutes.

[0893] "Example 2 in combination with an emotion engine"

[0894] (Claim 1)

[0895] A means for analyzing information received from participants and automatically ordering the information based on its importance,

[0896] A means for conducting a meeting using a virtual dialogue entity and engaging in natural dialogue with users,

[0897] A means for performing real-time speech recognition on utterances during a meeting, converting them into text information, and recording the text information,

[0898] A means for automatically generating a report based on the text information after the meeting has concluded and providing the report to users,

[0899] A means for analyzing users' facial expressions to recognize their emotional state in real time and reflecting that state in the progress of the meeting,

[0900] A system comprising the above means.

[0901] (Claim 2)

[0902] The system according to claim 1, wherein the virtual dialogue entity adjusts its response based on the user's statements and facial expressions during the meeting, thereby supporting the meeting.

[0903] (Claim 3)

[0904] The system according to claim 1, further comprising communication means for electronically transmitting the generated report.
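
One way to picture how a recognized emotional state might be reflected in the progress of a meeting is the rule-based sketch below. It assumes that an upstream recognizer, which is not shown, has already assigned each user one emotion label. The label names, the action names, the threshold, and the choose_facilitation_action function are hypothetical and do not come from this disclosure.

```python
from collections import Counter


def choose_facilitation_action(emotions: dict[str, str], threshold: float = 0.5) -> str:
    """Pick a facilitation action from per-user emotion labels.

    `emotions` maps a user identifier to a label such as "neutral",
    "confused", or "frustrated" (assumed labels). An action is triggered
    when the share of users with that label reaches `threshold`.
    """
    if not emotions:
        return "continue"

    counts = Counter(emotions.values())
    total = len(emotions)

    # Frustration is checked first so that it takes priority over confusion.
    if counts["frustrated"] / total >= threshold:
        return "suggest_break"
    if counts["confused"] / total >= threshold:
        return "recap_current_item"
    return "continue"


if __name__ == "__main__":
    snapshot = {"user_a": "confused", "user_b": "confused", "user_c": "neutral"}
    print(choose_facilitation_action(snapshot))
```

The fixed priority and single threshold keep the example short. An actual emotion engine could weigh labels continuously or track how they change over time.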

[0905] "Application Example 2 in combination with an emotion engine"

[0906] (Claim 1)

[0907] A means for analyzing agenda items received from participants and automatically ordering the agenda items based on their importance,

[0908] A means for conducting meetings using a virtual avatar and engaging in natural conversation with participants,

[0909] A means for analyzing participants' facial expressions and voice tone in real time to recognize their emotions,

[0910] A means for performing real-time speech recognition, converting the result into text data, and saving the text data,

[0911] A means for extracting key points from the text data after a meeting, automatically generating meeting minutes, and providing the minutes to participants,

[0912] A system comprising the above means.

[0913] (Claim 2)

[0914] The system according to claim 1, wherein a virtual avatar provides relevant information based on the participants' statements and emotions during a meeting, supports the discussion, and provides feedback in accordance with the participants' emotions.

[0915] (Claim 3)

[0916] The system according to claim 1, further comprising communication means for electronically distributing the generated minutes.

[Explanation of symbols]

[0917]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. A system comprising: a means for analyzing agenda items received from participants and automatically ordering the agenda items based on their importance; a means for conducting meetings using a virtual avatar and engaging in natural conversation with participants; a means for performing real-time speech recognition on utterances during a meeting, converting them into text data, and saving the text data; and a means for extracting key points from the text data after a meeting, automatically generating meeting minutes, and providing the minutes to participants.

2. The system according to claim 1, wherein the virtual avatar provides relevant information based on participants' statements during a meeting and supports the discussion.

3. The system according to claim 1, further comprising communication means for electronically distributing the generated minutes.