system
The system addresses language learning barriers by offering practical activities, real-time translation, and feedback mechanisms to enhance communication and learning effectiveness.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- SOFTBANK GROUP CORP
- Filing Date
- 2024-12-10
- Publication Date
- 2026-06-22
AI Technical Summary
In language learning environments, especially for parents and children, there is a lack of practical opportunities, leading to limited learning effectiveness and declining motivation, with challenges in maintaining high-quality communication due to language barriers.
A system that provides practical language learning opportunities through selectable exchange activities, real-time communication management, translation services, and feedback mechanisms to improve activity quality.
Facilitates effective language learning by overcoming language barriers and enhancing communication quality, ensuring timely participation, and continuously improving the learning experience based on participant feedback.
Abstract
Description
Technical Field
[0001] The technology of the present disclosure relates to a system.
Background Art
[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a character of the chatbot, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance as a response to the user utterance.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] In an environment for learning a foreign language, especially in the case of parents and children, there is a lack of practical opportunities to use the language, so the learning effect is often limited. As a result, it takes a long time to acquire a foreign language, and the motivation of learners may decline. In addition, it is difficult to provide an environment in which participants can easily understand each other while maintaining high-quality communication. Against such a background, there is a demand for a system that provides a practical place where participants can effectively learn a foreign language through various themes and activities.
Means for Solving the Problems
[0005] This invention provides an information storage means for storing multiple exchange activity information that participants can select, and a reservation management means for preparing events that interest participants. Furthermore, it includes a notification means for notifying participants before the start of an exchange activity to promote timely participation. A communication management means relays audio and video data between participants in real time, enabling more natural communication. It also provides a translation presentation means that translates participants' speech and displays the translation results, lowering language barriers. Finally, after the exchange activity ends, evaluations from participants are collected, and based on this, a feedback analysis means is used to improve the activity, thereby aiming for continuous improvement of the system's quality.
[0006] A "participant" is an individual who participates in an event as part of an exchange activity with the aim of learning a language.
[0007] "Exchange activity information" refers to detailed information about various themes and events that participants can choose from.
[0008] An "information storage means" is a means that can store information about exchange activities and provide it to participants as needed.
[0009] A "reservation management means" is a means of receiving and managing reservations for the exchange activities selected by participants.
[0010] A "notification means" is a means of informing participants of a scheduled exchange activity before the activity begins.
[0011] A "communication management means" is a means of relaying audio and video data between participants.
[0012] A "translation presentation means" is a means of translating the content of a participant's speech and presenting the result to the participants.
[0013] An "evaluation collection means" is a means of collecting evaluations from participants after the exchange activity has concluded.
[0014] A "feedback analysis means" is a means of analyzing the evaluations collected from participants and using the results to improve exchange activities.
Brief Description of the Drawings
[0015]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map on which multiple emotions are mapped.
[Figure 10] An emotion map on which multiple emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2, which incorporates an emotion engine.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2, which incorporates an emotion engine.
Embodiments for Carrying Out the Invention
[0016] Hereinafter, an example of an embodiment of the system according to the technology of the present disclosure will be described with reference to the accompanying drawings.
[0017] First, the terms used in the following description will be explained.
[0018] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as a "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).
[0019] In the following embodiments, RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which the processor uses as working memory.
[0020] In the following embodiments, storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), a magnetic disk (e.g., a hard disk), and magnetic tape.
[0021] In the following embodiments, a communication interface (I/F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).
[0022] In the following embodiments, "A and / or B" is synonymous with "at least one of A and B." That is, "A and / or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and / or B" applies when expressing three or more things linked by "and / or."
[0023] [First Embodiment]
[0024] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.
[0025] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.
[0026] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0027] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.
[0028] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.
[0029] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.
[0030] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.
[0031] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.
[0032] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.
[0033] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0034] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0035] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0036] The system for implementing this invention provides participants with selectable interaction activities and includes various functions for the smooth operation of those activities. Its specific form is described below.
[0037] First, the server maintains multiple pieces of interaction activity information stored in a database and distributes this activity information in response to requests from terminals. This allows users to view various activities on their terminals and select those that interest them.
[0038] Users reserve the exchange activities they wish to participate in using their terminals. The terminal sends the reservation information to the server, which records it in the database using the reservation management means. Based on this record, the server sends a reminder to the user through the notification means before the activity begins. This helps users avoid forgetting the activity and take part in events on time.
[0039] During the event, the terminals provide video and text chat interfaces to facilitate smooth communication between users. The communication management means relays audio and video data in real time via the server, enabling interaction even among participants in remote locations. When necessary, the server uses a generative AI model to translate user speech in real time, and the translation presentation means displays the results on the terminal. This reduces misunderstandings and supports smooth communication among users from different language backgrounds.
[0040] Furthermore, after the exchange activity concludes, the server collects evaluations from each participant through an evaluation collection mechanism and stores them in a database. This evaluation data is analyzed by a feedback analysis mechanism and used to improve the quality of future activities. This system provides users with an environment where they can experience activities tailored to their individual needs and effectively practice their foreign language skills.
[0041] As a concrete example, suppose a foreign user learning Japanese participates in a "language exchange event." The server connects this user with a Japanese user via video chat and provides real-time translation between Japanese and the foreign language. After the event, the server uses feedback from both parties to design the next language exchange event more effectively. This is expected to improve participant satisfaction and enhance the quality of learning.
[0042] The following describes the processing flow.
[0043] Step 1:
[0044] The server retrieves information about social activities from the database and provides it to the terminal. Users view this information on their terminal and select activities that interest them.
[0045] Step 2:
[0046] When a user selects an activity they wish to participate in, the device sends a reservation request to the server. The server receives this request and records the reservation information in the database using its reservation management system.
[0047] Step 3:
[0048] As the start time for the interaction activity approaches, the server sends a reminder to the user's device using a notification system. This allows the user to confirm the start of the event in advance.
[0049] Step 4:
[0050] The user checks the notification sent on their device and clicks the link to join the event. The device sends a participation request to the server, and the server uses communication management tools to connect the user to the designated video chat session.
[0051] Step 5:
[0052] The server relays audio and video data between participants in real time. The terminal receives this data and displays it to the user through a video chat interface. If necessary, the server translates the conversation using a generative AI model and sends the result to the terminal for display on the screen.
[0053] Step 6:
[0054] Once the interaction activity concludes, the server displays a feedback form on the terminal. The user enters their evaluation and comments on the activity into this form. The terminal then sends the entered information to the server.
[0055] Step 7:
[0056] The server records the feedback in the database using the evaluation collection means. The feedback analysis means then analyzes the recorded feedback, and the results are used to provide better exchange activities.
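The server-side portion of the steps above can be sketched as a small in-memory model. This is an illustrative sketch only, not the disclosed implementation: the class name `ExchangeServer`, its method names, and the 15-minute reminder lead time are assumptions made for this example, and plain Python containers stand in for the database. Steps 4 and 5 (joining the session and relaying media) are omitted.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Activity:
    activity_id: str
    title: str
    starts_at: datetime


@dataclass
class ExchangeServer:
    """Hypothetical in-memory stand-in for the server (names are assumptions)."""
    activities: dict = field(default_factory=dict)    # activity_id -> Activity
    reservations: list = field(default_factory=list)  # (user_id, activity_id)
    feedback: list = field(default_factory=list)      # (user_id, activity_id, rating, comment)

    def list_activities(self):
        # Step 1: provide stored activity information, earliest first.
        return sorted(self.activities.values(), key=lambda a: a.starts_at)

    def reserve(self, user_id, activity_id):
        # Step 2: record a reservation for a known activity.
        if activity_id not in self.activities:
            raise KeyError(f"unknown activity: {activity_id}")
        self.reservations.append((user_id, activity_id))

    def due_reminders(self, now, lead=timedelta(minutes=15)):
        # Step 3: reservations whose activity starts within the lead time.
        due = []
        for user_id, activity_id in self.reservations:
            start = self.activities[activity_id].starts_at
            if now <= start <= now + lead:
                due.append((user_id, activity_id))
        return due

    def collect_feedback(self, user_id, activity_id, rating, comment=""):
        # Steps 6 and 7: store an evaluation for later analysis.
        self.feedback.append((user_id, activity_id, rating, comment))
```

In this sketch, a scheduler would call `due_reminders` periodically with the current time and pass each returned pair to whatever notification channel is in use.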
[0057] (Example 1)
[0058] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0059] In today's increasingly globalized society, overcoming language barriers and facilitating smooth communication among participants are significant challenges. Furthermore, efficiently collecting participant feedback and evaluations to improve the quality of each activity is also essential.
[0060] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0061] In this invention, the server includes an information recording means for storing activity information, a reservation processing means for accepting activity reservations, and a notification means for sending notifications before the start of an activity. This allows participants to smoothly participate in activities regardless of their language, and also enables effective collection and analysis of feedback from participants.
[0062] "Information recording means" refers to a device or configuration for storing and managing multiple activity information items that participants can select.
[0063] "Reservation processing means" refers to a function or process for accepting and recording activity reservations based on participants' choices.
[0064] "Notification means" refers to a method or system for informing participants of information before the start of a scheduled activity.
[0065] "Data management means" refers to interfaces and protocols used to relay and manage communication data between participants.
[0066] "Language conversion means" refers to a system or module for translating participants' utterances and presenting the translation results.
[0067] "Feedback collection methods" refer to techniques or devices for efficiently collecting evaluations and opinions from participants after an activity has concluded.
[0068] A "generative model" refers to an AI-based model or algorithm used to translate participants' speech in real time.
[0069] "Analytical tools" refer to data analysis techniques and methods used to improve the quality of activities based on evaluations from participants.
[0070] This system offers a variety of interactive activities for participants to choose from and includes various functions to ensure their smooth operation. The server has a database for storing activity information and efficiently manages and updates a large amount of activity data through database management software. For example, a database system such as MySQL® can be used.
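As a rough illustration of how activity information might be stored and queried, the sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a database server. The table name, column names, and helper functions are assumptions for this example and do not come from the disclosure.

```python
import sqlite3

# Hypothetical schema; column names are assumptions for illustration.
SCHEMA = """
CREATE TABLE activities (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    theme TEXT NOT NULL,
    language TEXT NOT NULL,
    starts_at TEXT NOT NULL
)
"""


def open_store():
    """Create an in-memory database holding the activity table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def add_activity(conn, title, theme, language, starts_at):
    """Insert one activity and return its generated id."""
    cur = conn.execute(
        "INSERT INTO activities (title, theme, language, starts_at) "
        "VALUES (?, ?, ?, ?)",
        (title, theme, language, starts_at),
    )
    conn.commit()
    return cur.lastrowid


def find_by_theme(conn, theme):
    """Return (id, title, starts_at) rows for a theme, earliest first."""
    rows = conn.execute(
        "SELECT id, title, starts_at FROM activities "
        "WHERE theme = ? ORDER BY starts_at",
        (theme,),
    )
    return rows.fetchall()
```

Parameterized queries (`?` placeholders) are used so that user-supplied themes are never spliced into the SQL text; timestamps are stored as ISO 8601 strings so that text ordering matches chronological ordering.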
[0071] The server delivers activity information to the user's device via a web server or API in response to the user's request. This allows the user to view and select a variety of activities on their device using a browser or native application (e.g., React Native or Swift).
[0072] When a user wishes to participate in an activity, they make a reservation through their device. The device sends the reservation information to the server, which then records the reservation information in a database. A reservation management system (for example, a system using a RESTful API or RPC) can be used for this process.
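A minimal sketch of the server-side handling of a reservation request is shown below, assuming the terminal sends a JSON object. The field names `user_id` and `activity_id`, the status strings, and the use of a list in place of the database are all assumptions for illustration.

```python
import json

# Hypothetical required fields of a reservation request.
REQUIRED_FIELDS = ("user_id", "activity_id")


def parse_reservation(body: str) -> dict:
    """Validate a JSON reservation request and return the fields to store."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError as exc:
        raise ValueError("request body is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("request body must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return {name: str(data[name]) for name in REQUIRED_FIELDS}


def record_reservation(store: list, body: str) -> dict:
    """Record the reservation once; repeated requests are reported, not duplicated."""
    reservation = parse_reservation(body)
    if reservation in store:
        return {"status": "already_reserved", **reservation}
    store.append(reservation)
    return {"status": "reserved", **reservation}
```

Treating a repeated request as `already_reserved` rather than an error keeps the operation safe to retry if the terminal resends after a network failure.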
[0073] Before an activity begins, the server sends reminder emails or push notifications to users. These notifications utilize email servers (such as Postfix or SendGrid) or push notification services (such as Firebase Cloud Messaging).
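For the email case, a reminder message could be assembled with Python's standard `email.message.EmailMessage` class, as in the sketch below. The sender address, subject wording, and `join_url` parameter are placeholders assumed for this example; actual delivery (for instance with `smtplib.SMTP(...).send_message(msg)` or a push notification service) is deliberately left out.

```python
from datetime import datetime
from email.message import EmailMessage


def build_reminder(to_addr: str, activity_title: str,
                   starts_at: datetime, join_url: str) -> EmailMessage:
    """Build (but do not send) a reminder email for a reserved activity."""
    msg = EmailMessage()
    msg["From"] = "noreply@example.com"  # placeholder sender address
    msg["To"] = to_addr
    msg["Subject"] = f"Reminder: {activity_title} starts at {starts_at:%H:%M}"
    msg.set_content(
        f"Your exchange activity '{activity_title}' starts at "
        f"{starts_at:%Y-%m-%d %H:%M}.\n"
        f"Join here: {join_url}\n"
    )
    return msg
```

Keeping message construction separate from delivery makes it possible to check the reminder content without a mail server.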
[0074] During the event, the terminals provide video and text chat interfaces, enabling smooth communication between users. For communication management, the server uses a real-time communication (RTC) protocol such as WebRTC to relay audio and video in real time. In particular, for participants who speak different languages, generative AI models (e.g., Google® Translate API or Microsoft® Azure® Translator) are used to translate their speech in real time, and the translation results are displayed on the terminal.
[0075] As a concrete example, consider a foreign user learning Japanese who participates in a "language exchange event." The server connects this user with a Japanese user via video chat and provides real-time translation between Japanese and the foreign language. An example of a prompt might be, "Please explain the process for providing real-time translation between Japanese and English." After the event, feedback is collected from both parties and used to design the next language exchange event more effectively. This allows participants to have a smooth experience of cross-language communication.
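The translation step in such an event can be sketched as a routing function that decides, per listener, whether a translation is needed at all. The translator is passed in as a callable so that any service could be plugged in; `fake_translate` below is a stub written for this example and does not call any real translation API. All names here are assumptions.

```python
from typing import Callable, Dict

# (text, source_lang, target_lang) -> translated text
Translator = Callable[[str, str, str], str]


def captions_for_listeners(text: str, speaker_lang: str,
                           listener_langs: Dict[str, str],
                           translate: Translator) -> Dict[str, str]:
    """Return the caption each listener should see for one utterance."""
    cache: Dict[str, str] = {}      # target language -> translated text
    captions: Dict[str, str] = {}
    for listener_id, lang in listener_langs.items():
        if lang == speaker_lang:
            # Same language as the speaker: show the original text.
            captions[listener_id] = text
            continue
        if lang not in cache:
            # Translate once per target language, not once per listener.
            cache[lang] = translate(text, speaker_lang, lang)
        captions[listener_id] = cache[lang]
    return captions


def fake_translate(text: str, source: str, target: str) -> str:
    """Stub translator for demonstration; tags the text instead of translating."""
    return f"[{source}->{target}] {text}"
```

Caching by target language keeps the number of translation requests proportional to the number of distinct languages in the session rather than the number of participants.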
[0076] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0077] Step 1:
[0078] The server retrieves activity information from the database using information recording means and distributes it based on requests from the terminal. The input is the user's request data, and the output is the activity information sent to the terminal. Specifically, the server queries the MySQL database to retrieve the necessary information and transmits it to the terminal via a RESTful API.
[0079] Step 2:
[0080] The user views the activity information received on their device and selects the activity they wish to participate in. The input is the displayed activity information, and the output is the activity data selected by the user. This selection data is processed via the user interface using client-side scripting such as JavaScript®.
[0081] Step 3:
[0082] The terminal uses a reservation processing mechanism to send reservation information for the selected activity to the server. The input is the user's selected activity data, and the output is the reservation information recorded on the server. Specifically, the terminal sends the user selection information to the server in JSON format, and the server records it directly in the database.
[0083] Step 4:
[0084] The server uses notification methods based on reservation information to send reminders to users before the activity begins. The input is reservation information, and the output is a reminder email or push notification sent to the user. Specifically, the server sends emails using the SMTP protocol or push notifications via the Firebase Cloud Messaging API.
[0085] Step 5:
[0086] The terminal uses communication management means to provide video chat and text chat interfaces, enabling communication between users. Input is user voice and text data, and output is real-time video and audio data provided to other users. Using the RTC protocol, the server relays the audio and video data.
[0087] Step 6:
[0088] The server applies a generative AI model as the language conversion means, translating participants' speech in real time and displaying the result on the terminal. The input is the user's speech data, and the output is translated text data. Specifically, the server performs the translation by calling the Google Translate API.
[0089] Step 7:
[0090] The server collects feedback from participants after the activity using a feedback collection system, and uses this information to improve the quality of future activities. The input is participant evaluation data, and the output is analyzed feedback data. Specifically, evaluations are collected through a survey form and analyzed using an analysis program.
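The analysis in Step 7 is not specified in detail. One simple possibility, shown here purely as an assumption, is to average the ratings per activity and flag activities whose average falls below a threshold. The record layout (`activity_id`, `rating`) and the threshold of 3.0 are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize_ratings(evaluations, low_threshold=3.0):
    """Group ratings by activity and flag activities with a low average."""
    by_activity = defaultdict(list)
    for item in evaluations:
        by_activity[item["activity_id"]].append(item["rating"])

    summary = {}
    for activity_id, ratings in by_activity.items():
        avg = mean(ratings)
        summary[activity_id] = {
            "count": len(ratings),
            "average": round(avg, 2),
            # Hypothetical rule: review activities averaging below the threshold.
            "needs_review": avg < low_threshold,
        }
    return summary
```

A flagged activity would then be a candidate for redesign before the next event; free-text comments would need a separate analysis.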
[0091] (Application Example 1)
[0092] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0093] In online exchange activities, language barriers exist when participants use different languages, making smooth real-time communication difficult. Furthermore, there is a need for effective methods to collect and analyze feedback to improve the quality of the activities.
[0094] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0095] In this invention, the server includes an information storage means for storing multiple exchange activity information that participants can select, a communication management means for relaying audio and video data between participants, and a translation presentation means for translating participants' speech and presenting the translation results in real time. This enables smooth, real-time communication even among participants who speak different languages. Furthermore, evaluations from participants can be collected after the exchange activity ends, and the quality of the activity can be improved based on the feedback.
[0096] An "information storage means" is a storage device that holds information on the exchange activities that participants can select and provides it as needed.
[0097] A "reservation management means" is a means of accepting and managing reservations for exchange activities based on the participants' choices.
[0098] A "notification means" is a means of sending reminders or notifications to participants before the start of a scheduled exchange activity.
[0099] A "communication management means" is a means of relaying audio and video data between participants in real time to enable smooth communication.
[0100] A "translation presentation means" is a means of translating participants' utterances and presenting the translation results quickly and accurately.
[0101] An "evaluation collection means" is a means of gathering feedback from participants after an exchange activity has concluded, in order to collect information that can be used to improve future activities.
[0102] An "improvement means" is a means of taking steps to improve the quality of exchange activities based on feedback received from participants.
[0103] A "generative model" is a machine learning model used to translate participants' speech in real time.
[0104] To implement this invention, a system is constructed in which a server and a user terminal work together to perform their functions. The server first stores information on multiple interaction activities in a database using an information storage means. This information is used when participants access it on their terminals and select activities of interest. The selected activities are recorded through a reservation management means based on requests sent from the terminals, and the server manages the reservation information.
[0105] Before a scheduled activity begins, the server sends a reminder to participants using a notification system. This ensures that participants do not miss the timing of the activity.
[0106] During the activity, the user's device collects audio and video data and relays it in real time with other participants via a communication management system. For video chat, WebRTC technology is used to ensure reliable audio and video delivery. For translation presentation, the Google Translate API is utilized, and a machine learning model (generative AI model) is used to translate the user's speech in real time. The resulting translation is immediately displayed on the participant's device, enabling smooth communication even among participants using different languages.
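The relaying behavior described above can be modeled, independently of any particular transport, as a fan-out of each media chunk to every other participant in the session. The sketch below does not use WebRTC; it is a simplified in-memory model, and the class `RelayRoom` and its methods are assumptions for illustration.

```python
class RelayRoom:
    """Hypothetical relay: forwards each media chunk to every other participant."""

    def __init__(self):
        # participant_id -> list of (sender_id, kind, payload) received so far
        self.inboxes = {}

    def join(self, participant_id):
        self.inboxes.setdefault(participant_id, [])

    def leave(self, participant_id):
        self.inboxes.pop(participant_id, None)

    def relay(self, sender_id, kind, payload):
        """Deliver one chunk ('audio' or 'video') to everyone except the sender."""
        if sender_id not in self.inboxes:
            raise KeyError(f"{sender_id} has not joined the session")
        delivered = 0
        for participant_id, inbox in self.inboxes.items():
            if participant_id != sender_id:
                inbox.append((sender_id, kind, payload))
                delivered += 1
        return delivered
```

In a real deployment the inboxes would be network connections rather than lists, but the fan-out rule (everyone except the sender) stays the same.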
[0107] After the activity concludes, the server uses evaluation tools to collect feedback from each participant. Then, using improvement tools, this feedback is analyzed and incorporated into future activities to enhance their quality.
[0108] As a concrete example, suppose a foreign user learning Japanese participates in an online workshop about Japanese culture. In this case, the server connects the user to a Japanese-speaking instructor via video chat and provides real-time translation between Japanese and the user's native language. After the event ends, the user can submit feedback on topics to be covered in the next workshop. For example, the user could be shown a prompt such as, "Please tell me what topics you would like to see covered in the next Japanese culture workshop. Examples: Tea ceremony, how to make bonsai."
[0109] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0110] Step 1:
[0111] The user accesses the server using a terminal and retrieves interaction activity information stored in the information storage device. The server receives the user's request as input, filters the corresponding activity information from the database, and presents it to the terminal as output. This allows the user to select activities of interest.
[0112] Step 2:
[0113] The user sends a reservation request from their device for the selected activity. The server, receiving the selection information as input, uses a reservation management system to combine the activity information and user information and record the reservation in the database. The user is then notified that the reservation is complete.
[0114] Step 3:
[0115] The server uses a notification system before the scheduled activity begins, sending a reminder to the user's device using the reservation information as input. The user receives a notification containing activity details and time as output, ensuring they don't forget to participate.
[0116] Step 4:
[0117] During the activity, the user's device collects participants' audio and video in real time. It receives data from the device's microphone and camera as input, and relays it to other participants via a server using a communication management system. As output, the audio and video data are sent to other participants on the video chat platform.
[0118] Step 5:
[0119] Using the translation presentation means, the server translates user speech in real time. The input is the user's voice data, which is converted into text and sent to a generative AI model. A translation API performs the translation, and the output, text translated into the other participants' languages, is presented on the user's terminal. This enables smooth communication among participants who speak different languages.
[0120] Step 6:
[0121] After the interaction activity ends, users send feedback to the server via their devices. The server receives evaluation data as input and records it in a database using an evaluation collection method. As output, feedback data is collected to help improve future activities.
[0122] Step 7:
[0123] The server analyzes the collected feedback using improvement methods. Using evaluation data as input, it extracts areas for improvement to enhance the quality of the activity through data analysis techniques. The output is a plan for improvement in the next activity. This leads to improved activity quality and increased participant satisfaction.
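Steps 1 to 3 above can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation: the `Activity` and `Reservation` records, the function names, the filter fields (`language`, `theme`), and the 30-minute reminder lead time are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Activity:
    # Hypothetical record for one interaction activity.
    activity_id: str
    theme: str
    language: str
    starts_at: datetime


@dataclass(frozen=True)
class Reservation:
    # Hypothetical record combining user information and activity information.
    user_id: str
    activity_id: str


def filter_activities(activities, language=None, theme=None):
    """Step 1: return the activities that match the user's request."""
    return [
        a for a in activities
        if (language is None or a.language == language)
        and (theme is None or a.theme == theme)
    ]


def due_reminders(reservations, activities, now, lead=timedelta(minutes=30)):
    """Step 3: list (user_id, activity) pairs whose activity starts within `lead`."""
    by_id = {a.activity_id: a for a in activities}
    due = []
    for r in reservations:
        activity = by_id.get(r.activity_id)
        if activity is None:
            continue  # reservation refers to an unknown activity
        if now <= activity.starts_at <= now + lead:
            due.append((r.user_id, activity))
    return due


activities = [
    Activity("a1", "cooking", "en", datetime(2025, 1, 10, 18, 0)),
    Activity("a2", "music", "ja", datetime(2025, 1, 11, 9, 0)),
]
reservations = [Reservation("user-1", "a1"), Reservation("user-2", "a2")]
now = datetime(2025, 1, 10, 17, 45)
reminders = due_reminders(reservations, activities, now)
```

A server process could call `due_reminders` periodically and hand each returned pair to whatever notification channel it uses.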
[0124] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0125] The embodiment of this invention is a system having various functions to enhance user-participatory interaction activities. First, the server provides interaction activity information stored in a database to the user's terminal, and the user can view this information via the terminal and select activities of interest. The reservation management means allows the server to accept reservations for the selected activities and record them in the database.
[0126] Furthermore, by incorporating an emotion engine, it becomes possible to recognize the user's emotions from their speech and captured facial expressions. The server analyzes the output of the emotion engine and has control mechanisms to adjust the progress of the interaction activity as needed. For example, if a participant is detected as being nervous, the system can adjust the pace of the activity or send encouraging messages.
[0127] When users participate in interaction activities, the server uses communication management means to relay audio and video data in real time, and further provides real-time translations as needed using translation presentation means, thereby facilitating smooth communication even among users with different language backgrounds. A generative model is used for translation, providing fast and accurate translation results.
[0128] After the interaction activity concludes, the server displays a feedback form on the terminal to collect user feedback. Furthermore, emotional data collected by the emotion engine is also evaluated, and this data is analyzed using data analysis tools. The resulting data is used to improve the quality of future interaction activities, thereby enhancing the user's learning effectiveness.
[0129] For example, if a Japanese user learning English and an English-speaking user learning Japanese participate in a language exchange activity, the server connects the users via video chat, and the emotion engine recognizes emotions from the participants' facial expressions. If the server determines that the users are enjoying themselves, it maintains the activity to ensure that situation continues. On the other hand, if the server determines that either user is confused, it adaptively controls the interaction, such as by enhancing the translation presentation to support conversation comprehension. This makes it possible to maintain motivation for language learning and maximize practical learning effectiveness.
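The adaptive control described in paragraphs [0126] and [0129] could look roughly like the following Python sketch. The emotion labels, the action names, and the rule that confusion takes priority over nervousness are assumptions made for illustration; the disclosure does not fix a priority order.

```python
from enum import Enum


class Emotion(Enum):
    ENJOYING = "enjoying"
    CONFUSED = "confused"
    NERVOUS = "nervous"
    NEUTRAL = "neutral"


class Action(Enum):
    MAINTAIN = "maintain the current activity"
    ENHANCE_TRANSLATION = "enhance the translation presentation"
    SLOW_AND_ENCOURAGE = "slow the pace and send an encouraging message"


def choose_action(emotions_by_user):
    """Pick one control action for the session from per-user emotion labels."""
    emotions = set(emotions_by_user.values())
    # Assumed priority: comprehension problems are handled first.
    if Emotion.CONFUSED in emotions:
        return Action.ENHANCE_TRANSLATION
    if Emotion.NERVOUS in emotions:
        return Action.SLOW_AND_ENCOURAGE
    return Action.MAINTAIN


action = choose_action({"japanese_user": Emotion.ENJOYING,
                        "english_user": Emotion.CONFUSED})
```

Here the session switches to enhanced translation because either user being confused is enough to trigger support, matching the "either user" wording above.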
[0130] The following describes the processing flow.
[0131] Step 1:
[0132] The server retrieves interaction activity information from the database and sends it to the user's terminal. The user then browses the activity information provided on their terminal and selects activities that interest them.
[0133] Step 2:
[0134] When a user selects an activity they wish to participate in, the device sends a reservation request to the server. The server uses a reservation management system to record this information in a database and sends a reservation confirmation notification to the device.
[0135] Step 3:
[0136] Before the interaction activity begins, the server uses a notification system to send a reminder to the user's device informing them of the start time. The user receives the notification and prepares for the activity.
[0137] Step 4:
[0138] After the user confirms the notification and indicates their intention to participate in the activity via their device, the device sends a participation request to the server. The server connects the user to the video chat session via a communication management system.
[0139] Step 5:
[0140] During a video chat, the server uses a generative model to translate speech in real time and displays the results on the terminal via a translation display device. The server also operates an emotion engine to analyze the user's emotions from their speech and facial expressions. Based on this emotion data, the server adjusts the flow of the interaction as needed.
[0141] Step 6:
[0142] Once the interaction activity is complete, the server instructs the device to display a feedback form. The user enters their evaluation and comments on the activity into the form, and the device sends it to the server.
[0143] Step 7:
[0144] The server uses an evaluation collection mechanism to store user feedback in a database, and then uses a feedback analysis mechanism to analyze all the data, including the results of the emotion engine's analysis. Based on the analysis results, it plans future activities and system improvements.
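Step 7 combines form feedback with the emotion engine's results. One possible shape for that analysis is sketched below in Python. The field names, the 1 to 5 rating scale, the thresholds, and the wording of the improvement items are hypothetical; the disclosure does not specify an analysis method.

```python
from collections import Counter
from statistics import mean


def analyze_feedback(feedback, emotion_log, low_rating=3, min_share=0.3):
    """Combine form ratings with emotion-engine output for one activity.

    feedback: list of dicts such as {"rating": 4, "comment": "..."} (assumed 1-5 scale)
    emotion_log: list of emotion labels recorded during the session
    """
    ratings = [f["rating"] for f in feedback]
    counts = Counter(emotion_log)
    total = sum(counts.values())
    # Share of the session spent in each emotion; empty when nothing was logged.
    shares = {label: n / total for label, n in counts.items()} if total else {}

    improvements = []
    if ratings and mean(ratings) < low_rating:
        improvements.append("review activity content: average rating is low")
    if shares.get("confused", 0.0) >= min_share:
        improvements.append("strengthen translation support: confusion was frequent")
    if shares.get("nervous", 0.0) >= min_share:
        improvements.append("slow the pace or add warm-up time: tension was frequent")

    return {
        "average_rating": mean(ratings) if ratings else None,
        "emotion_shares": shares,
        "improvements": improvements,
    }


report = analyze_feedback(
    [{"rating": 4, "comment": "fun"}, {"rating": 5, "comment": ""}],
    ["enjoying", "confused", "confused", "enjoying"],
)
```

In this example the ratings are high, yet the emotion log still surfaces a translation issue, which is the benefit of analyzing both data sources together.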
[0145] (Example 2)
[0146] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0147] Traditional interaction activity systems often suffered from difficulties in smooth communication among participants due to differences in language and emotions, resulting in reduced effectiveness. Furthermore, the lack of mechanisms to dynamically adjust activity content made it challenging to achieve sufficient participant satisfaction. Additionally, post-activity evaluations were often managed merely as numerical data, failing to effectively utilize feedback for future improvements.
[0148] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0149] In this invention, the server includes data storage means for storing information on multiple interaction activities that participants can select, reservation acceptance means for accepting reservations for interaction activities based on the participant's selection, and emotion analysis and adjustment means for analyzing the emotions of participants during interaction activities and dynamically adjusting the content of the activities. This makes it possible to provide flexible activities that respond to language differences and changes in emotions.
[0150] A "data storage means" is a medium or device that stores information on multiple interaction activities that participants can select, and allows them to access it as needed.
[0151] "Reservation acceptance means" refers to a medium or function for accepting reservations for interaction activities based on participants' choices.
[0152] "Information transmission means" refers to functions or devices used to send notifications to participants before the start of a scheduled exchange activity.
[0153] "Data transmission means" refers to technologies and devices for relaying audio and video data between participants.
[0154] "Translation provision means" refers to a function or mechanism for translating participants' utterances and presenting the translation results.
[0155] "Emotion analysis and adjustment means" refers to methods or devices for analyzing the emotions of participants during interaction activities and dynamically adjusting the content of those activities based on the results.
[0156] "Data collection and analysis means" refers to a system or process for collecting evaluations from participants after the completion of an interaction activity and analyzing them together with the participants' emotional data.
[0157] A "generative model" is a machine learning model or algorithm used to translate participants' speech in real time.
[0158] As an embodiment of this invention, a system for more effectively facilitating interaction activities is provided. The server primarily utilizes data storage means, reservation acceptance means, emotion analysis and adjustment means, data transmission means, translation provision means, and data collection and analysis means to facilitate smooth communication among participants. Specifically, this system operates as follows.
[0159] The server first uses data storage means to store information related to the interaction activities in a database. The hardware used here includes cloud storage systems and relational database systems. This information is stored in a way that makes it accessible to participants.
[0160] When a participant selects an activity, the device sends this information to the server using a reservation acceptance mechanism. The device provides a user interface via a web browser or mobile application. The reserved information is recorded in the server's database.
[0161] Once the interaction begins, the server uses data transmission methods to relay participants' audio and video data in real time. The software used for this can leverage the APIs of common video conferencing tools.
[0162] During the activity, the server utilizes emotion analysis and adjustment tools, using a generative AI model to analyze participants' speech and facial expression data. The software used includes machine learning algorithms that read emotions. This allows the system to determine whether participants are relaxed or tense, and adjust the pace and content of the activity accordingly.
[0163] The server also uses translation tools to translate participants' different languages in real time. The translation service, provided by a generative AI model, is highly accurate and fast.
[0164] Once the interaction activity concludes, the terminal collects feedback from participants via the data collection and analysis means and sends it to the server. This feedback is analyzed together with the emotion analysis data to help improve future interaction activities.
[0165] As a concrete example, a Japanese user might participate with the goal of learning English, while an English-speaking user might participate in the interaction activity to learn Japanese. The server connects the users via video chat, and the emotion engine analyzes the facial expressions of both. In this case, an example of a prompt to be input into the generative AI model would be, "Please suggest a way to adjust the pace of the activity based on the emotion data obtained from User A's facial expressions."
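As an illustration of how such a prompt might be assembled, the Python sketch below appends emotion data to the example sentence quoted above. The function name, the score format, and the "Emotion data:" line are assumptions; only the first sentence of the prompt comes from this description.

```python
def build_pace_prompt(user_label, emotion_scores):
    """Assemble the prompt text to be input into the generative AI model."""
    # Sort by name so the prompt is stable for a given set of scores.
    summary = ", ".join(
        f"{name}={score:.2f}" for name, score in sorted(emotion_scores.items())
    )
    return (
        "Please suggest a way to adjust the pace of the activity based on the "
        f"emotion data obtained from {user_label}'s facial expressions.\n"
        f"Emotion data: {summary}"
    )


# Hypothetical scores in the range 0.0 to 1.0 produced by the emotion engine.
prompt = build_pace_prompt("User A", {"tension": 0.72, "joy": 0.10})
```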
[0166] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0167] Step 1:
[0168] The server retrieves interaction activity information from the database using data storage means and sends it to the terminal. This information includes the type of activity, date and time, and location. When a user logs in and accesses the system, the terminal provides a user interface that displays this information. The user selects activities of interest as user input, and the terminal sends this selection to the server. The output is the user's selection information, which is stored on the server.
[0169] Step 2:
[0170] The terminal sends a reservation request for an activity to the server based on the user's selection. The server records the reservation information in a database using a reservation acceptance mechanism. The input is the reservation request, which includes the user ID and the selected activity information. The server output is the reservation status recorded in the database. This allows the user to confirm whether their reservation was successful.
[0171] Step 3:
[0172] The server utilizes a video conferencing system to relay audio and video data between participants in real time using data transmission means. When a user joins an activity, the terminal sends and receives this media data, providing the user with live video and audio. The input is audio and video data from the user, and the output is audio and video data transmitted to other participants.
[0173] Step 4:
[0174] The server utilizes the emotion analysis and adjustment means, using a generative AI model to analyze participants' statements and captured facial expressions. This allows for real-time understanding of each user's emotional state. The input is the participants' statements and facial expression data, and the output is the analyzed emotion information. Specifically, if the server determines that a participant is tense, it may slow down the pace of the activity or send reassuring messages.
[0175] Step 5:
[0176] The server utilizes a translation service and uses a generative AI model to translate participants' utterances in real time. The input is participant utterance data, which may include different languages. The output is the translated text. The terminal displays this text in a user interface to help participants understand the meaning.
[0177] Step 6:
[0178] After the interaction activity concludes, the terminal uses the data collection and analysis means to obtain feedback from participants and transmit it to the server. The input consists of user evaluations and impressions. The server collects this data and analyzes it, along with the emotion data, to identify areas for improvement in future activities. The output is improvement suggestions based on the analysis, thereby enhancing the system's service.
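Step 5 of this flow, in which one utterance is translated for participants who may use different languages, can be sketched in Python as follows. The `Translator` callable is a hypothetical stand-in for the generative AI model, and `fake_translate` only tags the text so the example runs without any external service.

```python
from typing import Callable, Dict

# Hypothetical signature: (text, source_lang, target_lang) -> translated text.
Translator = Callable[[str, str, str], str]


def translate_for_participants(
    text: str,
    speaker_id: str,
    languages: Dict[str, str],
    translate: Translator,
) -> Dict[str, str]:
    """Return the text each listener's terminal should display."""
    source = languages[speaker_id]
    cache: Dict[str, str] = {}
    result: Dict[str, str] = {}
    for user_id, target in languages.items():
        if user_id == speaker_id:
            continue  # the speaker does not need their own utterance translated
        if target == source:
            result[user_id] = text  # same language: show the original text
            continue
        if target not in cache:
            # Translate once per target language, not once per listener.
            cache[target] = translate(text, source, target)
        result[user_id] = cache[target]
    return result


def fake_translate(text: str, source: str, target: str) -> str:
    """Stand-in for the generative AI model; tags the text instead of translating."""
    return f"[{source}->{target}] {text}"


shown = translate_for_participants(
    "こんにちは",
    "u1",
    {"u1": "ja", "u2": "en", "u3": "en", "u4": "ja"},
    fake_translate,
)
```

Caching per target language is a design choice made here to limit model calls when several listeners share a language; the description does not require it.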
[0179] (Application Example 2)
[0180] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0181] In online or offline interaction activities, participants may face difficulties in smooth communication due to differing language backgrounds and emotional states. Therefore, there is a need for systems that support adaptive communication tailored to participants' language and emotional states. Furthermore, improving the quality of future activities based on participant feedback and emotional data is a key challenge.
[0182] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0183] In this invention, the server includes information storage means for storing multiple interaction activity information that participants can select, emotion recognition means for analyzing the participant's facial expression information to recognize emotions, and activity adjustment means for adaptively controlling activities according to the participant's emotions. This enables smooth and effective interaction activities by providing appropriate support based on the participant's language and emotional state.
[0184] "Information storage means" refers to a device or function that stores and manages information on multiple interaction activities that participants can select.
[0185] "Reservation management means" refers to a function that accepts and manages reservations for interaction activities based on the participants' choices.
[0186] "Notification means" refers to a system or function for sending notifications to participants before the start of a scheduled social activity.
[0187] "Communication management means" refers to a function or device for relaying audio and video data between participants.
[0188] "Translation presentation means" refers to a function or device for translating a participant's utterance and presenting the translation result.
[0189] "Evaluation collection means" refers to functions or systems for collecting evaluations from participants after an interaction activity has concluded.
[0190] "Emotion recognition means" refers to technologies and functions that analyze participants' facial expressions to recognize their emotions.
[0191] "Activity adjustment means" refers to a function or system for adaptively controlling activities in accordance with the emotions of the participants.
[0192] This invention is a system for improving the user experience when participating in interaction activities, and has multiple functions. The server uses information storage means to store and manage information on interaction activities that participants can select. When a user selects an activity through a terminal, the reservation management means accepts the reservation based on that selection. Subsequently, the notification means sends a notification to the user before the start of the interaction activity.
[0193] To ensure smooth communication, the server uses communication management means to relay audio and video data between users in real time. Furthermore, the translation presentation means translates participants' speech in real time using a generative AI model, and the translation results are presented to the users. This enables effective communication even among participants with different language backgrounds.
[0194] To improve user satisfaction, the server analyzes the user's facial expressions using emotion recognition tools and recognizes their emotions. Based on this data, activity adjustment tools adaptively control interaction activities according to the user's emotions, adjusting the pace as needed and providing encouraging messages to ensure a comfortable experience for participants.
[0195] After the activity concludes, the server uses evaluation tools to collect user feedback, and also evaluates emotional data obtained through emotion recognition tools. This data is then used to improve the quality of future interaction activities.
[0196] As a concrete example, consider a smart communication robot. The robot is equipped with a small camera to detect the user's smile or confused expression, and analyzes the user's emotions using the Google Cloud Vision API. Based on the analysis results, the robot uses the DeepL API to quickly translate the user's speech and supports the real-time transmission of the translated statement to the other party. An example of a prompt for the generative AI model would be a specific instruction such as, "Analyze the user's emotions based on their facial expressions during the conversation. If the expression indicates tension or stress, provide a message of encouragement."
[0197] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0198] Step 1:
[0199] The server stores interaction activity information in a database using information storage means. When a user accesses the activity information via a terminal, the server provides information in response to the request, and the user selects activities of interest. The input at this time is the user's selection information, and the output is information about the activities corresponding to that selection.
[0200] Step 2:
[0201] The server accepts reservations for selected activities using a reservation management system. The input is the user's selection information, which is recorded in the database to complete the reservation. The output is a reservation completion notification sent to the user.
[0202] Step 3:
[0203] The server uses a notification mechanism to inform the user's terminal of the start time of the scheduled activity before the activity begins. The input to this process is the activity start time information, and the output is a notification message to the user.
[0204] Step 4:
[0205] The server uses communication management means to relay audio and video data between users in real time. The input to this process is the user's audio and video data, and the output is the same data relayed to the other participants in real time.
[0206] Step 5:
[0207] The server uses a translation presentation mechanism to translate participants' speech in real time using a generative AI model. The input is user voice data, which is output as translated text through data conversion. The generative AI model provides fast and accurate translation.
[0208] Step 6:
[0209] The server uses emotion recognition technology to analyze the user's facial expressions and recognize their emotions. The input for this process is image data acquired from the camera, which is output as emotion data using the Google Cloud Vision API.
[0210] Step 7:
[0211] The server uses activity adjustment mechanisms to adjust activities based on recognized emotions. Emotional data is the input, and based on this, the server adjusts the pace of activities and provides encouraging messages to the user as output.
[0212] Step 8:
[0213] After the interaction activity concludes, the server uses an evaluation collection mechanism to gather feedback from users. In this process, the input is user evaluation information, which is then aggregated and output as data to help improve the quality of future activities.
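Steps 6 and 7 above can be illustrated with the Python sketch below. Face detection in the Google Cloud Vision API reports joy, sorrow, anger, and surprise as likelihood levels rather than labels such as "tense"; the mapping from those likelihoods to a coarse label, the thresholds, and the message text are assumptions for this example. No API call is made: the input is a plain dictionary of likelihood names.

```python
# Likelihood levels as named by the Cloud Vision API, ranked for comparison.
LIKELIHOOD_RANK = {
    "UNKNOWN": 0,
    "VERY_UNLIKELY": 1,
    "UNLIKELY": 2,
    "POSSIBLE": 3,
    "LIKELY": 4,
    "VERY_LIKELY": 5,
}


def recognize_emotion(face):
    """Step 6: reduce per-face likelihoods to one coarse label (assumed mapping)."""
    def rank(key):
        return LIKELIHOOD_RANK.get(face.get(key, "UNKNOWN"), 0)

    if rank("joy") >= LIKELIHOOD_RANK["LIKELY"]:
        return "enjoying"
    # Assumption: sorrow, anger, or surprise is treated as a proxy for tension.
    negative = max(rank("sorrow"), rank("anger"), rank("surprise"))
    if negative >= LIKELIHOOD_RANK["POSSIBLE"]:
        return "tense"
    return "neutral"


def adjust_activity(emotion):
    """Step 7: turn the recognized emotion into an adjustment for the terminal."""
    if emotion == "tense":
        return {"pace": "slower", "message": "You're doing well. Take your time."}
    return {"pace": "unchanged", "message": None}


adjustment = adjust_activity(
    recognize_emotion({"joy": "UNLIKELY", "surprise": "POSSIBLE"})
)
```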
[0214] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0215] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.
[0216] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.
[0217] [Second Embodiment]
[0218] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.
[0219] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.
[0220] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0221] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.
[0222] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0223] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0224] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0225] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0226] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0227] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0228] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0229] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0230] The system for implementing this invention provides participants with selectable interaction activities and includes various functions for the smooth operation of those activities. Its specific form is described below.
[0231] First, the server maintains multiple pieces of interaction activity information stored in a database and distributes this activity information in response to requests from terminals. This allows users to view various activities on their terminals and select those that interest them.
[0232] Users make reservations for the interaction activities they wish to participate in using their devices. The devices send this information to the server, which records it in a database using a reservation management system. Based on this record, the server sends a reminder to the user using a notification system before the activity begins. This helps users remember the activity and take part in events on time.
[0233] During the event, the devices provide video and text chat interfaces to facilitate smooth communication between users. A communication management system relays audio and video data in real time via the server, enabling seamless interaction even among participants in remote locations. When necessary, the server uses a generative AI model to translate user speech in real time and displays the results on the device through a translation display system. This supports smooth communication with fewer misunderstandings, even among users from different language backgrounds.
[0234] Furthermore, after the exchange activity concludes, the server collects evaluations from each participant through an evaluation collection mechanism and stores them in a database. This evaluation data is analyzed by a feedback analysis mechanism and used to improve the quality of future activities. This system provides users with an environment where they can experience activities tailored to their individual needs and effectively practice their foreign language skills.
[0235] As a concrete example, suppose a foreign user learning Japanese participates in a "language exchange event." The server connects this user with a Japanese user via video chat and provides real-time translation between Japanese and the foreign language. After the event, the server uses feedback from both parties to design the next language exchange event more effectively. This is expected to improve participant satisfaction and enhance the quality of learning.
[0236] The following describes the processing flow.
[0237] Step 1:
[0238] The server retrieves interaction activity information from the database and provides it to the terminal. Users view this information on their terminal and select activities that interest them.
[0239] Step 2:
[0240] When a user selects an activity they wish to participate in, the device sends a reservation request to the server. The server receives this request and records the reservation information in the database using its reservation management system.
[0241] Step 3:
[0242] As the start time for the interaction activity approaches, the server sends a reminder to the user's device using a notification system. This allows the user to confirm the start of the event in advance.
[0243] Step 4:
[0244] The user checks the notification sent on their device and clicks the link to join the event. The device sends a participation request to the server, and the server uses communication management tools to connect the user to the designated video chat session.
[0245] Step 5:
[0246] The server relays audio and video data between participants in real time. The terminal receives this data and displays it to the user through a video chat interface. If necessary, the server translates the conversation using a generative AI model and sends the result to the terminal for display on the screen.
[0247] Step 6:
[0248] Once the interaction activity concludes, the server displays a feedback form on the terminal. The user enters their evaluation and comments on the activity into this form. The terminal then sends the entered information to the server.
[0249] Step 7:
[0250] The server uses evaluation collection tools to record the feedback in a database. Feedback analysis tools then analyze the information, and the results are used to plan better interaction activities.
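The participation request in Step 4 and the relay in Step 5 can be sketched in Python as follows. The `SessionManager` class and its methods are hypothetical, and checking for a recorded reservation before connecting the user is an assumption added for the example; the description only states that the server connects the user to the designated session.

```python
class SessionManager:
    """Tracks which users are connected to each activity's video chat session."""

    def __init__(self, reservations):
        # reservations: iterable of (user_id, activity_id) pairs already recorded.
        self._reservations = set(reservations)
        self._sessions = {}

    def join(self, user_id, activity_id):
        """Handle a participation request; return the session's participant list."""
        if (user_id, activity_id) not in self._reservations:
            raise PermissionError(f"{user_id} has no reservation for {activity_id}")
        participants = self._sessions.setdefault(activity_id, [])
        if user_id not in participants:
            participants.append(user_id)  # joining twice has no extra effect
        return list(participants)

    def relay_targets(self, sender_id, activity_id):
        """Return the participants who should receive the sender's audio and video."""
        return [u for u in self._sessions.get(activity_id, []) if u != sender_id]


manager = SessionManager({("u1", "event-1"), ("u2", "event-1")})
manager.join("u1", "event-1")
manager.join("u2", "event-1")
targets = manager.relay_targets("u1", "event-1")
```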
[0251] (Example 1)
[0252] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0253] In today's increasingly globalized society, overcoming language barriers and facilitating smooth communication among participants are significant challenges. Furthermore, efficiently collecting participant feedback and evaluations to improve the quality of each activity is also essential.
[0254] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0255] In this invention, the server includes an information recording means for storing activity information, a reservation processing means for accepting activity reservations, and a notification means for sending notifications before the start of an activity. This allows participants to smoothly participate in activities regardless of their language, and also enables effective collection and analysis of feedback from participants.
[0256] "Information recording means" refers to a device or configuration for storing and managing multiple activity information items that participants can select.
[0257] "Reservation processing means" refers to a function or process for accepting and recording activity reservations based on participants' choices.
[0258] "Notification means" refers to a method or system for informing participants of information before the start of a scheduled activity.
[0259] "Data management means" refers to interfaces and protocols used to relay and manage communication data between participants.
[0260] "Language conversion means" refers to a system or module for translating participants' utterances and presenting the translation results.
[0261] "Feedback collection means" refers to techniques or devices for efficiently collecting evaluations and opinions from participants after an activity has concluded.
[0262] A "generative model" refers to an AI-based model or algorithm used to translate participants' speech in real time.
[0263] "Analytical tools" refer to data analysis techniques and methods used to improve the quality of activities based on evaluations from participants.
[0264] This system offers a variety of interactive activities for participants to choose from and includes various functions to ensure their smooth operation. The server has a database for storing activity information and efficiently manages and updates a large amount of activity data through database management software. For example, a database system such as MySQL can be used.
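As a non-limiting illustration of the activity database described above, the following Python sketch stores and lists activity information. It uses the standard-library sqlite3 module in place of MySQL so that it runs without a database server. The table name `activities`, its columns, and the function names are assumptions introduced for illustration and do not appear in the disclosure.

```python
import sqlite3


def create_activity_store() -> sqlite3.Connection:
    """Create an in-memory database holding selectable activity information."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activities ("
        "id INTEGER PRIMARY KEY, "
        "title TEXT NOT NULL, "
        "language TEXT NOT NULL, "
        "starts_at TEXT NOT NULL)"  # ISO 8601 timestamp, sortable as text
    )
    return conn


def add_activity(conn, title, language, starts_at) -> int:
    """Insert one activity and return its generated id."""
    cur = conn.execute(
        "INSERT INTO activities (title, language, starts_at) VALUES (?, ?, ?)",
        (title, language, starts_at),
    )
    conn.commit()
    return cur.lastrowid


def list_activities(conn, language=None):
    """Return activities as dicts, optionally filtered by language, earliest first."""
    sql = "SELECT id, title, language, starts_at FROM activities"
    params = ()
    if language is not None:
        sql += " WHERE language = ?"
        params = (language,)
    sql += " ORDER BY starts_at"
    columns = ("id", "title", "language", "starts_at")
    return [dict(zip(columns, row)) for row in conn.execute(sql, params)]
```

Parameterized queries are used so that values supplied by a terminal are never concatenated into the SQL text.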
[0265] The server delivers activity information to the user's device via a web server or API in response to the user's request. This allows the user to view and select a variety of activities on their device using a browser or native application (e.g., React Native or Swift).
[0266] When a user wishes to participate in an activity, they make a reservation through their device. The device sends the reservation information to the server, which then records the reservation information in a database. A reservation management system (for example, a system using a RESTful API or RPC) can be used for this process.
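As a non-limiting illustration, the reservation handling described above could be sketched as follows. The request is assumed to arrive as a JSON body with the hypothetical fields `user_id` and `activity_id`. The class name, the status codes chosen, and the in-memory list standing in for the database are likewise assumptions for illustration.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ReservationBook:
    """In-memory stand-in for the server-side reservation table."""
    known_activity_ids: set
    reservations: list = field(default_factory=list)

    def handle_request(self, body: str) -> dict:
        """Validate a JSON reservation request and record it.

        Returns a response dict that a web framework could serialize.
        """
        try:
            data = json.loads(body)
        except json.JSONDecodeError:
            return {"status": 400, "error": "invalid JSON"}
        if not isinstance(data, dict):
            return {"status": 400, "error": "request body must be an object"}

        user_id = data.get("user_id")
        activity_id = data.get("activity_id")
        if not isinstance(user_id, str) or not isinstance(activity_id, int):
            return {"status": 400, "error": "user_id and activity_id are required"}
        if activity_id not in self.known_activity_ids:
            return {"status": 404, "error": "unknown activity"}
        if (user_id, activity_id) in self.reservations:
            return {"status": 409, "error": "already reserved"}

        self.reservations.append((user_id, activity_id))
        return {
            "status": 201,
            "reservation": {"user_id": user_id, "activity_id": activity_id},
        }
```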
[0267] Before an activity begins, the server sends reminder emails or push notifications to users. These notifications utilize email servers (such as Postfix or SendGrid) or push notification services (such as Firebase Cloud Messaging).
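As a non-limiting illustration, the selection of reservations that are due for a reminder could be sketched as follows. The 30-minute lead time, the dictionary keys, and the `send` callable (which stands in for an email server or push notification service) are assumptions introduced for illustration.

```python
from datetime import datetime, timedelta


def due_reminders(reservations, now, lead_time=timedelta(minutes=30)):
    """Return reservations that start within lead_time and have not been reminded.

    Each reservation is a dict with 'user_id', 'starts_at' (datetime),
    and 'reminded' (bool). Activities that already started are ignored.
    """
    due = []
    for r in reservations:
        time_left = r["starts_at"] - now
        if not r["reminded"] and timedelta(0) <= time_left <= lead_time:
            due.append(r)
    return due


def send_reminders(reservations, now, send):
    """Call send(user_id, message) for each due reservation and mark it reminded."""
    sent = 0
    for r in due_reminders(reservations, now):
        send(r["user_id"], f"Your activity starts at {r['starts_at']:%H:%M}.")
        r["reminded"] = True  # prevents duplicate reminders on the next run
        sent += 1
    return sent
```

Passing the delivery function as a parameter keeps the timing logic independent of whether email or push notification is used.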
[0268] During the event, the devices provide video and text chat interfaces, enabling smooth communication between users. The server uses a protocol such as WebRTC for communication management to relay audio and video in real time. In particular, for participants who speak different languages, generative AI models (e.g., Google Translate API or Microsoft Azure Translator) are used to translate their speech in real time, and the translation results are displayed on the device.
[0269] As a concrete example, consider a foreign user learning Japanese who participates in a "language exchange event." The server connects this user with a Japanese user via video chat and provides real-time translation between Japanese and the foreign language. An example of a prompt might be, "Please explain the process for providing real-time translation between Japanese and English." After the event, feedback is collected from both parties and used to design the next language exchange event more effectively. This allows participants to have a smooth experience of cross-language communication.
[0270] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0271] Step 1:
[0272] The server retrieves activity information from the database using information recording means and distributes it based on requests from the terminal. The input is the user's request data, and the output is the activity information sent to the terminal. Specifically, the server queries the MySQL database to retrieve the necessary information and transmits it to the terminal via a RESTful API.
[0273] Step 2:
[0274] The user views the activity information received on their device and selects the activities they wish to participate in. The input is the displayed activity information, and the output is the activity data selected by the user. This selection data is processed via a user interface using client-side scripting such as JavaScript.
[0275] Step 3:
[0276] The terminal uses the reservation processing means to send reservation information for the selected activity to the server. The input is the user's selected activity data, and the output is the reservation information recorded on the server. Specifically, the terminal sends the user selection information to the server in JSON format, and the server records it directly in the database.
[0277] Step 4:
[0278] The server uses the notification means, based on the reservation information, to send reminders to users before the activity begins. The input is the reservation information, and the output is a reminder email or push notification sent to the user. Specifically, the server sends emails using the SMTP protocol or push notifications via the Firebase Cloud Messaging API.
[0279] Step 5:
[0280] The terminal uses communication management means to provide video chat and text chat interfaces, enabling communication between users. The input is the user's voice and text data, and the output is real-time video and audio data provided to other users. The server relays the audio and video data using a protocol such as WebRTC.
[0281] Step 6:
[0282] The server applies the generative AI model as a language conversion means, translates the speech content of the participants in real time, and displays it on the terminal. The input is the speech data of the user, and the output is the translated text data. Specifically, the server performs the translation through the Google Translate API.
[0283] Step 7:
[0284] After the activity ends, the server uses the feedback collection means to collect evaluations from the participants and utilizes them to improve the quality of the next activity. The input is the evaluation data of the participants, and the output is the analyzed feedback data. Specifically, evaluations are collected through a questionnaire form and analyzed using an analysis program.
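As a non-limiting illustration of the analysis program mentioned in Step 7, questionnaire responses could be summarized per activity as follows. The 1-to-5 rating scale, the dictionary keys, and the function name are assumptions introduced for illustration.

```python
from collections import defaultdict
from statistics import mean


def summarize_feedback(responses):
    """Group questionnaire responses by activity and compute simple statistics.

    Each response is a dict with 'activity_id', 'rating' (1 to 5), and
    'comment'. Responses with an out-of-range rating are skipped.
    """
    by_activity = defaultdict(list)
    for r in responses:
        if 1 <= r["rating"] <= 5:
            by_activity[r["activity_id"]].append(r)

    summary = {}
    for activity_id, items in by_activity.items():
        summary[activity_id] = {
            "count": len(items),
            "mean_rating": round(mean(x["rating"] for x in items), 2),
            # Keep only comments that contain visible text.
            "comments": [x["comment"] for x in items if x["comment"].strip()],
        }
    return summary
```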
[0285] (Application Example 1)
[0286] Next, Application Example 1 will be described. In the following description, the data processing device 12 is referred to as the "server", and the smart glasses 214 are referred to as the "terminal".
[0287] In online communication activities, when participants use different languages, there is a language barrier, making smooth real-time communication difficult. Also, there is a need for a method to effectively collect and analyze feedback to improve the quality of the activity content.
[0288] The specific processing by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following respective means.
[0289] In this invention, the server includes information storage means for storing information on a plurality of communication activities that participants can select, communication management means for relaying voice and video data between participants, and translation presentation means for translating the speech of participants and presenting the translation results in real time. Thereby, smooth real-time communication is possible even among participants using different languages. In addition, after the communication activity ends, evaluations from the participants can be collected, and the quality of the activity can be improved based on the feedback.
[0290] "Information storage means" refers to a memory device that holds information on interaction activities that participants can select and provides as needed.
[0291] "Reservation management means" refers to a means that accepts and manages reservations for interaction activities based on the participants' selections.
[0292] "Notification means" refers to a means of sending reminders or notifications to participants before the start of a scheduled interaction activity.
[0293] "Communication management means" refers to a means that relays audio and video data between participants in real time and enables smooth communication.
[0294] "Translation presentation means" refers to a means of translating participants' utterances and presenting the translation results quickly and accurately.
[0295] "Evaluation collection means" refers to a means of gathering feedback from participants after an interaction activity has concluded, in order to collect information that can be used to improve future activities.
[0296] "Improvement means" refers to a means of improving the quality of interaction activities based on feedback received from participants.
[0297] A "generative model" is a machine learning model used to translate participants' speech in real time.
[0298] To implement this invention, a system is constructed in which a server and a user terminal work together to perform their functions. The server first stores information on multiple interaction activities in a database using an information storage means. This information is used when participants access it on their terminals and select activities of interest. The selected activities are recorded through a reservation management means based on requests sent from the terminals, and the server manages the reservation information.
[0299] Before a scheduled activity begins, the server sends a reminder to participants using a notification system. This ensures that participants do not miss the timing of the activity.
[0300] During the activity, the user's device collects audio and video data and relays it in real time with other participants via a communication management system. For video chat, WebRTC technology is used to ensure reliable audio and video delivery. For translation presentation, the Google Translate API is utilized, and a machine learning model (generative AI model) is used to translate the user's speech in real time. The resulting translation is immediately displayed on the participant's device, enabling smooth communication even among participants using different languages.
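As a non-limiting illustration, the presentation of translation results to participants who use different languages could be sketched as follows. The `translate` callable stands in for a translation API or generative AI model and is an assumption, as are the function name and the language codes. No real translation service is called.

```python
from typing import Callable, Dict

# (text, source_language, target_language) -> translated text
Translator = Callable[[str, str, str], str]


def captions_for_listeners(
    text: str,
    speaker_lang: str,
    listener_langs: Dict[str, str],
    translate: Translator,
) -> Dict[str, str]:
    """Return the caption each listener should see for one utterance.

    Listeners who share the speaker's language receive the original text.
    The translator is called at most once per distinct target language.
    """
    cache = {speaker_lang: text}
    captions = {}
    for listener_id, lang in listener_langs.items():
        if lang not in cache:
            cache[lang] = translate(text, speaker_lang, lang)
        captions[listener_id] = cache[lang]
    return captions
```

Caching by target language keeps the number of translation requests proportional to the number of languages in the session rather than the number of participants.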
[0301] After the activity concludes, the server uses evaluation tools to collect feedback from each participant. Then, using improvement tools, this feedback is analyzed and incorporated into future activities to enhance their quality.
[0302] As a concrete example, suppose that a foreign user learning Japanese participates in an online workshop about Japanese culture. In this case, the server connects the user to a Japanese-speaking instructor via video chat and provides real-time translation between Japanese and the user's native language. After the event ends, the user can submit feedback on topics they would like to see covered in the next workshop. For example, a prompt such as the following could be used: "Please tell me what topics you would like to see covered in the next Japanese culture workshop. Examples: Tea ceremony, how to make bonsai."
[0303] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0304] Step 1:
[0305] The user accesses the server using a terminal and obtains the communication activity information stored in the information storage means. Receiving the user's request as input, the server filters the corresponding activity information from the database and presents it to the terminal as output. Thereby, the user can select an activity of interest.
[0306] Step 2:
[0307] The user sends a reservation request from the terminal for the selected activity. Receiving the selection information as input, the server uses the reservation management means to combine the activity information and user information and record the reservation in the database. Thereby, the user is notified that the reservation has been completed.
[0308] Step 3:
[0309] Before the start of the reserved activity, the server uses the notification means and sends a reminder to the user's terminal using the reservation information as input. As output, a notification including the details and time of the activity is delivered to the user. Thereby, the user can participate in the activity without forgetting.
[0310] Step 4:
[0311] During the activity, the user terminal collects the participants' audio and video in real time. Receiving data from the terminal's microphone and camera as input, the terminal relays it through the server to other participants using the communication management means. As output, the audio and video data reach other participants on the video chat platform.
[0312] Step 5:
[0313] The server uses the translation presentation means to translate the user's speech in real time. As input, the user's voice data is converted into text and sent to the generative AI model. The translation API performs the translation, and the text converted into a different language is presented on the user terminal as output. Thereby, smooth communication between participants using different languages becomes possible.
[0314] Step 6:
[0315] After the interaction activity ends, users send feedback to the server via their devices. The server receives evaluation data as input and records it in a database using an evaluation collection method. As output, feedback data is collected to help improve future activities.
[0316] Step 7:
[0317] The server analyzes the collected feedback using improvement methods. Using evaluation data as input, it extracts areas for improvement to enhance the quality of the activity through data analysis techniques. The output is a plan for improvement in the next activity. This leads to improved activity quality and increased participant satisfaction.
[0318] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform specific processing using the user's emotions.
[0319] Embodiments of this invention consist of a system having various functions to enhance user-participatory interaction activities. First, the server provides interaction activity information stored in a database to the user's terminal, and the user can view this information via the terminal and select activities of interest. The reservation management means allows the server to accept reservations for the selected activities and record them in the database.
[0320] Furthermore, by incorporating an emotion engine, it becomes possible to recognize the user's emotions from their speech and captured facial expressions. The server analyzes the output of the emotion engine and has control mechanisms to adjust the progress of the interaction activity as needed. For example, if a participant is detected as being nervous, the system can adjust the pace of the activity or send encouraging messages.
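As a non-limiting illustration, the control that adjusts the progress of an activity based on the output of the emotion engine could be sketched as a simple rule table. The emotion labels, the confidence threshold, the pace factors, and the message texts are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adjustment:
    pace_factor: float  # 1.0 keeps the current pace; below 1.0 slows it down
    message: str        # empty string means no message is sent


def adjust_for_emotion(
    emotion: str, confidence: float, threshold: float = 0.6
) -> Adjustment:
    """Map an estimated emotion to an activity adjustment.

    Estimates below the confidence threshold leave the activity unchanged,
    so that a weak or noisy estimate does not disturb the session.
    """
    if confidence < threshold:
        return Adjustment(1.0, "")
    if emotion == "nervous":
        return Adjustment(0.8, "Take your time. You are doing well.")
    if emotion == "confused":
        return Adjustment(0.9, "Translation support has been expanded.")
    return Adjustment(1.0, "")
```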
[0321] When users participate in interaction activities, the server uses communication management means to relay audio and video data in real time, and further provides real-time translations as needed using translation presentation means, thereby facilitating smooth communication even among users with different language backgrounds. A generative model is used for translation, providing fast and accurate translation results.
[0322] After the interaction activity concludes, the server displays a feedback form on the terminal to collect user feedback. Furthermore, emotional data collected by the emotion engine is also evaluated, and this data is analyzed using data analysis tools. The resulting data is used to improve the quality of future interaction activities, thereby enhancing the user's learning effectiveness.
[0323] For example, if a Japanese user learning English and an English-speaking user learning Japanese participate in a language exchange activity, the server connects the users via video chat, and the emotion engine recognizes emotions from the participants' facial expressions. If the server determines that the users are enjoying themselves, it keeps the activity as it is so that this state continues. On the other hand, if the server determines that either user is confused, it adaptively controls the interaction, for example by enhancing the translation presentation to support comprehension of the conversation. This makes it possible to maintain motivation for language learning and maximize practical learning effectiveness.
[0324] The following describes the processing flow.
[0325] Step 1:
[0326] The server retrieves interaction activity information from the database and sends it to the user's terminal. The user then browses the activity information provided on their terminal and selects activities that interest them.
[0327] Step 2:
[0328] When a user selects an activity they wish to participate in, the device sends a reservation request to the server. The server uses a reservation management system to record this information in a database and sends a reservation confirmation notification to the device.
[0329] Step 3:
[0330] Before the interaction activity begins, the server uses a notification system to send a reminder to the user's device informing them of the start time. The user receives the notification and prepares for the activity.
[0331] Step 4:
[0332] After the user confirms the notification and indicates their intention to participate in the activity via their device, the device sends a participation request to the server. The server connects the user to the video chat session via a communication management system.
[0333] Step 5:
[0334] During a video chat, the server uses a generative model to translate speech in real time and displays the results on the terminal via the translation presentation means. The server also operates an emotion engine to analyze the user's emotions from their speech and facial expressions. Based on this emotion data, the server adjusts the flow of the interaction as needed.
[0335] Step 6:
[0336] Once the interaction activity is complete, the server instructs the device to display a feedback form. The user enters their evaluation and comments on the activity into the form, and the device sends it to the server.
[0337] Step 7:
[0338] The server uses an evaluation collection mechanism to store user feedback in a database, and then uses a feedback analysis mechanism to analyze all the data, including the results of the emotion engine's analysis. Based on the analysis results, it plans future activities and system improvements.
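As a non-limiting illustration, the combined analysis of user feedback and the results of the emotion engine described in Step 7 could be sketched as follows. The emotion labels, the 0.5 ratio, and the rule for flagging a session are assumptions introduced for illustration.

```python
from collections import Counter


def session_report(rating: int, emotion_samples: list) -> dict:
    """Combine a user's rating with emotion labels sampled during the session."""
    counts = Counter(emotion_samples)
    total = sum(counts.values())
    negative = counts["nervous"] + counts["confused"]
    negative_ratio = negative / total if total else 0.0
    return {
        "rating": rating,
        "dominant_emotion": counts.most_common(1)[0][0] if total else None,
        "negative_ratio": round(negative_ratio, 2),
        # Flag sessions where the form looks fine but the emotion data does not.
        "needs_review": rating >= 4 and negative_ratio >= 0.5,
    }
```

A session flagged in this way is one where the written evaluation alone would hide difficulty that the user experienced during the activity.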
[0339] (Example 2)
[0340] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0341] Traditional interaction activity systems often suffered from difficulties in smooth communication among participants due to differences in language and emotions, resulting in reduced effectiveness. Furthermore, the lack of mechanisms to dynamically adjust activity content made it challenging to achieve sufficient participant satisfaction. Additionally, post-activity evaluations were often managed merely as numerical data, failing to effectively utilize feedback for future improvements.
[0342] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0343] In this invention, the server includes data storage means for storing information on multiple interaction activities that participants can select, reservation acceptance means for accepting reservations for interaction activities based on the participant's selection, and emotion analysis and adjustment means for analyzing the emotions of participants during interaction activities and dynamically adjusting the content of the activities. This makes it possible to provide flexible activities that respond to language differences and changes in emotions.
[0344] A "data storage means" is a medium or device that stores information on multiple interaction activities that participants can select, and allows them to access it as needed.
[0345] "Reservation acceptance means" refers to a medium or function for accepting reservations for interaction activities based on participants' selections.
[0346] "Information transmission means" refers to functions or devices used to send notifications to participants before the start of a scheduled interaction activity.
[0347] "Data transmission means" refers to technologies and devices for relaying audio and video data between participants.
[0348] "Translation provision means" refers to a function or mechanism for translating participants' utterances and presenting the translation results.
[0349] "Emotional analysis and adjustment means" refers to methods or devices for analyzing the emotions of participants during interaction activities and dynamically adjusting the content of those activities based on the results.
[0350] "Data collection and analysis means" refers to a system or process for collecting evaluations from participants after the completion of an interaction activity and analyzing them together with the participants' emotional data.
[0351] A "generative model" is a machine learning model or algorithm used to translate participants' speech in real time.
[0352] As an embodiment of this invention, a system for more effectively facilitating interaction activities is provided. The server primarily utilizes data storage means, reservation acceptance means, emotion analysis and adjustment means, data transmission means, translation provision means, and data collection and analysis means to facilitate smooth communication among participants. Specifically, this system operates as follows.
[0353] The server first uses data storage means to store information related to the interaction activities in a database. The infrastructure used here includes cloud storage systems and relational database systems. This information is stored in a way that makes it accessible to participants.
[0354] When a participant selects an activity, the device sends this information to the server using a reservation acceptance mechanism. The device provides a user interface via a web browser or mobile application. The reserved information is recorded in the server's database.
[0355] Once the interaction begins, the server uses data transmission methods to relay participants' audio and video data in real time. The software used for this can leverage the APIs of common video conferencing tools.
[0356] During the activity, the server utilizes emotion analysis and adjustment tools, using a generative AI model to analyze participants' speech and facial expression data. The software used includes machine learning algorithms that read emotions. This allows the system to determine whether participants are relaxed or tense, and adjust the pace and content of the activity accordingly.
[0357] The server also uses translation tools to translate participants' different languages in real time. The translation service, provided by a generative AI model, is highly accurate and fast.
[0358] Once the interaction activity concludes, the device collects and analyzes feedback from participants via data collection and analysis tools and sends it to the server. This feedback, along with emotion analysis data, is analyzed in a way that helps improve future interaction activities.
[0359] As a concrete example, a Japanese user might participate with the goal of learning English, while an English-speaking user might participate in the interaction activity to learn Japanese. The server connects each user via video chat, and the emotion engine analyzes the facial expressions of both. In this case, an example of a prompt to be input into the generative AI model would be, "Please suggest a way to adjust the pace of the activity based on the emotion data obtained from User A's facial expressions."
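As a non-limiting illustration, a prompt of the kind quoted above could be assembled from emotion data as follows. The representation of emotion data as a mapping from label to score, and the function name, are assumptions introduced for illustration. The request sentence follows the example prompt in the text.

```python
def build_pace_prompt(user_label: str, emotion_scores: dict) -> str:
    """Build a prompt asking a generative AI model how to adjust the pace.

    emotion_scores maps an emotion label to a score between 0 and 1.
    Scores are listed from highest to lowest so the dominant emotion is first.
    """
    if not emotion_scores:
        raise ValueError("emotion_scores must not be empty")
    ranked = sorted(emotion_scores.items(), key=lambda kv: kv[1], reverse=True)
    lines = [f"- {label}: {score:.2f}" for label, score in ranked]
    return (
        f"Emotion data obtained from {user_label}'s facial expressions:\n"
        + "\n".join(lines)
        + "\nPlease suggest a way to adjust the pace of the activity "
        f"based on the emotion data obtained from {user_label}'s facial expressions."
    )
```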
[0360] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0361] Step 1:
[0362] The server retrieves interaction activity information from the database using data storage means and sends it to the terminal. This information includes the type of activity, date and time, and location. When a user logs in and accesses the system, the terminal provides a user interface that displays this information. The user selects activities of interest as user input, and the terminal sends this selection to the server. The output is the user's selection information, which is stored on the server.
[0363] Step 2:
[0364] The terminal sends a reservation request for an activity to the server based on the user's selection. The server records the reservation information in a database using a reservation acceptance mechanism. The input is the reservation request, which includes the user ID and the selected activity information. The server output is the reservation status recorded in the database. This allows the user to confirm whether their reservation was successful.
[0365] Step 3:
[0366] The server utilizes a video conferencing system to relay audio and video data between participants in real time using data transmission means. When a user joins an activity, the terminal sends and receives this media data, providing the user with live video and audio. The input is audio and video data from the user, and the output is audio and video data transmitted to other participants.
[0367] Step 4:
[0368] The server utilizes emotion analysis and adjustment means, using a generative AI model to analyze participants' statements and captured facial expressions. This allows for real-time understanding of the user's emotional state. The input is the participants' statements and facial expression data, and the output is the analyzed emotional information. Specifically, if the server determines that a participant is tense, it may slow down the pace of the activity or send reassuring messages.
[0369] Step 5:
[0370] The server utilizes a translation service and uses a generative AI model to translate participants' utterances in real time. The input is participant utterance data, which may include different languages. The output is the translated text. The terminal displays this text in a user interface to help participants understand the meaning.
[0371] Step 6:
[0372] After the interaction activity concludes, the terminal uses data collection and analysis means to obtain feedback from participants and transmit it to the server. The input consists of user evaluations and impressions. The server collects this data and analyzes it, along with emotion data, to identify areas for improvement for future activities. The output is improvement suggestions based on the analysis, thereby enhancing the system's service.
[0373] (Application Example 2)
[0374] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0375] In online or offline interaction activities, participants may face difficulties in smooth communication due to differing language backgrounds and emotional states. Therefore, there is a need for systems that support adaptive communication tailored to participants' language and emotional states. Furthermore, improving the quality of future activities based on participant feedback and emotional data is a key challenge.
[0376] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0377] In this invention, the server includes information storage means for storing multiple interaction activity information that participants can select, emotion recognition means for analyzing the participant's facial expression information to recognize emotions, and activity adjustment means for adaptively controlling activities according to the participant's emotions. This enables smooth and effective interaction activities by providing appropriate support based on the participant's language and emotional state.
[0378] "Information storage means" refers to a device or function that stores and manages information on multiple interaction activities that participants can select.
[0379] "Reservation management means" refers to a function that accepts and manages reservations for interaction activities based on the participants' selections.
[0380] "Notification means" refers to a system or function for sending notifications to participants before the start of a scheduled interaction activity.
[0381] "Communication management means" refers to a function or device for relaying audio and video data between participants.
[0382] "Translation presentation means" refers to a function or device for translating a participant's utterance and presenting the translation result.
[0383] "Evaluation collection means" refers to functions or systems for collecting evaluations from participants after an interaction activity has concluded.
[0384] "Emotion recognition means" refers to technologies and functions that analyze participants' facial expressions to recognize their emotions.
[0385] "Activity adjustment means" refers to a function or system for adaptively controlling activities in accordance with the emotions of the participants.
[0386] This invention is a system for improving the user experience when participating in interaction activities, and has multiple functions. The server uses information storage means to store and manage information on interaction activities that participants can select. When a user selects an activity through a terminal, the reservation management means accepts the reservation based on that selection. Subsequently, the notification means sends a notification to the user before the start of the interaction activity.
[0387] To ensure smooth communication, the server uses communication management means to relay audio and video data between users in real time. Furthermore, a translation presentation means is used to translate participants' speech in real time using a generative AI model, and the translation results are presented to the users. This enables effective communication even among participants with different language backgrounds.
[0388] To improve user satisfaction, the server analyzes the user's facial expressions using emotion recognition tools and recognizes their emotions. Based on this data, activity adjustment tools adaptively control interaction activities according to the user's emotions, adjusting the pace as needed and providing encouraging messages to ensure a comfortable experience for participants.
[0389] After the activity concludes, the server uses evaluation tools to collect user feedback, and also evaluates emotional data obtained through emotion recognition tools. This data is then used to improve the quality of future interaction activities.
[0390] As a concrete example, consider a smart communication robot. The robot is equipped with a small camera to detect the user's smile or confused expression, and analyzes the user's emotions using the Google Cloud Vision API. Based on the analysis results, the robot uses the DeepL API to quickly translate the user's speech and supports real-time transmission of the translated statement to the other party. An example of a prompt for a generative AI model would be a specific instruction such as, "Analyze the user's emotions based on their facial expressions during the conversation. If the expression indicates tension or stress, provide a message of encouragement."
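As a non-limiting illustration, likelihood labels of the kind returned by face detection in the Google Cloud Vision API could be reduced to a coarse emotion as follows. The sketch operates on a plain dictionary and makes no API call. Treating sorrow or anger as a sign of stress, the "LIKELY" cut-off, the function names, and the message text are assumptions introduced for illustration.

```python
# Ordering of the likelihood labels, from least to most likely.
LIKELIHOOD_RANK = {
    "UNKNOWN": 0,
    "VERY_UNLIKELY": 1,
    "UNLIKELY": 2,
    "POSSIBLE": 3,
    "LIKELY": 4,
    "VERY_LIKELY": 5,
}


def classify_face(face: dict) -> str:
    """Reduce per-emotion likelihood labels to 'smiling', 'stressed', or 'neutral'."""
    def rank(key: str) -> int:
        # Missing or unrecognized labels are treated as UNKNOWN.
        return LIKELIHOOD_RANK.get(face.get(key, "UNKNOWN"), 0)

    joy = rank("joyLikelihood")
    stress = max(rank("sorrowLikelihood"), rank("angerLikelihood"))
    if stress >= LIKELIHOOD_RANK["LIKELY"] and stress > joy:
        return "stressed"
    if joy >= LIKELIHOOD_RANK["LIKELY"]:
        return "smiling"
    return "neutral"


def encouragement_for(face: dict) -> str:
    """Return an encouraging message for a stressed face, else an empty string."""
    if classify_face(face) == "stressed":
        return "You are doing great. Please take your time."
    return ""
```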
[0391] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0392] Step 1:
[0393] The server stores interaction activity information in a database using information storage means. When a user accesses the activity information via a terminal, the server provides information in response to the request, and the user selects activities of interest. The input at this time is the user's selection information, and the output is information about the activities corresponding to that selection.
[0394] Step 2:
[0395] The server accepts reservations for selected activities using a reservation management system. The input is the user's selection information, which is recorded in the database to complete the reservation. The output is a reservation completion notification sent to the user.
[0396] Step 3:
[0397] The server uses a notification mechanism to inform the user's terminal of the start time of the scheduled activity before the activity begins. The input to this process is the activity start time information, and the output is a notification message to the user.
[0398] Step 4:
[0399] The server uses communication management means to relay audio and video data between users in real time. The input to this process is the user's audio and video data, and the output is to provide this data to other participants as relayed data in real time.
[0400] Step 5:
[0401] The server uses a translation presentation mechanism to translate participants' speech in real time using a generative AI model. The input is user voice data, which is output as translated text through data conversion. The generative AI model provides fast and accurate translation.
[0402] Step 6:
[0403] The server uses emotion recognition technology to analyze the user's facial expressions and recognize their emotions. The input for this process is image data acquired from the camera, which is output as emotion data using the Google Cloud Vision API.
[0404] Step 7:
[0405] The server uses activity adjustment mechanisms to adjust activities based on recognized emotions. Emotional data is the input, and based on this, the server adjusts the pace of activities and provides encouraging messages to the user as output.
[0406] Step 8:
[0407] After the interaction activity concludes, the server uses an evaluation collection mechanism to gather feedback from users. In this process, the input is user evaluation information, which is then aggregated and output as data to help improve the quality of future activities.
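Steps 6 and 7 above can be sketched as follows. This is a minimal illustration, assuming that emotion recognition has already produced a label and a confidence score. The label names, the threshold, the `Adjustment` type, and the `adjust_activity` function are hypothetical and are not specified in this disclosure.

```python
from dataclasses import dataclass

# Hypothetical emotion labels; the text only gives tension, stress, and
# confusion as examples of states that call for support.
NEGATIVE_EMOTIONS = {"tension", "stress", "confusion"}


@dataclass
class Adjustment:
    pace_factor: float  # 1.0 keeps the current pace, below 1.0 slows it down
    message: str        # empty string means no message is sent


def adjust_activity(emotion: str, confidence: float, threshold: float = 0.6) -> Adjustment:
    """Map one recognized emotion to a pace change and an optional message."""
    if emotion in NEGATIVE_EMOTIONS and confidence >= threshold:
        return Adjustment(pace_factor=0.8, message="You're doing well. Take your time.")
    # Low-confidence or non-negative emotions leave the activity unchanged.
    return Adjustment(pace_factor=1.0, message="")
```

The confidence threshold is one possible way to avoid reacting to uncertain recognition results; the disclosure does not state how such uncertainty is handled.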
[0408] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0409] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0410] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.
[0411] [Third Embodiment]
[0412] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.
[0413] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.
[0414] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0415] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.
[0416] The microphone 238 receives instructions from the user 20 by capturing the user's voice. The microphone 238 converts the captured voice into audio data and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.
[0417] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0418] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0419] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0420] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0421] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0422] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0423] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".
[0424] The system for implementing this invention provides participants with selectable interaction activities and includes various functions for the smooth operation of those activities. Its specific form is described below.
[0425] First, the server maintains multiple pieces of interaction activity information stored in a database and distributes this activity information in response to requests from terminals. This allows users to view various activities on their terminals and select those that interest them.
[0426] Users make reservations for the interaction activities they wish to participate in using their terminals. The terminal sends this information to the server, which records it in a database using a reservation management system. Based on this record, the server sends a reminder to the user using a notification system before the activity begins. This helps users remember the activity and join it on time.
[0427] During the activity, the terminals provide video and text chat interfaces to facilitate smooth communication between users. A communication management system relays audio and video data in real time via the server, enabling seamless interaction even among participants in remote locations. When necessary, the server uses a generative AI model to translate user speech in real time and displays the results on the terminal through a translation display system. This supports smooth communication with fewer misunderstandings, even among users from different language backgrounds.
[0428] Furthermore, after the exchange activity concludes, the server collects evaluations from each participant through an evaluation collection mechanism and stores them in a database. This evaluation data is analyzed by a feedback analysis mechanism and used to improve the quality of future activities. This system provides users with an environment where they can experience activities tailored to their individual needs and effectively practice their foreign language skills.
[0429] As a concrete example, suppose a foreign user learning Japanese participates in a "language exchange event." The server connects this user with a Japanese user via video chat and provides real-time translation between Japanese and the foreign language. After the event, the server uses feedback from both parties to design the next language exchange event more effectively. This is expected to improve participant satisfaction and enhance the quality of learning.
[0430] The following describes the processing flow.
[0431] Step 1:
[0432] The server retrieves information about social activities from the database and provides it to the terminal. Users view this information on their terminal and select activities that interest them.
[0433] Step 2:
[0434] When a user selects an activity they wish to participate in, the device sends a reservation request to the server. The server receives this request and records the reservation information in the database using its reservation management system.
[0435] Step 3:
[0436] As the start time for the interaction activity approaches, the server sends a reminder to the user's device using a notification system. This allows the user to confirm the start of the event in advance.
[0437] Step 4:
[0438] The user checks the notification sent on their device and clicks the link to join the event. The device sends a participation request to the server, and the server uses communication management tools to connect the user to the designated video chat session.
[0439] Step 5:
[0440] The server relays audio and video data between participants in real time. The terminal receives this data and displays it to the user through a video chat interface. If necessary, the server translates the conversation using a generative AI model and sends the result to the terminal for display on the screen.
[0441] Step 6:
[0442] Once the interaction activity concludes, the server displays a feedback form on the terminal. The user enters their evaluation and comments on the activity into this form. The terminal then sends the entered information to the server.
[0443] Step 7:
[0444] The server uses evaluation collection tools to record the feedback in a database. Feedback analysis tools then analyze the information, and the results are used to provide better interaction activities.
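The reminder in Step 3 of this flow can be sketched as follows. This is a minimal illustration, assuming the server periodically checks stored reservations. The `Reservation` fields, the 30-minute lead time, and the `due_reminders` function are hypothetical. Actual delivery by email or push notification is outside the sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Reservation:
    user_id: str
    activity_id: str
    start_time: datetime
    reminded: bool = False


def due_reminders(reservations, now, lead=timedelta(minutes=30)):
    """Return reservations that start within `lead` of `now` and have not
    been reminded yet, marking each one so it is only returned once."""
    due = []
    for r in reservations:
        if not r.reminded and now <= r.start_time <= now + lead:
            r.reminded = True  # prevents a duplicate reminder on the next check
            due.append(r)
    return due
```

Calling `due_reminders` on a fixed schedule (for example, once per minute) would yield each reservation once as its start time approaches. The polling interval is likewise an assumption.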
[0445] (Example 1)
[0446] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0447] In today's increasingly globalized society, overcoming language barriers and facilitating smooth communication among participants are significant challenges. Furthermore, efficiently collecting participant feedback and evaluations to improve the quality of each activity is also essential.
[0448] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0449] In this invention, the server includes an information recording means for storing activity information, a reservation processing means for accepting activity reservations, and a notification means for sending notifications before the start of an activity. This allows participants to smoothly participate in activities regardless of their language, and also enables effective collection and analysis of feedback from participants.
[0450] "Information recording means" refers to a device or configuration for storing and managing multiple activity information items that participants can select.
[0451] "Reservation processing means" refers to a function or process for accepting and recording activity reservations based on participants' choices.
[0452] "Notification means" refers to a method or system for informing participants of information before the start of a scheduled activity.
[0453] "Data management means" refers to interfaces and protocols used to relay and manage communication data between participants.
[0454] "Language conversion means" refers to a system or module for translating participants' utterances and presenting the translation results.
[0455] "Feedback collection methods" refer to techniques or devices for efficiently collecting evaluations and opinions from participants after an activity has concluded.
[0456] A "generative model" refers to an AI-based model or algorithm used to translate participants' speech in real time.
[0457] "Analytical tools" refer to data analysis techniques and methods used to improve the quality of activities based on evaluations from participants.
[0458] This system offers a variety of interactive activities for participants to choose from and includes various functions to ensure their smooth operation. The server has a database for storing activity information and efficiently manages and updates a large amount of activity data through database management software. For example, a database system such as MySQL can be used.
[0459] The server delivers activity information to the user's device via a web server or API in response to the user's request. This allows the user to view and select a variety of activities on their device using a browser or native application (e.g., React Native or Swift).
[0460] When a user wishes to participate in an activity, they make a reservation through their device. The device sends the reservation information to the server, which then records the reservation information in a database. A reservation management system (for example, a system using a RESTful API or RPC) can be used for this process.
[0461] Before an activity begins, the server sends reminder emails or push notifications to users. These notifications utilize email servers (such as Postfix or SendGrid) or push notification services (such as Firebase Cloud Messaging).
[0462] During the activity, the terminals provide video and text chat interfaces, enabling smooth communication between users. For communication management, the server uses a real-time communication (RTC) protocol to relay audio and video in real time. In particular, for participants who speak different languages, generative AI models (e.g., Google Translate API or Microsoft Azure Translator) are used to translate their speech in real time, and the translation results are displayed on the terminal.
[0463] As a concrete example, consider a foreign user learning Japanese who participates in a "language exchange event." The server connects this user with a Japanese user via video chat and provides real-time translation between Japanese and the foreign language. An example of a prompt might be, "Please explain the process for providing real-time translation between Japanese and English." After the event, feedback is collected from both parties and used to design the next language exchange event more effectively. This allows participants to have a smooth experience of cross-language communication.
[0464] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0465] Step 1:
[0466] The server retrieves activity information from the database using information recording means and distributes it based on requests from the terminal. The input is the user's request data, and the output is the activity information sent to the terminal. Specifically, the server queries the MySQL database to retrieve the necessary information and transmits it to the terminal via a RESTful API.
[0467] Step 2:
[0468] The user views the activity information received on their device and selects the activities they wish to participate in. The input is the displayed activity information, and the output is the activity data selected by the user. This selection data is processed via a user interface using client-side scripting such as JavaScript.
[0469] Step 3:
[0470] The terminal uses a reservation processing mechanism to send reservation information for the selected activity to the server. The input is the user's selected activity data, and the output is the reservation information recorded on the server. Specifically, the terminal sends the user selection information to the server in JSON format, and the server records it directly in the database.
[0471] Step 4:
[0472] The server uses notification methods based on reservation information to send reminders to users before the activity begins. The input is reservation information, and the output is a reminder email or push notification sent to the user. Specifically, the server sends emails using the SMTP protocol or push notifications via the Firebase Cloud Messaging API.
[0473] Step 5:
[0474] The terminal uses communication management means to provide video chat and text chat interfaces, enabling communication between users. The input is the user's voice and text data, and the output is real-time video and audio data provided to other users. The server relays the audio and video data using an RTC protocol.
[0475] Step 6:
[0476] The server applies a generative AI model as the language conversion means, translating participants' speech in real time and displaying it on the terminal. The input is the user's speech data, and the output is translated text data. Specifically, the server performs the translation via the Google Translate API.
[0477] Step 7:
[0478] The server collects feedback from participants after the activity using a feedback collection system, and uses this information to improve the quality of future activities. The input is participant evaluation data, and the output is analyzed feedback data. Specifically, evaluations are collected through a survey form and analyzed using an analysis program.
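The reservation recording in Step 3 of this flow, in which the terminal sends selection information in JSON format and the server records it in the database, can be sketched as follows. This is a minimal illustration under stated assumptions: Python's built-in `sqlite3` module is used as a stand-in for the MySQL database named in the text so that the sketch runs without a server, and the table layout, the JSON field names `user_id` and `activity_id`, and both function names are hypothetical.

```python
import json
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical reservations table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reservations ("
        "id INTEGER PRIMARY KEY, "
        "user_id TEXT NOT NULL, "
        "activity_id TEXT NOT NULL, "
        "UNIQUE (user_id, activity_id))"
    )
    return conn


def record_reservation(conn: sqlite3.Connection, payload: str) -> int:
    """Parse a JSON reservation request and store it; returns the new row id.

    Raises KeyError if a required field is missing and
    sqlite3.IntegrityError if the same user books the same activity twice.
    """
    data = json.loads(payload)
    user_id = data["user_id"]          # hypothetical field name
    activity_id = data["activity_id"]  # hypothetical field name
    # Parameterized query: values from the terminal are never spliced into SQL.
    cur = conn.execute(
        "INSERT INTO reservations (user_id, activity_id) VALUES (?, ?)",
        (user_id, activity_id),
    )
    conn.commit()
    return cur.lastrowid
```

The uniqueness constraint is one possible way to reject duplicate reservations; the disclosure does not describe how duplicates are handled.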
[0479] (Application Example 1)
[0480] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0481] In online exchange activities, language barriers exist when participants use different languages, making smooth real-time communication difficult. Furthermore, there is a need for effective methods to collect and analyze feedback to improve the quality of the activities.
[0482] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0483] In this invention, the server includes an information storage means for storing multiple exchange activity information that participants can select, a communication management means for relaying audio and video data between participants, and a translation presentation means for translating participants' speech and presenting the translation results in real time. This enables smooth, real-time communication even among participants who speak different languages. Furthermore, evaluations from participants can be collected after the exchange activity ends, and the quality of the activity can be improved based on the feedback.
[0484] "Information storage means" refers to a memory device that holds information on interaction activities that participants can select and provides as needed.
[0485] A "reservation management system" is a means that has the function of accepting and managing reservations for exchange activities based on the participants' choices.
[0486] "Notification method" refers to a means of sending reminders or notifications to participants in advance before the start of a scheduled social activity.
[0487] A "communication management system" is a means that has management functions to relay audio and video data between participants in real time and to enable smooth communication.
[0488] A "translation presentation method" is a means of translating participants' utterances and presenting the translation results quickly and accurately.
[0489] "Evaluation collection methods" refer to methods for gathering feedback from participants after an exchange activity has concluded, in order to collect information that can be used to improve future activities.
[0490] "Improvement measures" refer to methods for taking steps to improve the quality of interaction activities based on feedback received from participants.
[0491] A "generative model" is a machine learning model used to translate participants' speech in real time.
[0492] To implement this invention, a system is constructed in which a server and a user terminal work together to perform their functions. The server first stores information on multiple interaction activities in a database using an information storage means. This information is used when participants access it on their terminals and select activities of interest. The selected activities are recorded through a reservation management means based on requests sent from the terminals, and the server manages the reservation information.
[0493] Before a scheduled activity begins, the server sends a reminder to participants using a notification system. This helps ensure that participants do not miss the start of the activity.
[0494] During the activity, the user's device collects audio and video data and relays it in real time with other participants via a communication management system. For video chat, WebRTC technology is used to ensure reliable audio and video delivery. For translation presentation, the Google Translate API is utilized, and a machine learning model (generative AI model) is used to translate the user's speech in real time. The resulting translation is immediately displayed on the participant's device, enabling smooth communication even among participants using different languages.
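The translation presentation path in this paragraph can be sketched as a small pipeline. This is a minimal illustration, assuming speech is first converted to text and then translated. The speech recognizer and the translator are passed in as functions because the disclosure names services (such as the Google Translate API) but not how they are called. The functions `fake_transcribe` and `fake_translate` are hypothetical stand-ins and do not call any real service.

```python
from typing import Callable


def make_translation_presenter(
    transcribe: Callable[[bytes], str],
    translate: Callable[[str, str], str],
) -> Callable[[bytes, str], dict]:
    """Build a function that turns one audio chunk into a caption for display."""

    def present(audio_chunk: bytes, target_lang: str) -> dict:
        original = transcribe(audio_chunk)
        if not original.strip():
            # Silence or unrecognized audio: nothing to translate or show.
            return {"original": "", "translated": "", "lang": target_lang}
        return {
            "original": original,
            "translated": translate(original, target_lang),
            "lang": target_lang,
        }

    return present


# Hypothetical stand-ins so the sketch runs without any external service.
def fake_transcribe(audio_chunk: bytes) -> str:
    return audio_chunk.decode("utf-8")


def fake_translate(text: str, target_lang: str) -> str:
    glossary = {("こんにちは", "en"): "Hello"}
    return glossary.get((text, target_lang), text)
```

In a real deployment, the two stand-ins would be replaced by calls to the chosen speech recognition and translation services, and the returned caption would be sent to the participant's terminal.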
[0495] After the activity concludes, the server uses evaluation tools to collect feedback from each participant. Then, using improvement tools, this feedback is analyzed and incorporated into future activities to enhance their quality.
[0496] As a concrete example, suppose a foreign user learning Japanese participates in an online workshop about Japanese culture. In this case, the server connects the user to a Japanese-speaking instructor via video chat and provides real-time translation between Japanese and the user's native language. After the event ends, the user can submit feedback on topics they would like to see covered in the next workshop. An example of a prompt is, "Please tell me what topics you would like to see covered in the next Japanese culture workshop. Examples: Tea ceremony, how to make bonsai."
[0497] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0498] Step 1:
[0499] The user accesses the server using a terminal and retrieves interaction activity information stored in the information storage device. The server receives the user's request as input, filters the corresponding activity information from the database, and presents it to the terminal as output. This allows the user to select activities of interest.
[0500] Step 2:
[0501] The user sends a reservation request from their device for the selected activity. The server, receiving the selection information as input, uses a reservation management system to combine the activity information and user information and record the reservation in the database. The user is then notified that the reservation is complete.
[0502] Step 3:
[0503] The server uses a notification system before the scheduled activity begins, sending a reminder to the user's device using the reservation information as input. The user receives a notification containing activity details and time as output, ensuring they don't forget to participate.
[0504] Step 4:
[0505] During the activity, the user's device collects participants' audio and video in real time. It receives data from the device's microphone and camera as input, and relays it to other participants via a server using a communication management system. As output, the audio and video data are sent to other participants on the video chat platform.
[0506] Step 5:
[0507] The server translates user speech in real time using the translation presentation system. As input, the user's voice data is converted into text and sent to a generative AI model. A translation API performs the conversion, and the output, text translated into a different language, is presented on the user's terminal. This enables smooth communication among participants, even in different languages.
[0508] Step 6:
[0509] After the interaction activity ends, users send feedback to the server via their devices. The server receives evaluation data as input and records it in a database using an evaluation collection method. As output, feedback data is collected to help improve future activities.
[0510] Step 7:
[0511] The server analyzes the collected feedback using improvement methods. Using evaluation data as input, it extracts areas for improvement to enhance the quality of the activity through data analysis techniques. The output is a plan for improvement in the next activity. This leads to improved activity quality and increased participant satisfaction.
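The analysis in Step 7 can be sketched as follows. This is a minimal illustration, assuming each feedback item carries a numeric rating from 1 to 5 and an optional comment. The tuple layout, the threshold of 3.5, and the `find_improvement_targets` function are hypothetical. The disclosure does not specify the data analysis technique, so a simple mean rating per activity is used here.

```python
from collections import defaultdict
from statistics import mean


def find_improvement_targets(feedback, threshold=3.5):
    """Group ratings by activity and return activities whose mean rating
    is below `threshold`, together with their non-empty comments.

    `feedback` is an iterable of (activity_id, rating, comment) tuples.
    """
    ratings = defaultdict(list)
    comments = defaultdict(list)
    for activity_id, rating, comment in feedback:
        ratings[activity_id].append(rating)
        if comment:
            comments[activity_id].append(comment)

    targets = {}
    for activity_id, values in ratings.items():
        avg = mean(values)
        if avg < threshold:
            targets[activity_id] = {
                "mean_rating": avg,
                "comments": comments[activity_id],
            }
    return targets
```

The returned comments give the organizer concrete material for the improvement plan described above. A production system could replace the mean with any other analysis method.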
[0512] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0513] Embodiments of this invention consist of a system having various functions to enhance user-participatory interaction activities. First, the server provides interaction activity information stored in a database to the user's terminal, and the user can view this information via the terminal and select activities of interest. Using reservation management means, the server accepts reservations for the selected activities and records them in the database.
[0514] Furthermore, by incorporating an emotion engine, it becomes possible to recognize the user's emotions from their speech and captured facial expressions. The server analyzes the output of the emotion engine and has control mechanisms to adjust the progress of the interaction activity as needed. For example, if a participant is detected as being nervous, the system can adjust the pace of the activity or send encouraging messages.
[0515] When users participate in interaction activities, the server uses communication management means to relay audio and video data in real time, and further provides real-time translations as needed using translation presentation means, thereby facilitating smooth communication even among users with different language backgrounds. A generative model is used for translation, providing fast and accurate translation results.
[0516] After the interaction activity concludes, the server displays a feedback form on the terminal to collect user feedback. Furthermore, emotional data collected by the emotion engine is also evaluated, and this data is analyzed using data analysis tools. The resulting data is used to improve the quality of future interaction activities, thereby enhancing the user's learning effectiveness.
[0517] For example, if a Japanese user learning English and an English-speaking user learning Japanese participate in a language exchange activity, the server connects the users via video chat, and the emotion engine recognizes emotions from the participants' facial expressions. If the server determines that the users are enjoying themselves, it continues the activity unchanged. On the other hand, if the server determines that either user is confused, it adaptively controls the interaction, for example by enhancing the translation presentation to support conversation comprehension. This helps maintain motivation for language learning and improve practical learning effectiveness.
[0518] The following describes the processing flow.
[0519] Step 1:
[0520] The server retrieves interaction activity information from the database and sends it to the user's terminal. The user then browses the activity information provided on their terminal and selects activities that interest them.
[0521] Step 2:
[0522] When a user selects an activity they wish to participate in, the device sends a reservation request to the server. The server uses a reservation management system to record this information in a database and sends a reservation confirmation notification to the device.
[0523] Step 3:
[0524] Before the interaction activity begins, the server uses a notification system to send a reminder to the user's device informing them of the start time. The user receives the notification and prepares for the activity.
[0525] Step 4:
[0526] After the user confirms the notification and indicates their intention to participate in the activity via their device, the device sends a participation request to the server. The server connects the user to the video chat session via a communication management system.
[0527] Step 5:
[0528] During a video chat, the server uses a generative model to translate speech in real time and displays the results on the terminal via a translation display device. The server also operates an emotion engine to analyze the user's emotions from their speech and facial expressions. Based on this emotion data, the server adjusts the flow of the interaction as needed.
[0529] Step 6:
[0530] Once the interaction activity is complete, the server instructs the device to display a feedback form. The user enters their evaluation and comments on the activity into the form, and the device sends it to the server.
[0531] Step 7:
[0532] The server uses an evaluation collection mechanism to store user feedback in a database, and then uses a feedback analysis mechanism to analyze all the data, including the results of the emotion engine's analysis. Based on the analysis results, it plans future activities and system improvements.
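The adaptive control in Step 5 of this flow can be sketched as follows. This is a minimal illustration, assuming the emotion engine emits one label per observation. The label names, the action names, and the `EmotionController` class are hypothetical. Choosing the most frequent label over a short window of recent observations is an added assumption meant to keep a single misrecognized frame from changing the activity; the disclosure does not describe such smoothing.

```python
from collections import Counter, deque


class EmotionController:
    """Choose a control action from the most frequent recent emotion label."""

    # Hypothetical label-to-action table based on the examples in the text:
    # enjoying -> keep going, confused -> strengthen translation support,
    # nervous -> slow the pace and send an encouraging message.
    ACTIONS = {
        "enjoying": "maintain",
        "confused": "enhance_translation",
        "nervous": "slow_and_encourage",
    }

    def __init__(self, window: int = 5):
        # Only the last `window` observations are kept.
        self.recent = deque(maxlen=window)

    def observe(self, label: str) -> str:
        """Record one emotion label and return the action to take now."""
        self.recent.append(label)
        dominant, _count = Counter(self.recent).most_common(1)[0]
        # Unknown labels fall back to leaving the activity unchanged.
        return self.ACTIONS.get(dominant, "maintain")
```

A larger window reacts more slowly but is less sensitive to noise. The window size would need tuning against real recognition output.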
[0533] (Example 2)
[0534] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0535] Traditional interaction activity systems often suffered from difficulties in smooth communication among participants due to differences in language and emotions, resulting in reduced effectiveness. Furthermore, the lack of mechanisms to dynamically adjust activity content made it challenging to achieve sufficient participant satisfaction. Additionally, post-activity evaluations were often managed merely as numerical data, failing to effectively utilize feedback for future improvements.
[0536] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0537] In this invention, the server includes data storage means for storing information on multiple interaction activities that participants can select, reservation acceptance means for accepting reservations for interaction activities based on the participant's selection, and emotion analysis and adjustment means for analyzing the emotions of participants during interaction activities and dynamically adjusting the content of the activities. This makes it possible to provide flexible activities that respond to language differences and changes in emotions.
[0538] A "data storage means" is a medium or device that stores information on multiple interaction activities that participants can select, and allows them to access it as needed.
[0539] "Reservation acceptance means" refers to a medium or function for accepting reservations for interaction activities based on participants' selections.
[0540] "Information transmission means" refers to functions or devices used to send notifications to participants before the start of a scheduled exchange activity.
[0541] "Data transmission means" refers to technologies and devices for relaying audio and video data between participants.
[0542] "Translation provision means" refers to a function or mechanism for translating participants' utterances and presenting the translation results.
[0543] "Emotional analysis and adjustment means" refers to methods or devices for analyzing the emotions of participants during interaction activities and dynamically adjusting the content of those activities based on the results.
[0544] "Data collection and analysis means" refers to a system or process for collecting evaluations from participants after the completion of an interaction activity and analyzing them together with the participants' emotional data.
[0545] A "generative model" is a machine learning model or algorithm used to translate participants' speech in real time.
[0546] As an embodiment of this invention, a system for more effectively facilitating interaction activities is provided. The server primarily utilizes data storage means, reservation acceptance means, emotion analysis and adjustment means, data transmission means, translation provision means, and data collection and analysis means to facilitate smooth communication among participants. Specifically, this system operates as follows.
[0547] The server first uses data storage means to store information related to the interaction activities in a database. The hardware used here includes cloud storage systems and relational database systems. This information is stored in a way that makes it accessible to participants.
[0548] When a participant selects an activity, the terminal sends this information to the server, which accepts it using the reservation acceptance means. The terminal provides a user interface via a web browser or mobile application. The reservation information is recorded in the server's database.
[0549] Once the interaction activity begins, the server uses the data transmission means to relay participants' audio and video data in real time. The software used for this can leverage the APIs of common video conferencing tools.
[0550] During the activity, the server uses the emotion analysis and adjustment means, in which a generative AI model analyzes participants' speech and facial expression data. The software used includes machine learning algorithms that estimate emotions. This allows the system to determine whether participants are relaxed or tense, and to adjust the pace and content of the activity accordingly.
[0551] The server also uses the translation provision means to translate between participants' languages in real time. The translation, performed by a generative AI model, is highly accurate and fast.
[0552] Once the interaction activity concludes, the terminal collects feedback from participants and sends it to the server via the data collection and analysis means. This feedback is analyzed together with the emotion analysis data to help improve future interaction activities.
[0553] As a concrete example, a Japanese user might participate with the goal of learning English, while an English-speaking user might participate in the interaction activity to learn Japanese. The server connects the users via video chat, and the emotion engine analyzes the facial expressions of both. In this case, an example of a prompt to be input into the generative AI model would be, "Please suggest a way to adjust the pace of the activity based on the emotion data obtained from User A's facial expressions."
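As an illustration only, the prompt in this example could be assembled from emotion data as follows. The function name `build_pace_prompt` and the representation of emotion data as a dictionary of label-to-score pairs are assumptions of this sketch; the disclosure does not specify the data format produced by the emotion engine.

```python
def build_pace_prompt(user_label, emotion_scores):
    """Compose a prompt asking the generative AI model for a pacing suggestion."""
    # Sort so the strongest emotion is listed first; ties are broken alphabetically.
    ranked = sorted(emotion_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    emotion_text = ", ".join(f"{name}={score:.2f}" for name, score in ranked)
    return (
        "Please suggest a way to adjust the pace of the activity based on the "
        f"emotion data obtained from {user_label}'s facial expressions. "
        f"Emotion data: {emotion_text}."
    )
```

For instance, calling `build_pace_prompt("User A", {"relaxed": 0.2, "tense": 0.7})` yields the instruction sentence from the example followed by the emotion scores, with the dominant emotion listed first.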
[0554] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0555] Step 1:
[0556] The server retrieves interaction activity information from the database using data storage means and sends it to the terminal. This information includes the type of activity, date and time, and location. When a user logs in and accesses the system, the terminal provides a user interface that displays this information. The user selects activities of interest as user input, and the terminal sends this selection to the server. The output is the user's selection information, which is stored on the server.
[0557] Step 2:
[0558] The terminal sends a reservation request for an activity to the server based on the user's selection. The server records the reservation information in a database using a reservation acceptance mechanism. The input is the reservation request, which includes the user ID and the selected activity information. The server output is the reservation status recorded in the database. This allows the user to confirm whether their reservation was successful.
[0559] Step 3:
[0560] The server utilizes a video conferencing system to relay audio and video data between participants in real time using data transmission means. When a user joins an activity, the terminal sends and receives this media data, providing the user with live video and audio. The input is audio and video data from the user, and the output is audio and video data transmitted to other participants.
[0561] Step 4:
[0562] The server utilizes the emotion analysis and adjustment means, using a generative AI model to analyze participants' statements and captured facial expressions. This allows for real-time understanding of each user's emotional state. The input is the participants' statements and facial expression data, and the output is the analyzed emotional information. For example, if the server determines that a participant is tense, it may slow down the pace of the activity or send reassuring messages.
[0563] Step 5:
[0564] The server utilizes a translation service and uses a generative AI model to translate participants' utterances in real time. The input is participant utterance data, which may include different languages. The output is the translated text. The terminal displays this text in a user interface to help participants understand the meaning.
[0565] Step 6:
[0566] After the interaction activity concludes, the terminal uses the data collection and analysis means to obtain feedback from participants and transmit it to the server. The input consists of user evaluations and impressions. The server collects this data and analyzes it together with the emotion data to identify areas for improvement in future activities. The output is a set of improvement suggestions based on the analysis, which are used to enhance the system's service.
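The reservation handling in Step 2 above, where the input is a user ID with the selected activity and the output is a reservation status, can be sketched as follows. The `ReservationStore` class, the in-memory dictionary standing in for the database, and the status strings are hypothetical and serve only to illustrate the input and output described.

```python
from dataclasses import dataclass, field


@dataclass
class ReservationStore:
    # Maps (user_id, activity_id) to a status string; stands in for the database.
    records: dict = field(default_factory=dict)

    def reserve(self, user_id, activity_id):
        """Record a reservation and return the status reported back to the terminal."""
        key = (user_id, activity_id)
        if key in self.records:
            # A repeated request does not create a second record.
            status = "already_reserved"
        else:
            self.records[key] = "confirmed"
            status = "confirmed"
        return {"user_id": user_id, "activity_id": activity_id, "status": status}
```

Returning the status in the response is one way to let the user confirm whether the reservation was successful, as Step 2 describes.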
[0567] (Application Example 2)
[0568] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0569] In online or offline interaction activities, participants may face difficulties in smooth communication due to differing language backgrounds and emotional states. Therefore, there is a need for systems that support adaptive communication tailored to participants' language and emotional states. Furthermore, improving the quality of future activities based on participant feedback and emotional data is a key challenge.
[0570] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0571] In this invention, the server includes information storage means for storing multiple interaction activity information that participants can select, emotion recognition means for analyzing the participant's facial expression information to recognize emotions, and activity adjustment means for adaptively controlling activities according to the participant's emotions. This enables smooth and effective interaction activities by providing appropriate support based on the participant's language and emotional state.
[0572] "Information storage means" refers to a device or function that stores and manages information on multiple interaction activities that participants can select.
[0573] "Reservation management means" refers to a function that accepts and manages reservations for interaction activities based on participants' selections.
[0574] "Notification means" refers to a system or function for sending notifications to participants before the start of a reserved interaction activity.
[0575] "Communication management means" refers to a function or device for relaying audio and video data between participants.
[0576] "Translation presentation means" refers to a function or device for translating a participant's utterance and presenting the translation result.
[0577] "Evaluation collection means" refers to a function or system for collecting evaluations from participants after an interaction activity has concluded.
[0578] "Emotion recognition means" refers to technologies and functions that analyze participants' facial expressions to recognize their emotions.
[0579] "Activity adjustment means" refers to a function or system for adaptively controlling activities in accordance with the emotions of the participants.
[0580] This invention is a system for improving the user experience when participating in interaction activities, and has multiple functions. The server uses the information storage means to store and manage information on interaction activities that participants can select. When a user selects an activity through a terminal, the reservation management means accepts the reservation based on that selection. Subsequently, the notification means sends a notification to the user before the start of the interaction activity.
[0581] To ensure smooth communication, the server uses the communication management means to relay audio and video data between users in real time. Furthermore, the translation presentation means translates participants' speech in real time using a generative AI model, and the translation results are presented to the users. This enables effective communication even among participants with different language backgrounds.
[0582] To improve user satisfaction, the server analyzes the user's facial expressions using the emotion recognition means and recognizes their emotions. Based on this data, the activity adjustment means adaptively controls the interaction activity according to the user's emotions, adjusting the pace as needed and providing encouraging messages to ensure a comfortable experience for participants.
[0583] After the activity concludes, the server uses the evaluation collection means to collect user feedback, and also evaluates the emotion data obtained through the emotion recognition means. This data is then used to improve the quality of future interaction activities.
[0584] As a concrete example, consider a smart communication robot. The robot is equipped with a small camera that captures the user's smile or confused expression, and it analyzes the user's emotions using the Google Cloud Vision API. Based on the analysis results, the robot uses the DeepL API to quickly translate the user's speech and supports real-time transmission of the translated statement to the other party. An example of a prompt for the generative AI model would be a specific instruction such as, "Analyze the user's emotions based on their facial expressions during the conversation. If the expression indicates tension or stress, provide a message of encouragement."
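The decision described in this example can be sketched without calling any external service. The sketch assumes that the face analysis result is available as a dictionary whose keys and likelihood labels follow the JSON form of a Cloud Vision face annotation (for example `sorrowLikelihood` with values such as `LIKELY`). Treating sorrow or anger as a proxy for tension or stress, the `LIKELY` threshold, and the message text are all assumptions of this sketch, since the disclosure does not state how tension is derived.

```python
# Likelihood labels ordered from weakest to strongest.
LIKELIHOOD_ORDER = [
    "UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY",
]


def is_at_least(likelihood, minimum):
    """Return True when likelihood is at least as strong as minimum."""
    return LIKELIHOOD_ORDER.index(likelihood) >= LIKELIHOOD_ORDER.index(minimum)


def choose_message(face):
    """Return an encouraging message when the face looks stressed, else None."""
    # Assumption: sorrow or anger stands in for "tension or stress".
    stressed = any(
        is_at_least(face.get(key, "UNKNOWN"), "LIKELY")
        for key in ("sorrowLikelihood", "angerLikelihood")
    )
    if stressed:
        return "You are doing well. Take your time."
    return None
```

A missing field is treated as `UNKNOWN`, so an incomplete analysis result never triggers a message in this sketch.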
[0585] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0586] Step 1:
[0587] The server stores interaction activity information in a database using information storage means. When a user accesses the activity information via a terminal, the server provides information in response to the request, and the user selects activities of interest. The input at this time is the user's selection information, and the output is information about the activities corresponding to that selection.
[0588] Step 2:
[0589] The server accepts reservations for selected activities using a reservation management system. The input is the user's selection information, which is recorded in the database to complete the reservation. The output is a reservation completion notification sent to the user.
[0590] Step 3:
[0591] The server uses a notification mechanism to inform the user's terminal of the start time of the scheduled activity before the activity begins. The input to this process is the activity start time information, and the output is a notification message to the user.
[0592] Step 4:
[0593] The server uses communication management means to relay audio and video data between users in real time. The input to this process is the user's audio and video data, and the output is to provide this data to other participants as relayed data in real time.
[0594] Step 5:
[0595] The server uses the translation presentation means to translate participants' speech in real time using a generative AI model. The input is the user's voice data, and the output is the translated text. The generative AI model provides fast and accurate translation.
[0596] Step 6:
[0597] The server uses the emotion recognition means to analyze the user's facial expressions and recognize their emotions. The input for this process is image data acquired from the camera. The server processes this image data using the Google Cloud Vision API, and the output is emotion data.
[0598] Step 7:
[0599] The server uses activity adjustment mechanisms to adjust activities based on recognized emotions. Emotional data is the input, and based on this, the server adjusts the pace of activities and provides encouraging messages to the user as output.
[0600] Step 8:
[0601] After the interaction activity concludes, the server uses an evaluation collection mechanism to gather feedback from users. In this process, the input is user evaluation information, which is then aggregated and output as data to help improve the quality of future activities.
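The notification timing in Step 3 above can be sketched as follows. The 15-minute lead time and the representation of reservations as (user ID, start time) pairs are assumptions made for illustration; the disclosure only states that the notification is sent before the activity begins.

```python
from datetime import datetime, timedelta


def reminder_time(start_time, lead=timedelta(minutes=15)):
    """Return the time at which the reminder should be sent."""
    return start_time - lead


def due_reminders(reservations, now, lead=timedelta(minutes=15)):
    """Return user IDs whose reminder time has arrived but whose activity has not started."""
    due = []
    for user_id, start_time in reservations:
        if reminder_time(start_time, lead) <= now < start_time:
            due.append(user_id)
    return due
```

A server process could call `due_reminders` periodically with the current time and pass the resulting user IDs to the notification means. Avoiding duplicate notifications across calls is outside the scope of this sketch.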
[0602] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0603] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0604] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.
[0605] [Fourth Embodiment]
[0606] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.
[0607] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.
[0608] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0609] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.
[0610] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0611] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0612] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0613] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
[0614] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0615] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0616] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0617] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0618] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0619] The system for implementing this invention provides participants with selectable interaction activities and includes various functions for the smooth operation of those activities. Its specific form is described below.
[0620] First, the server maintains multiple pieces of interaction activity information stored in a database and distributes this activity information in response to requests from terminals. This allows users to view various activities on their terminals and select those that interest them.
[0621] Users reserve the interaction activities they wish to participate in using their terminals. The terminal sends this information to the server, which records it in a database using a reservation management system. Based on this record, the server sends a reminder to the user using a notification system before the activity begins. This helps ensure that users do not forget to participate and can join activities on time.
[0622] During the activity, the terminals provide video and text chat interfaces to facilitate smooth communication between users. A communication management system relays audio and video data in real time via the server, enabling seamless interaction even among participants in remote locations. When necessary, the server uses a generative AI model to translate user speech in real time and displays the results on the terminal through a translation display system. This supports smooth communication with fewer misunderstandings, even among users from different language backgrounds.
[0623] Furthermore, after the exchange activity concludes, the server collects evaluations from each participant through an evaluation collection mechanism and stores them in a database. This evaluation data is analyzed by a feedback analysis mechanism and used to improve the quality of future activities. This system provides users with an environment where they can experience activities tailored to their individual needs and effectively practice their foreign language skills.
[0624] As a concrete example, suppose a foreign user learning Japanese participates in a "language exchange event." The server connects this user with a Japanese user via video chat and provides real-time translation between Japanese and the foreign language. After the event, the server uses feedback from both parties to design the next language exchange event more effectively. This is expected to improve participant satisfaction and enhance the quality of learning.
[0625] The following describes the processing flow.
[0626] Step 1:
[0627] The server retrieves information about social activities from the database and provides it to the terminal. Users view this information on their terminal and select activities that interest them.
[0628] Step 2:
[0629] When a user selects an activity they wish to participate in, the device sends a reservation request to the server. The server receives this request and records the reservation information in the database using its reservation management system.
[0630] Step 3:
[0631] As the start time for the interaction activity approaches, the server sends a reminder to the user's device using a notification system. This allows the user to confirm the start of the event in advance.
[0632] Step 4:
[0633] The user checks the notification sent on their device and clicks the link to join the event. The device sends a participation request to the server, and the server uses communication management tools to connect the user to the designated video chat session.
[0634] Step 5:
[0635] The server relays audio and video data between participants in real time. The terminal receives this data and displays it to the user through a video chat interface. If necessary, the server translates the conversation using a generative AI model and sends the result to the terminal for display on the screen.
[0636] Step 6:
[0637] Once the interaction activity concludes, the server displays a feedback form on the terminal. The user enters their evaluation and comments on the activity into this form. The terminal then sends the entered information to the server.
[0638] Step 7:
[0639] The server uses evaluation collection tools to record feedback in a database. Feedback analysis tools are used to analyze the information, and the data will be considered for use in providing better interaction activities.
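The relay in Step 5 above, in which the conversation is translated only if necessary, can be sketched as follows. The `Participant` class, the language codes, and the `translate` callable are hypothetical. In an actual system the callable would wrap the generative AI model, whose interface is not specified in this disclosure.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    user_id: str
    language: str  # e.g. "ja" or "en"


def relay_utterance(sender, text, participants, translate):
    """Deliver text to every other participant, translating only when languages differ."""
    deliveries = {}
    for p in participants:
        if p.user_id == sender.user_id:
            continue  # the sender does not receive their own utterance
        if p.language == sender.language:
            deliveries[p.user_id] = text
        else:
            deliveries[p.user_id] = translate(text, sender.language, p.language)
    return deliveries
```

Passing the translator as an argument keeps the relay logic independent of any particular translation service, which is a design choice of this sketch rather than something stated in the source.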
[0640] (Example 1)
[0641] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0642] In today's increasingly globalized society, overcoming language barriers and facilitating smooth communication among participants are significant challenges. Furthermore, efficiently collecting participant feedback and evaluations to improve the quality of each activity is also essential.
[0643] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0644] In this invention, the server includes an information recording means for storing activity information, a reservation processing means for accepting activity reservations, and a notification means for sending notifications before the start of an activity. This allows participants to smoothly participate in activities regardless of their language, and also enables effective collection and analysis of feedback from participants.
[0645] "Information recording means" refers to a device or configuration for storing and managing multiple activity information items that participants can select.
[0646] "Reservation processing means" refers to a function or process for accepting and recording activity reservations based on participants' choices.
[0647] "Notification means" refers to a method or system for informing participants of information before the start of a scheduled activity.
[0648] "Data management means" refers to interfaces and protocols used to relay and manage communication data between participants.
[0649] "Language conversion means" refers to a system or module for translating participants' utterances and presenting the translation results.
[0650] "Feedback collection means" refers to a technique or device for efficiently collecting evaluations and opinions from participants after an activity has concluded.
[0651] A "generative model" refers to an AI-based model or algorithm used to translate participants' speech in real time.
[0652] "Analytical tools" refer to data analysis techniques and methods used to improve the quality of activities based on evaluations from participants.
[0653] This system offers a variety of interactive activities for participants to choose from and includes various functions to ensure their smooth operation. The server has a database for storing activity information and efficiently manages and updates a large amount of activity data through database management software. For example, a database system such as MySQL can be used.
[0654] The server delivers activity information to the user's device via a web server or API in response to the user's request. This allows the user to view and select a variety of activities on their device using a browser or native application (e.g., React Native or Swift).
[0655] When a user wishes to participate in an activity, they make a reservation through their device. The device sends the reservation information to the server, which then records the reservation information in a database. A reservation management system (for example, a system using a RESTful API or RPC) can be used for this process.
[0656] Before an activity begins, the server sends reminder emails or push notifications to users. These notifications utilize email servers (such as Postfix or SendGrid) or push notification services (such as Firebase Cloud Messaging).
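A reminder email of the kind described here could be composed as in the following minimal sketch, which uses only the Python standard library. The function name, the addresses, and the wording of the subject and body are assumptions for illustration.

```python
from email.message import EmailMessage


def build_reminder_email(sender, recipient, activity_title, start_text):
    """Compose a plain-text reminder email for a reserved activity."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Reminder: {activity_title}"
    msg.set_content(f"Your activity '{activity_title}' starts at {start_text}.\n")
    return msg
```

The resulting message could then be handed to an SMTP client, for example with `smtplib.SMTP(...).send_message(msg)`. The mail server host and credentials depend on the deployment and are not given in the source.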
[0657] During the activity, the terminals provide video and text chat interfaces, enabling smooth communication between users. The server uses real-time communication (RTC) protocols for communication management to relay audio and video in real time. In particular, for participants who speak different languages, generative AI models (e.g., Google Translate API or Microsoft Azure Translator) are used to translate their speech in real time, and the translation results are displayed on the terminal.
[0658] As a concrete example, consider a foreign user learning Japanese who participates in a "language exchange event." The server connects this user with a Japanese user via video chat and provides real-time translation between Japanese and the foreign language. An example of a prompt might be, "Please explain the process for providing real-time translation between Japanese and English." After the event, feedback is collected from both parties and used to design the next language exchange event more effectively. This allows participants to have a smooth experience of cross-language communication.
[0659] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0660] Step 1:
[0661] The server retrieves activity information from the database using information recording means and distributes it based on requests from the terminal. The input is the user's request data, and the output is the activity information sent to the terminal. Specifically, the server queries the MySQL database to retrieve the necessary information and transmits it to the terminal via a RESTful API.
[0662] Step 2:
[0663] The user views the activity information received on their device and selects the activities they wish to participate in. The input is the displayed activity information, and the output is the activity data selected by the user. This selection data is processed via a user interface using client-side scripting such as JavaScript.
[0664] Step 3:
[0665] The terminal uses a reservation processing mechanism to send reservation information for the selected activity to the server. The input is the user's selected activity data, and the output is the reservation information recorded on the server. Specifically, the terminal sends the user selection information to the server in JSON format, and the server records it directly in the database.
[0666] Step 4:
[0667] The server uses notification methods based on reservation information to send reminders to users before the activity begins. The input is reservation information, and the output is a reminder email or push notification sent to the user. Specifically, the server sends emails using the SMTP protocol or push notifications via the Firebase Cloud Messaging API.
[0668] Step 5:
[0669] The terminal uses communication management means to provide video chat and text chat interfaces, enabling communication between users. The input is the user's voice and text data, and the output is the real-time video and audio data provided to other users. The server relays the audio and video data using an RTC protocol.
[0670] Step 6:
[0671] The server applies a generative AI model as the language conversion means, translating participants' speech in real time and displaying the result on the terminal. The input is the user's speech data, and the output is translated text data. Specifically, the server performs the translation by calling the Google Translate API.
[0672] Step 7:
[0673] The server collects feedback from participants after the activity using a feedback collection system, and uses this information to improve the quality of future activities. The input is participant evaluation data, and the output is analyzed feedback data. Specifically, evaluations are collected through a survey form and analyzed using an analysis program.
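As a non-limiting illustration, Steps 3 and 4 above can be sketched as follows. The sketch assumes an in-memory SQLite database as a stand-in for the MySQL database named in the text. The table layout, the JSON field names (`user_id`, `activity_id`, `starts_at`), and the 30-minute lead time are hypothetical and are not taken from the disclosure. Actual delivery by SMTP or a push notification service is left out.

```python
import json
import sqlite3
from datetime import datetime, timedelta

# Hypothetical lead time; the disclosure does not say how early reminders go out.
REMINDER_LEAD = timedelta(minutes=30)


def open_store() -> sqlite3.Connection:
    """Create an in-memory reservation table (stand-in for the MySQL database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reservations ("
        "user_id TEXT, activity_id TEXT, starts_at TEXT, "
        "PRIMARY KEY (user_id, activity_id))"
    )
    return conn


def record_reservation(conn: sqlite3.Connection, payload: str) -> None:
    """Step 3: parse the JSON sent by the terminal and record the reservation."""
    data = json.loads(payload)
    conn.execute(
        "INSERT INTO reservations VALUES (?, ?, ?)",
        (data["user_id"], data["activity_id"], data["starts_at"]),
    )
    conn.commit()


def due_reminders(conn: sqlite3.Connection, now: datetime) -> list[tuple[str, str]]:
    """Step 4: return (user_id, activity_id) pairs inside the reminder window."""
    rows = conn.execute("SELECT user_id, activity_id, starts_at FROM reservations")
    due = []
    for user_id, activity_id, starts_at in rows:
        start = datetime.fromisoformat(starts_at)
        # A reminder is due from REMINDER_LEAD before the start until the start.
        if start - REMINDER_LEAD <= now < start:
            due.append((user_id, activity_id))
    return due
```

A scheduler could call `due_reminders` periodically and hand each pair to whichever notification channel is configured.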
[0674] (Application Example 1)
[0675] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0676] In online exchange activities, language barriers exist when participants use different languages, making smooth real-time communication difficult. Furthermore, there is a need for effective methods to collect and analyze feedback to improve the quality of the activities.
[0677] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0678] In this invention, the server includes an information storage means for storing multiple exchange activity information that participants can select, a communication management means for relaying audio and video data between participants, and a translation presentation means for translating participants' speech and presenting the translation results in real time. This enables smooth, real-time communication even among participants who speak different languages. Furthermore, evaluations from participants can be collected after the exchange activity ends, and the quality of the activity can be improved based on the feedback.
[0679] "Information storage means" refers to a storage device that holds information on the exchange activities that participants can select and provides it as needed.
[0680] "Reservation management means" refers to a means that accepts and manages reservations for exchange activities based on the participants' selections.
[0681] "Notification means" refers to a means of sending reminders or notifications to participants before a reserved exchange activity starts.
[0682] "Communication management means" refers to a means that relays audio and video data between participants in real time and manages the session so that communication proceeds smoothly.
[0683] "Translation presentation means" refers to a means of translating participants' utterances and presenting the translation results quickly and accurately.
[0684] "Evaluation collection means" refers to a means of gathering feedback from participants after an exchange activity has concluded, in order to collect information that can be used to improve future activities.
[0685] "Improvement means" refers to a means of improving the quality of exchange activities based on the feedback received from participants.
[0686] A "generative model" is a machine learning model used to translate participants' speech in real time.
[0687] To implement this invention, a system is constructed in which a server and a user terminal work together to perform their functions. The server first stores information on multiple interaction activities in a database using an information storage means. This information is used when participants access it on their terminals and select activities of interest. The selected activities are recorded through a reservation management means based on requests sent from the terminals, and the server manages the reservation information.
[0688] Before a reserved activity begins, the server sends a reminder to participants using the notification means. This ensures that participants do not miss the start of the activity.
[0689] During the activity, the user's terminal captures audio and video data, and the communication management means relays it to the other participants in real time. For video chat, WebRTC technology is used to ensure reliable audio and video delivery. For translation presentation, the Google Translate API is used, and a machine learning model (generative AI model) translates the user's speech in real time. The resulting translation is immediately displayed on the participant's terminal, enabling smooth communication even among participants using different languages.
[0690] After the activity concludes, the server uses the evaluation collection means to collect feedback from each participant. The improvement means then analyzes this feedback and reflects it in future activities to enhance their quality.
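As a non-limiting illustration, the translation presentation described above can be sketched as a routing step that decides which participants need a translated caption. The sketch assumes the speech has already been converted to text. The `Participant` type, the `captions_for` function, and the injected `translate` callable are hypothetical names; in practice the callable would wrap whichever translation service is used, and no specific service API is shown here.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
# It is injected so the sketch runs without any network access.
Translator = Callable[[str, str, str], str]


@dataclass(frozen=True)
class Participant:
    user_id: str
    language: str  # language code such as "ja" or "en"


def captions_for(
    speaker: Participant,
    text: str,
    participants: list[Participant],
    translate: Translator,
) -> dict[str, str]:
    """Return the caption each listener should see for one utterance."""
    captions: dict[str, str] = {}
    cache: dict[str, str] = {}  # one translation per target language
    for listener in participants:
        if listener.user_id == speaker.user_id:
            continue  # the speaker does not need a caption of their own speech
        if listener.language == speaker.language:
            captions[listener.user_id] = text  # same language: show as spoken
            continue
        if listener.language not in cache:
            cache[listener.language] = translate(
                text, speaker.language, listener.language
            )
        captions[listener.user_id] = cache[listener.language]
    return captions
```

Caching by target language keeps the number of translation requests at one per language per utterance, regardless of how many listeners share that language.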
[0691] As a concrete example, suppose a foreign user learning Japanese participates in an online workshop about Japanese culture. In this case, the server connects the user to a Japanese-speaking instructor via video chat and provides real-time translation between Japanese and the user's native language. After the event ends, the user can submit feedback on topics they would like to see covered in the next workshop. For example, the user may be presented with a prompt such as, "Please tell me what topics you would like to see covered in the next Japanese culture workshop. Examples: Tea ceremony, how to make bonsai."
[0692] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0693] Step 1:
[0694] The user accesses the server using a terminal and retrieves interaction activity information stored in the information storage device. The server receives the user's request as input, filters the corresponding activity information from the database, and presents it to the terminal as output. This allows the user to select activities of interest.
[0695] Step 2:
[0696] The user sends a reservation request from their device for the selected activity. The server, receiving the selection information as input, uses a reservation management system to combine the activity information and user information and record the reservation in the database. The user is then notified that the reservation is complete.
[0697] Step 3:
[0698] The server uses a notification system before the scheduled activity begins, sending a reminder to the user's device using the reservation information as input. The user receives a notification containing activity details and time as output, ensuring they don't forget to participate.
[0699] Step 4:
[0700] During the activity, the user's device collects participants' audio and video in real time. It receives data from the device's microphone and camera as input, and relays it to other participants via a server using a communication management system. As output, the audio and video data are sent to other participants on the video chat platform.
[0701] Step 5:
[0702] Using the translation presentation means, the server translates user speech in real time. The input is the user's voice data, which is converted into text and sent to a generative AI model. A translation API is used for the conversion, and the output, text translated into the other participants' languages, is presented on the user's terminal. This enables smooth communication among participants, even when they use different languages.
[0703] Step 6:
[0704] After the interaction activity ends, users send feedback to the server via their devices. The server receives evaluation data as input and records it in a database using an evaluation collection method. As output, feedback data is collected to help improve future activities.
[0705] Step 7:
[0706] The server analyzes the collected feedback using improvement methods. Using evaluation data as input, it extracts areas for improvement to enhance the quality of the activity through data analysis techniques. The output is a plan for improvement in the next activity. This leads to improved activity quality and increased participant satisfaction.
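As a non-limiting illustration, Steps 6 and 7 above can be sketched as a simple aggregation over survey responses. The 1-to-5 rating scale, the aspect names, and the 3.5 threshold are hypothetical. The disclosure only states that feedback is analyzed to extract areas for improvement, so averaging is one possible analysis technique and not necessarily the one intended.

```python
from collections import defaultdict
from statistics import mean


def improvement_targets(responses, threshold=3.5):
    """Return (aspect, average rating) pairs below the threshold, lowest first.

    `responses` is a list of dicts mapping an aspect name to a numeric rating.
    """
    ratings_by_aspect = defaultdict(list)
    for response in responses:
        for aspect, rating in response.items():
            ratings_by_aspect[aspect].append(rating)

    averages = {aspect: mean(r) for aspect, r in ratings_by_aspect.items()}
    low = [(aspect, avg) for aspect, avg in averages.items() if avg < threshold]
    # The lowest-rated aspects come first so they can be prioritized
    # when planning the next activity.
    return sorted(low, key=lambda pair: pair[1])
```

For instance, if translation quality is rated 2 and 3 by two participants while pace is rated 4 and 5, only translation quality is returned as an improvement target.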
[0707] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0708] Embodiments of this invention consist of a system having various functions to enhance user-participatory social activities. First, the server provides social activity information stored in a database to the user's terminal, and the user can view this information via the terminal and select activities of interest. The reservation management means allows the server to accept reservations for the selected activities and record them in the database.
[0709] Furthermore, by incorporating an emotion engine, it becomes possible to recognize the user's emotions from their speech and captured facial expressions. The server analyzes the output of the emotion engine and has control mechanisms to adjust the progress of the interaction activity as needed. For example, if a participant is detected as being nervous, the system can adjust the pace of the activity or send encouraging messages.
[0710] When users participate in interaction activities, the server uses communication management means to relay audio and video data in real time, and further provides real-time translations as needed using translation presentation means, thereby facilitating smooth communication even among users with different language backgrounds. A generative model is used for translation, providing fast and accurate translation results.
[0711] After the interaction activity concludes, the server displays a feedback form on the terminal to collect user feedback. Furthermore, emotional data collected by the emotion engine is also evaluated, and this data is analyzed using data analysis tools. The resulting data is used to improve the quality of future interaction activities, thereby enhancing the user's learning effectiveness.
[0712] For example, if a Japanese user learning English and an English-speaking user learning Japanese participate in a language exchange activity, the server connects the users via video chat, and the emotion engine recognizes emotions from the participants' facial expressions. If the server determines that the users are enjoying themselves, it maintains the activity to ensure that situation continues. On the other hand, if the server determines that either user is confused, it adaptively controls the interaction, such as by enhancing the translation presentation to support conversation comprehension. This makes it possible to maintain motivation for language learning and maximize practical learning effectiveness.
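As a non-limiting illustration, the adaptive control described above can be sketched as a small decision function. The `Emotion` and `Action` labels and the `choose_action` name are hypothetical. The priority given to confusion over nervousness is also an assumption, since the disclosure does not state how competing emotions are ranked.

```python
from enum import Enum


class Emotion(Enum):
    ENJOYING = "enjoying"
    CONFUSED = "confused"
    NERVOUS = "nervous"
    NEUTRAL = "neutral"


class Action(Enum):
    MAINTAIN = "maintain the activity as it is"
    ENHANCE_TRANSLATION = "enhance the translation presentation"
    SLOW_PACE_AND_ENCOURAGE = "slow the pace and send an encouraging message"


def choose_action(emotions: dict[str, Emotion]) -> Action:
    """Pick one control action from the per-user emotions of a session."""
    observed = set(emotions.values())
    # If either user is confused, support comprehension first (assumed priority).
    if Emotion.CONFUSED in observed:
        return Action.ENHANCE_TRANSLATION
    # A nervous participant triggers pace adjustment and encouragement.
    if Emotion.NERVOUS in observed:
        return Action.SLOW_PACE_AND_ENCOURAGE
    # Otherwise (for example, both users enjoying themselves) keep going.
    return Action.MAINTAIN
```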
[0713] The following describes the processing flow.
[0714] Step 1:
[0715] The server retrieves interaction activity information from the database and sends it to the user's terminal. The user then browses the activity information provided on their terminal and selects activities that interest them.
[0716] Step 2:
[0717] When a user selects an activity they wish to participate in, the device sends a reservation request to the server. The server uses a reservation management system to record this information in a database and sends a reservation confirmation notification to the device.
[0718] Step 3:
[0719] Before the interaction activity begins, the server uses a notification system to send a reminder to the user's device informing them of the start time. The user receives the notification and prepares for the activity.
[0720] Step 4:
[0721] After the user confirms the notification and indicates their intention to participate in the activity via their device, the device sends a participation request to the server. The server connects the user to the video chat session via a communication management system.
[0722] Step 5:
[0723] During a video chat, the server uses a generative model to translate speech in real time and displays the results on the terminal via the translation presentation means. The server also operates an emotion engine to analyze the user's emotions from their speech and facial expressions. Based on this emotion data, the server adjusts the flow of the interaction as needed.
[0724] Step 6:
[0725] Once the interaction activity is complete, the server instructs the device to display a feedback form. The user enters their evaluation and comments on the activity into the form, and the device sends it to the server.
[0726] Step 7:
[0727] The server uses an evaluation collection mechanism to store user feedback in a database, and then uses a feedback analysis mechanism to analyze all the data, including the results of the emotion engine's analysis. Based on the analysis results, it plans future activities and system improvements.
[0728] (Example 2)
[0729] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0730] Traditional interaction activity systems often suffered from difficulties in smooth communication among participants due to differences in language and emotions, resulting in reduced effectiveness. Furthermore, the lack of mechanisms to dynamically adjust activity content made it challenging to achieve sufficient participant satisfaction. Additionally, post-activity evaluations were often managed merely as numerical data, failing to effectively utilize feedback for future improvements.
[0731] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0732] In this invention, the server includes data storage means for storing information on multiple interaction activities that participants can select, reservation acceptance means for accepting reservations for interaction activities based on the participant's selection, and emotion analysis and adjustment means for analyzing the emotions of participants during interaction activities and dynamically adjusting the content of the activities. This makes it possible to provide flexible activities that respond to language differences and changes in emotions.
[0733] A "data storage means" is a medium or device that stores information on multiple interaction activities that participants can select, and allows them to access it as needed.
[0734] "Reservation acceptance means" refers to a medium or function for accepting reservations for interaction activities based on participants' selections.
[0735] "Information transmission means" refers to functions or devices used to send notifications to participants before the start of a scheduled exchange activity.
[0736] "Data transmission means" refers to technologies and devices for relaying audio and video data between participants.
[0737] "Translation provision means" refers to a function or mechanism for translating participants' utterances and presenting the translation results.
[0738] "Emotional analysis and adjustment means" refers to methods or devices for analyzing the emotions of participants during interaction activities and dynamically adjusting the content of those activities based on the results.
[0739] "Data collection and analysis means" refers to a system or process for collecting evaluations from participants after the completion of an interaction activity and analyzing them together with the participants' emotional data.
[0740] A "generative model" is a machine learning model or algorithm used to translate participants' speech in real time.
[0741] As an embodiment of this invention, a system for more effectively facilitating interaction activities is provided. The server primarily utilizes data storage means, reservation acceptance means, emotion analysis and adjustment means, data transmission means, translation provision means, and data collection and analysis means to facilitate smooth communication among participants. Specifically, this system operates as follows.
[0742] The server first uses data storage means to store information related to the interaction activities in a database. The hardware used here includes cloud storage systems and relational database systems. This information is stored in a way that makes it accessible to participants.
[0743] When a participant selects an activity, the device sends this information to the server using a reservation acceptance mechanism. The device provides a user interface via a web browser or mobile application. The reserved information is recorded in the server's database.
[0744] Once the interaction begins, the server uses data transmission methods to relay participants' audio and video data in real time. The software used for this can leverage the APIs of common video conferencing tools.
[0745] During the activity, the server uses the emotion analysis and adjustment means, applying a generative AI model to analyze participants' speech and facial expression data. The software used includes machine learning algorithms that estimate emotions. This allows the system to determine whether participants are relaxed or tense, and to adjust the pace and content of the activity accordingly.
[0746] The server also uses the translation provision means to translate participants' utterances between their different languages in real time. The translation is performed by a generative AI model and is highly accurate and fast.
[0747] Once the interaction activity concludes, the terminal collects feedback from participants via the data collection and analysis means and sends it to the server. This feedback is analyzed together with the emotion analysis data so that it can be used to improve future interaction activities.
[0748] As a concrete example, a Japanese user might participate with the goal of learning English, while an English-speaking user might participate in the exchange activities to learn Japanese. The server connects the users via video chat, and the emotion engine analyzes the facial expressions of both. In this case, an example of a prompt to be input into the generative AI model would be, "Please suggest a way to adjust the pace of the activity based on the emotion data obtained from User A's facial expressions."
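As a non-limiting illustration, a prompt of the kind quoted above could be assembled from emotion data as follows. The `build_pace_prompt` name, the score format, and the emotion labels are hypothetical. The disclosure gives only the instruction sentence and does not specify how emotion data is attached to it.

```python
def build_pace_prompt(user_label: str, emotion_scores: dict[str, float]) -> str:
    """Combine the instruction sentence with a compact summary of emotion data."""
    # Sort by label so the prompt is deterministic for the same input.
    summary = ", ".join(
        f"{name}={score:.2f}" for name, score in sorted(emotion_scores.items())
    )
    return (
        "Please suggest a way to adjust the pace of the activity based on the "
        f"emotion data obtained from {user_label}'s facial expressions.\n"
        f"Emotion data: {summary}"
    )
```

The resulting string would then be passed to the generative AI model together with any other inference data.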
[0749] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0750] Step 1:
[0751] The server retrieves interaction activity information from the database using data storage means and sends it to the terminal. This information includes the type of activity, date and time, and location. When a user logs in and accesses the system, the terminal provides a user interface that displays this information. The user selects activities of interest as user input, and the terminal sends this selection to the server. The output is the user's selection information, which is stored on the server.
[0752] Step 2:
[0753] The terminal sends a reservation request for an activity to the server based on the user's selection. The server records the reservation information in a database using a reservation acceptance mechanism. The input is the reservation request, which includes the user ID and the selected activity information. The server output is the reservation status recorded in the database. This allows the user to confirm whether their reservation was successful.
[0754] Step 3:
[0755] The server utilizes a video conferencing system to relay audio and video data between participants in real time using data transmission means. When a user joins an activity, the terminal sends and receives this media data, providing the user with live video and audio. The input is audio and video data from the user, and the output is audio and video data transmitted to other participants.
[0756] Step 4:
[0757] The server utilizes the emotion analysis and adjustment means, using a generative AI model to analyze participants' statements and captured facial expressions. This allows for real-time understanding of each user's emotional state. The input is the participants' statements and facial expression data, and the output is the analyzed emotion information. Specifically, if the server determines that a participant is tense, it may slow down the pace of the activity or send reassuring messages.
[0758] Step 5:
[0759] The server utilizes a translation service and uses a generative AI model to translate participants' utterances in real time. The input is participant utterance data, which may include different languages. The output is the translated text. The terminal displays this text in a user interface to help participants understand the meaning.
[0760] Step 6:
[0761] After the interaction activity concludes, the terminal uses data collection and analysis tools to obtain feedback from participants and transmit it to the server. The input consists of user evaluations and impressions. The server collects this data and analyzes it, along with sentiment data, to identify areas for improvement for future activities. The output is improvement suggestions based on the analysis, thereby enhancing the system's service.
[0762] (Application Example 2)
[0763] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0764] In online or offline interaction activities, participants may face difficulties in smooth communication due to differing language backgrounds and emotional states. Therefore, there is a need for systems that support adaptive communication tailored to participants' language and emotional states. Furthermore, improving the quality of future activities based on participant feedback and emotional data is a key challenge.
[0765] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0766] In this invention, the server includes information storage means for storing multiple interaction activity information that participants can select, emotion recognition means for analyzing the participant's facial expression information to recognize emotions, and activity adjustment means for adaptively controlling activities according to the participant's emotions. This enables smooth and effective interaction activities by providing appropriate support based on the participant's language and emotional state.
[0767] "Information storage means" refers to a device or function that stores and manages information on multiple interaction activities that participants can select.
[0768] "Reservation management means" refers to a function that accepts and manages reservations for interaction activities based on the participants' selections.
[0769] "Notification means" refers to a system or function for sending notifications to participants before the start of a scheduled social activity.
[0770] "Communication management means" refers to a function or device for relaying audio and video data between participants.
[0771] "Translation presentation means" refers to a function or device for translating a participant's utterance and presenting the translation result.
[0772] "Evaluation collection means" refers to a function or system for collecting evaluations from participants after an interaction activity has concluded.
[0773] "Emotion recognition means" refers to technologies and functions that analyze participants' facial expressions to recognize their emotions.
[0774] "Activity adjustment means" refers to a function or system for adaptively controlling activities in accordance with the emotions of the participants.
[0775] This invention is a system for improving the user experience when participating in social activities, and has multiple functions. The server uses information storage means to store and manage information on social activities that participants can select. When a user selects an activity through a terminal, the reservation management means accepts the reservation based on that selection. Subsequently, notification means appropriately sends a notification to the user before the start of the social activity.
[0776] To ensure smooth communication, the server uses communication management means to relay audio and video data between users in real time. Furthermore, the translation presentation means translates participants' speech in real time using a generative AI model, and the translation results are presented to the users. This enables effective communication even among participants with different language backgrounds.
[0777] To improve user satisfaction, the server analyzes the user's facial expressions using the emotion recognition means and recognizes their emotions. Based on this data, the activity adjustment means adaptively controls the interaction activity according to the user's emotions, adjusting the pace as needed and providing encouraging messages to ensure a comfortable experience for participants.
[0778] After the activity concludes, the server uses the evaluation collection means to collect user feedback, and also evaluates the emotion data obtained through the emotion recognition means. This data is then used to improve the quality of future interaction activities.
[0779] As a concrete example, in the case of a smart communication robot, this robot is equipped with a small camera to detect the user's smile or confused expression and analyzes their emotions using the Google Cloud Vision API. Based on the analysis results, the robot utilizes the DeepL API to quickly translate the user's speech and supports the real-time transmission of the translated statement to the other party. An example of a prompt in a generative AI model would be a specific instruction such as, "Analyze the user's emotions based on their facial expressions during the conversation. If the expression indicates tension or stress, provide a message of encouragement."
[0780] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0781] Step 1:
[0782] The server stores interaction activity information in a database using information storage means. When a user accesses the activity information via a terminal, the server provides information in response to the request, and the user selects activities of interest. The input at this time is the user's selection information, and the output is information about the activities corresponding to that selection.
[0783] Step 2:
[0784] The server accepts reservations for selected activities using a reservation management system. The input is the user's selection information, which is recorded in the database to complete the reservation. The output is a reservation completion notification sent to the user.
[0785] Step 3:
[0786] The server uses a notification mechanism to inform the user's terminal of the start time of the scheduled activity before the activity begins. The input to this process is the activity start time information, and the output is a notification message to the user.
[0787] Step 4:
[0788] The server uses communication management means to relay audio and video data between users in real time. The input to this process is the user's audio and video data, and the output is to provide this data to other participants as relayed data in real time.
[0789] Step 5:
[0790] The server uses a translation presentation mechanism to translate participants' speech in real time using a generative AI model. The input is user voice data, which is output as translated text through data conversion. The generative AI model provides fast and accurate translation.
[0791] Step 6:
[0792] The server uses emotion recognition technology to analyze the user's facial expressions and recognize their emotions. The input for this process is image data acquired from the camera, which is output as emotion data using the Google Cloud Vision API.
[0793] Step 7:
[0794] The server uses activity adjustment mechanisms to adjust activities based on recognized emotions. Emotional data is the input, and based on this, the server adjusts the pace of activities and provides encouraging messages to the user as output.
[0795] Step 8:
[0796] After the interaction activity concludes, the server uses an evaluation collection mechanism to gather feedback from users. In this process, the input is user evaluation information, which is then aggregated and output as data to help improve the quality of future activities.
[0797] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0798] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0799] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.
[0800] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.
[0801] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
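As a non-limiting illustration, the layout of the emotion map 400 described above can be sketched with a simple coordinate model. The Cartesian coordinates, the `MapPosition` type, the `describe` function, and the `inner_radius` threshold are all hypothetical. The disclosure describes only the qualitative arrangement (left and right, upper and lower, inner and outer) and gives no numeric scale.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MapPosition:
    x: float  # negative = left side (reaction), positive = right side (situation)
    y: float  # positive = upper side (pleasure), negative = lower side (displeasure)


def describe(pos: MapPosition, inner_radius: float = 0.5) -> dict[str, str]:
    """Describe where a position falls in the qualitative layout of the map."""
    radius = math.hypot(pos.x, pos.y)
    if pos.x < 0:
        origin = "reaction"
    elif pos.x > 0:
        origin = "situation"
    else:
        origin = "both"  # directly above or below the center
    if pos.y > 0:
        valence = "pleasure"
    elif pos.y < 0:
        valence = "displeasure"
    else:
        valence = "neutral"
    # Inner rings hold more primitive emotions; outer rings hold states
    # and actions arising from mental states.
    layer = "primitive" if radius < inner_radius else "expressed"
    return {"origin": origin, "valence": valence, "layer": layer}
```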
[0802] These emotions are distributed at the 3 o'clock position on the emotion map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.
[0803] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the closer an emotion lies to the outside of the emotion map 400, the more visible it becomes (that is, the more it is expressed in actions).
[0804] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
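The balance-based idea in this paragraph can be illustrated with a small sketch. It assumes, purely for illustration, that each balance (for example posture or battery level) is normalized to the range 0 to 1 and that the total absolute deviation from the ideal is a reasonable measure; the function names `deviation` and `valence` are hypothetical and not taken from the disclosure.

```python
def deviation(state: dict[str, float], ideal: dict[str, float]) -> float:
    """Total absolute distance of the current balances from their ideal values."""
    return sum(abs(state[name] - ideal[name]) for name in ideal)


def valence(
    previous: dict[str, float],
    current: dict[str, float],
    ideal: dict[str, float],
) -> float:
    """Positive when the balances moved toward the ideal (pleasure),
    negative when they moved away from it (discomfort)."""
    return deviation(previous, ideal) - deviation(current, ideal)


# Example with assumed, normalized values for a robot.
ideal = {"posture": 1.0, "battery": 1.0}
before = {"posture": 0.8, "battery": 0.5}
after = {"posture": 0.9, "battery": 0.7}

approaching = valence(before, after, ideal)   # balances approach the ideal
receding = valence(after, before, ideal)      # balances deviate from the ideal
```

In this sketch `approaching` is positive and `receding` is negative, mirroring the pleasure and discomfort directions described above.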
[0805] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."
[0806] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
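One way to picture the property that emotions located close together receive similar values is the sketch below. It is not the training procedure of the disclosure: the coordinates, the Gaussian kernel, the `sigma` parameter, and the function names are all assumptions chosen for illustration.

```python
import math

# Hypothetical map coordinates: x runs from the reaction side (-) to the
# situation side (+); y runs from displeasure (-) to pleasure (+).
EMOTION_POSITIONS: dict[str, tuple[float, float]] = {
    "reassured": (0.9, 0.1),
    "calm": (1.0, 0.0),
    "confident": (0.9, -0.1),
    "desire": (-0.7, 0.5),
}


def soft_targets(
    label: str,
    positions: dict[str, tuple[float, float]],
    sigma: float = 0.3,
) -> dict[str, float]:
    """Build target emotion values that decay with map distance from `label`,
    so that neighbouring emotions end up with similar values."""
    cx, cy = positions[label]
    return {
        name: math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma**2))
        for name, (x, y) in positions.items()
    }


def dominant_emotion(values: dict[str, float]) -> str:
    """Pick the emotion with the largest emotion value."""
    return max(values, key=values.get)


targets = soft_targets("calm", EMOTION_POSITIONS)
```

With these assumed coordinates, "reassured" and "confident" receive values close to that of "calm", while the distant "desire" receives a value near zero.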
[0807] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.
[0808] In the above embodiment, an example was given in which the specific processing is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and the specific processing may be distributed across multiple computers, including the computer 22. For example, the data generation model 58 may be provided in a device external to the data processing device 12, and the external device may generate data according to the input data.
[0809] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing according to the specific processing program 56.
[0810] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.
[0811] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.
[0812] The following types of processors can be used as hardware resources to perform the specific processing. One example is a CPU, which is a general-purpose processor that functions as a hardware resource performing the specific processing by executing software, i.e., a program. Other examples include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), and ASICs (Application Specific Integrated Circuits), which are processors with circuit configurations designed specifically to perform particular processing. Each of these processors has built-in or connected memory and performs the specific processing by using that memory.
[0813] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.
[0814] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.
[0815] Furthermore, more specifically, electrical circuits that combine circuit elements such as semiconductor elements can be used as the hardware structure of these various processors. The specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps may be deleted, new steps may be added, or the processing order may be rearranged, as long as this does not depart from the gist.
[0816] The descriptions and illustrations presented above are detailed explanations of the parts related to the technology of this disclosure and are merely examples of that technology. For example, the above descriptions of structure, function, operation, and effect are descriptions of examples of the structure, function, operation, and effect of the parts related to the technology of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements may be added, or elements may be replaced in the descriptions and illustrations presented above, as long as this does not depart from the gist of the technology of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the parts related to the technology of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable implementation of the technology of this disclosure have been omitted from the descriptions and illustrations presented above.
[0817] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.
[0818] The following is further disclosed regarding the embodiments described above.
[0819] (Claim 1)
[0820] Information storage means for storing information on multiple interaction activities that participants can select,
[0821] Reservation management means for accepting reservations for interaction activities based on the participants' selections,
[0822] Notification means for sending a notification to participants before the start of a scheduled interaction activity,
[0823] Communication management means for relaying audio and video data between participants,
[0824] Translation presentation means for translating participants' utterances and presenting the translation results, and
[0825] Evaluation collection means for collecting evaluations from participants after the interaction activity has ended,
[0826] A system comprising the above.
[0827] (Claim 2)
[0828] The system according to claim 1, comprising a translation presentation means that utilizes a generative model for translating the content of participants' speech in real time.
[0829] (Claim 3)
[0830] The system according to claim 1, comprising a feedback analysis means for improving the quality of interaction activities based on evaluations from participants.
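As a rough illustration of how the storage, reservation, and notification elements of claim 1 might fit together, consider the following sketch. All class, field, and method names are hypothetical, the 15-minute lead time is an arbitrary assumption, and in-memory dictionaries stand in for whatever storage an actual implementation would use.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Activity:
    activity_id: str
    title: str
    starts_at: datetime


@dataclass
class ExchangeSystem:
    # Information storage: selectable activities keyed by id.
    activities: dict[str, Activity] = field(default_factory=dict)
    # Reservation management: activity id -> reserved participants.
    reservations: dict[str, set[str]] = field(default_factory=dict)
    # Assumed lead time for the pre-start notification.
    notify_lead: timedelta = timedelta(minutes=15)

    def add_activity(self, activity: Activity) -> None:
        self.activities[activity.activity_id] = activity

    def reserve(self, participant: str, activity_id: str) -> None:
        """Accept a reservation based on the participant's selection."""
        if activity_id not in self.activities:
            raise KeyError(f"unknown activity: {activity_id}")
        self.reservations.setdefault(activity_id, set()).add(participant)

    def due_notifications(self, now: datetime) -> list[tuple[str, str]]:
        """Return (participant, activity title) pairs for activities that
        have not started yet but start within the lead time."""
        due: list[tuple[str, str]] = []
        for activity_id, participants in self.reservations.items():
            activity = self.activities[activity_id]
            if now <= activity.starts_at <= now + self.notify_lead:
                due.extend((p, activity.title) for p in sorted(participants))
        return due
```

A scheduler could call `due_notifications` periodically and hand each pair to whatever delivery channel is in use; delivery itself is deliberately left out of this sketch.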
[0831] "Example 1"
[0832] (Claim 1)
[0833] Information recording means for storing information on multiple activities that participants can select,
[0834] Reservation processing means for accepting activity reservations based on the participants' selections,
[0835] Notification means for sending a notification to participants before the start of a scheduled activity,
[0836] Data management means for relaying communication data between participants,
[0837] Language conversion means for translating participants' utterances and presenting the translation results, and
[0838] Feedback collection means for collecting feedback from participants after the activity is completed,
[0839] A system comprising the above.
[0840] (Claim 2)
[0841] The system according to claim 1, comprising a language conversion means that utilizes a generative model for translating the content of participants' speech in real time.
[0842] (Claim 3)
[0843] The system according to claim 1, comprising analytical means for improving the quality of activities based on evaluations from participants.
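The analytical means of claim 3 is not specified in detail. One simple possibility, shown here only as an assumption, is to average participant ratings per activity and flag activities whose average falls below a threshold; the function name, the 1 to 5 rating scale, and the threshold of 3.5 are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def activities_needing_improvement(
    evaluations: list[tuple[str, float]],
    threshold: float = 3.5,
) -> list[str]:
    """Given (activity_id, rating) pairs, return the ids of activities whose
    average rating is below `threshold`, in alphabetical order."""
    ratings_by_activity: dict[str, list[float]] = defaultdict(list)
    for activity_id, rating in evaluations:
        ratings_by_activity[activity_id].append(rating)
    return sorted(
        activity_id
        for activity_id, ratings in ratings_by_activity.items()
        if mean(ratings) < threshold
    )


sample = [("cook", 5), ("cook", 4), ("quiz", 2), ("quiz", 3)]
flagged = activities_needing_improvement(sample)
```

Here only the "quiz" activity is flagged, since its average rating falls below the assumed threshold.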
[0844] "Application Example 1"
[0845] (Claim 1)
[0846] Information storage means for storing information on multiple interaction activities that participants can select,
[0847] Reservation management means for accepting reservations for interaction activities based on the participants' selections,
[0848] Notification means for sending a notification to participants before the start of a scheduled interaction activity,
[0849] Communication management means for relaying audio and video data between participants,
[0850] Translation presentation means for translating participants' utterances and presenting the translation results,
[0851] Evaluation collection means for collecting evaluations from participants after the interaction activity has ended,
[0852] Improvement means for improving the quality of interaction activities based on feedback received from participants, and
[0853] Means for presenting translation results in real time during interaction activities provided online,
[0854] A system comprising the above.
[0855] (Claim 2)
[0856] The system according to claim 1, comprising a translation presentation means that utilizes a generative model to translate participants' utterances in real time, thereby facilitating smooth communication between multiple languages.
[0857] (Claim 3)
[0858] The system according to claim 1, comprising a feedback analysis means for analyzing evaluations and feedback from participants and using them as material to improve the quality of future interaction activities.
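The real-time translation presentation of claim 2 can be sketched as a stream that translates each utterance only when its language differs from the listener's. The names `Utterance`, `present_translations`, and `toy_translate` are hypothetical; `toy_translate` is a lookup-table stand-in for a call to a generative model, which the disclosure does not specify at the API level.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

# (text, source language, target language) -> translated text
Translator = Callable[[str, str, str], str]


@dataclass(frozen=True)
class Utterance:
    speaker: str
    lang: str
    text: str


def present_translations(
    utterances: Iterable[Utterance],
    listener_lang: str,
    translate: Translator,
) -> Iterator[str]:
    """Yield one display line per utterance, translating as each arrives."""
    for utterance in utterances:
        if utterance.lang == listener_lang:
            shown = utterance.text  # already in the listener's language
        else:
            shown = translate(utterance.text, utterance.lang, listener_lang)
        yield f"{utterance.speaker}: {shown}"


def toy_translate(text: str, source: str, target: str) -> str:
    """Hypothetical stand-in for a generative-model translation call."""
    table = {("ja", "en", "こんにちは"): "Hello"}
    return table.get((source, target, text), f"[{source}->{target}] {text}")
```

Because `present_translations` is a generator, each line can be shown as soon as the corresponding utterance has been processed rather than after the whole conversation.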
[0859] "Example 2 (combined with an emotion engine)"
[0860] (Claim 1)
[0861] Data storage means for storing information on multiple interaction activities that participants can select,
[0862] Reservation acceptance means for accepting reservations for interaction activities based on the participants' selections,
[0863] Notification means for sending information to participants before the start of a scheduled interaction activity,
[0864] Data transmission means for relaying audio and video data between participants,
[0865] Translation provision means for translating participants' utterances and presenting the translation results,
[0866] Emotion analysis and adjustment means for analyzing the emotions of participants during interaction activities and dynamically adjusting the content of those activities, and
[0867] Data collection and analysis means for collecting evaluations from participants after the interaction activity has ended and analyzing them together with the participants' emotion data,
[0868] A system comprising the above.
[0869] (Claim 2)
[0870] The system according to claim 1, comprising a translation provision means that utilizes a generative model for translating the content of participants' speech in real time.
[0871] (Claim 3)
[0872] The system according to claim 1, comprising analytical means for improving the quality of interaction activities based on evaluations and emotional data from participants.
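How the emotion analysis and adjustment element might adjust an activity is not spelled out. The sketch below assumes one simple rule set: lower the difficulty when a negative emotion dominates, raise it when the participant appears bored, and otherwise leave it unchanged. The emotion labels, the 1 to 5 difficulty scale, and the function name are hypothetical.

```python
# Assumed groupings of emotion labels; not taken from the disclosure.
STRUGGLING = {"anxious", "frustrated"}
UNDER_CHALLENGED = {"bored"}


def adjust_difficulty(
    emotion_values: dict[str, float],
    difficulty: int,
    min_level: int = 1,
    max_level: int = 5,
) -> int:
    """Return a new difficulty level based on the dominant emotion."""
    if not emotion_values:
        return difficulty  # no emotion data: keep the activity as it is
    dominant = max(emotion_values, key=emotion_values.get)
    if dominant in STRUGGLING:
        return max(min_level, difficulty - 1)
    if dominant in UNDER_CHALLENGED:
        return min(max_level, difficulty + 1)
    return difficulty
```

For instance, a participant whose strongest emotion value is "anxious" at level 3 would be moved to level 2, while a dominant "calm" leaves the level unchanged.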
[0873] "Application Example 2 (combined with an emotion engine)"
[0874] (Claim 1)
[0875] Information storage means for storing information on multiple interaction activities that participants can select,
[0876] Reservation management means for accepting reservations for interaction activities based on the participants' selections,
[0877] Notification means for sending a notification to participants before the start of a scheduled interaction activity,
[0878] Communication management means for relaying audio and video data between participants,
[0879] Translation presentation means for translating participants' utterances and presenting the translation results,
[0880] Evaluation collection means for collecting evaluations from participants after the interaction activity has ended,
[0881] Emotion recognition means for analyzing participants' facial expressions to recognize their emotions, and
[0882] Activity adjustment means for adaptively controlling activities according to the emotions of participants,
[0883] A system comprising the above.
[0884] (Claim 2)
[0885] The system according to claim 1, comprising a translation presentation means that utilizes a generative model to translate participants' utterances in real time, and which analyzes participants' utterances and facial expressions to adjust interaction activities.
[0886] (Claim 3)
[0887] The system according to claim 1, comprising a feedback analysis means for improving the quality of interaction activities based on evaluations and emotional information from participants.
Explanation of Symbols
[0888]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot
Claims
1. A system comprising: information storage means for storing information on multiple interaction activities that participants can select; reservation management means for accepting reservations for interaction activities based on the participants' selections; notification means for sending a notification to participants before the start of a scheduled interaction activity; communication management means for relaying audio and video data between participants; translation presentation means for translating participants' utterances and presenting the translation results; evaluation collection means for collecting evaluations from participants after the interaction activity has ended; improvement means for improving the quality of interaction activities based on feedback received from participants; and means for presenting translation results in real time during interaction activities provided online.
2. The system according to claim 1, comprising a translation presentation means that utilizes a generative model to translate participants' utterances in real time, thereby facilitating smooth communication between multiple languages.
3. The system according to claim 1, comprising a feedback analysis means for analyzing evaluations and feedback from participants and using them as material to improve the quality of future interaction activities.