system
The system addresses schedule and communication challenges by collecting and analyzing user data to prioritize tasks and translate languages, enhancing efficiency and reducing overload.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- SOFTBANK GROUP CORP
- Filing Date
- 2024-12-10
- Publication Date
- 2026-06-22
AI Technical Summary
Users face challenges in managing busy schedules, prioritizing tasks, and overcoming communication barriers due to information overload and language differences, which conventional systems fail to address effectively.
A system that collects user communication and voice data, converts it into text, analyzes the text for important information, prioritizes tasks and schedules, and provides optimized information while supporting multilingual communication through automatic translation.
Enables efficient schedule and task management, facilitates smooth communication across languages, and reduces the burden of information overload by providing personalized and timely information.
Abstract
Description
Technical Field
[0001] The technology of the present disclosure relates to a system.
Background Art
[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of the chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] In modern society, users are faced with multiple problems such as managing a busy schedule and prioritizing tasks, the difficulty of selection due to an overabundance of information, and communication barriers in foreign languages. There is a need for a method to efficiently solve these problems.
Means for Solving the Problems
[0005] The present invention acquires a user's communication data and voice data, converts the voice data into text data, and enables automatic translation between multiple languages. It also analyzes the converted text data, extracts important information, and automatically determines the priorities of tasks and schedules. The above problems are solved by providing a system that presents optimal information to the user based on the analyzed information.
[0006] "Data acquisition means" refers to an element that has the function of collecting the user's communication data and voice data.
[0007] "Speech conversion means" refers to an element that has the function of converting collected speech data into text information.
[0008] An "information analysis tool" is an element that has the function of analyzing textual information and extracting important keywords and information.
[0009] A "prioritization mechanism" is an element that has the function of determining the importance of tasks and schedules and deciding their order based on the analyzed information.
[0010] An "information provision means" is an element that has the function of informing users of the most relevant information based on the analyzed data.
[0011] A "translation tool" is an element that has the function of automatically converting languages between multiple languages.
Brief Description of the Drawings
[0012]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map to which a plurality of emotions are mapped.
[Figure 10] An emotion map to which a plurality of emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2 when an emotion engine is combined.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2 when an emotion engine is combined.
Modes for Carrying Out the Invention
[0013] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.
[0014] First, the terminology used in the following description will be explained.
[0015] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).
[0016] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which the processor uses as work memory.
[0017] In the following embodiments, storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.
[0018] In the following embodiments, a communication I/F (Interface) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication I/F controls communication between multiple computers. Examples of communication standards applied to the communication I/F include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).
[0019] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same concept applies when three or more items are linked by "and/or."
[0020] [First Embodiment]
[0021] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.
[0022] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.
[0023] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0024] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.
[0025] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.
[0026] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.
[0027] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.
[0028] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.
[0029] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.
[0030] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0031] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with the specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0032] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0033] This invention is an AI system designed to streamline schedule management, task management, information retrieval, and multilingual communication in daily life. Specifically, it operates through the collaboration of a server, a terminal, and a user. First, the server, with the user's permission, collects the user's email and voice data. This allows the server to obtain necessary information from the vast amount of daily communication data.
[0034] Next, the server uses speech recognition technology to convert the audio data into text. This converted text is then analyzed by the server's information analysis tools to extract important keywords, scheduled dates, tasks, and other relevant information. This clarifies the necessary information and provides the foundational data for schedule and task management.
[0035] Furthermore, the server uses a prioritization mechanism to automatically determine the priority of tasks and schedules based on the acquired information. This decision-making process takes into account the user's past behavior history and current situation, resulting in the creation of individually optimized plans.
[0036] Subsequently, the server provides users with the most relevant information through its information delivery system. For example, notifications regarding important meeting schedules or tasks requiring prior preparation are displayed on the user's device as needed. This allows users to take immediate action as required.
[0037] Furthermore, the server uses translation tools to automatically translate text and audio data to support smooth communication between languages. The terminal receives the translated text and audio data and presents it to the user. For example, it can be used to provide real-time translation during meetings conducted in foreign languages, allowing users to understand the content in their own language.
[0038] The introduction of this system will enable users to efficiently manage their schedules and tasks, and to obtain necessary information without being overwhelmed by information overload. It will also facilitate communication that transcends language barriers.
[0039] The following describes the processing flow.
[0040] Step 1:
[0041] The server, with the user's permission, accesses the user's mail server to collect the latest email data. It also retrieves meeting audio data from cloud storage.
[0042] Step 2:
[0043] The server applies speech recognition technology to the collected audio data and converts it into accurate text data. During this process, noise is filtered to improve conversion accuracy.
[0044] Step 3:
[0045] The server analyzes the converted text data using a natural language processing engine to extract important keywords, deadlines, and priority tasks. This clarifies the key points of the information.
[0046] Step 4:
[0047] Based on the extracted information, the server prioritizes tasks and schedules, taking into account the user's past behavior patterns and current plans.
[0048] Step 5:
[0049] The server searches for external information related to the user and retrieves the most relevant information regarding schedules and tasks. This information is intended to support the user's actions.
[0050] Step 6:
[0051] The server translates conversations and text data into a language the user can understand to facilitate multilingual communication. This is done using natural language processing technology.
[0052] Step 7:
[0053] The terminal receives notifications from the server and informs the user of high-priority tasks and schedules. For example, it prompts the user to act by sending notifications about urgent tasks.
[0054] Step 8:
[0055] The terminal displays the translation results received from the server and plays them back as audio if necessary. This allows users to communicate more smoothly in foreign languages.
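The eight steps above can be sketched as a small pipeline. The following Python sketch is illustrative only: the names `Task`, `PipelineResult`, `run_pipeline`, and `notify`, and the choice to pass the recognition, analysis, and translation engines in as callables, are assumptions that do not appear in this disclosure. Step 5 (external information search) is omitted for brevity, and the engines in the usage example are trivial stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Task:
    title: str
    priority: int = 0  # higher means more urgent (assumed convention)


@dataclass
class PipelineResult:
    transcript: str
    tasks: List[Task] = field(default_factory=list)
    translated: str = ""


def run_pipeline(
    emails: List[str],
    audio_clips: List[bytes],
    transcribe: Callable[[bytes], str],
    extract_tasks: Callable[[str], List[Task]],
    translate: Callable[[str], str],
) -> PipelineResult:
    """Server-side Steps 1-4 and 6: collect, transcribe, analyze, prioritize, translate."""
    # Steps 1-2: merge email bodies with transcribed audio into one text corpus.
    transcript = "\n".join(emails + [transcribe(clip) for clip in audio_clips])
    # Step 3: extract candidate tasks from the text.
    tasks = extract_tasks(transcript)
    # Step 4: order tasks so that the highest priority comes first.
    tasks.sort(key=lambda t: t.priority, reverse=True)
    # Step 6: translate the text for the user.
    return PipelineResult(transcript, tasks, translate(transcript))


def notify(result: PipelineResult, top_n: int = 1) -> List[str]:
    """Terminal-side Steps 7-8: format the highest-priority tasks for display."""
    return [f"[priority {t.priority}] {t.title}" for t in result.tasks[:top_n]]


# Usage with stand-in engines (no real speech or translation service is called).
result = run_pipeline(
    emails=["Submit report by Friday"],
    audio_clips=[b"fake-audio"],
    transcribe=lambda clip: "Book a room for the review",
    extract_tasks=lambda text: [Task(line, priority=len(line)) for line in text.splitlines()],
    translate=lambda text: text.upper(),
)
print(notify(result))
```

Passing the engines as parameters keeps the sketch independent of any particular speech recognition or translation service.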
[0056] (Example 1)
[0057] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0058] In modern society, users are surrounded by vast amounts of communication and audio data, facing information overload. Furthermore, language barriers exist in communication with people who speak different languages. These factors complicate schedule and task management in daily life and work, hindering efficient action. This invention aims to solve these problems and enable users to appropriately manage and acquire the information they need.
[0059] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0060] In this invention, the server includes data acquisition means for acquiring user communication data and voice data, voice conversion means for converting acquired voice data into text data, information analysis means for analyzing text data and extracting important information, priority determination means for determining the priority of tasks and schedules based on the analyzed information and considering the user's past and present behavioral history, information provision means for providing optimal information to the user through notifications, and translation means for performing automatic text conversion between multiple languages. As a result, users can efficiently manage their schedules and tasks, and freely acquire and use information across language barriers.
[0061] "Users" refers to individuals or groups who use the system for information management and communication.
[0062] "Communication data" refers to digital data used by users to exchange information with others, such as emails, messages, and voice communications.
[0063] "Voice data" refers to digital audio data containing information that the user has recorded by voice.
[0064] "Data acquisition means" refers to the function by which the system collects communication data and voice data from users.
[0065] "Voice conversion means" refers to technology that converts acquired voice data into text format.
[0066] "Text data" refers to information in text format produced by the voice conversion means.
[0067] "Information analysis means" refers to a function that analyzes text data and extracts important information and keywords from it.
[0068] "Priority determination means" refers to a function that sets priorities based on the analyzed information, taking into account the importance and urgency of tasks and schedules.
[0069] "Information provision means" refers to a function that notifies users of necessary information at the appropriate time.
[0070] "Translation means" refers to a function that performs automatic text or audio translation between multiple languages.
[0071] An "information processing system" refers to the overall mechanism that combines these means to support users' information management and communication.
[0072] This invention uses an information processing system to streamline user schedule management, task management, and multilingual communication. At the core of the system is a server, which collects communication and voice data with the user's permission. The server has the capability to retrieve data from the user's email service and voice assistant service via an internet connection.
[0073] The collected audio data is converted into text format by the server using "speech recognition technology." Specifically, a "speech recognition API" is used in this process, and the information in the audio file is stored on the server as text data.
[0074] The text data is then analyzed using "natural language processing technology." The server uses a "text analysis library" to automatically extract important information, keywords, due dates, and task details. Based on this information, the server determines the priority of tasks and appointments using a "priority determination mechanism." This uses an algorithm that takes into account past data and current usage.
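As a rough illustration of the extraction described above, the following Python sketch pulls due dates and task phrases out of text with regular expressions. The disclosure does not name a specific text analysis library, so the regular-expression approach, the patterns `DATE_RE` and `TASK_RE`, the verb list, and the function `extract_info` are all assumptions made for this example.

```python
import re
from typing import Dict, List

# Hypothetical patterns: ISO-format dates and a small set of action verbs.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
TASK_RE = re.compile(r"\b(?:submit|prepare|review|send|book)\b[^.!?\n]*", re.IGNORECASE)


def extract_info(text: str) -> Dict[str, List[str]]:
    """Return the due dates and task phrases found in the text."""
    return {
        "dates": DATE_RE.findall(text),
        # Each task phrase runs from the action verb to the end of its sentence.
        "tasks": [m.group(0).strip() for m in TASK_RE.finditer(text)],
    }


info = extract_info(
    "Please submit the budget by 2025-01-15. Review the slides before the call."
)
print(info)
```

A production system would more likely rely on a trained natural language processing model; the patterns here only show the shape of the input and output.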
[0075] The server then provides information to the user's terminal through a "notification system," ensuring that the user immediately recognizes important schedules and tasks. The terminal displays the received notifications on its screen, helping the user take relevant actions quickly.
[0076] Furthermore, the system utilizes "automatic translation technology" to support multilingual communication. The server uses a "translation API" to convert text and voice messages into the required language, and the terminal presents the result to the user as audio or on-screen text.
[0077] For example, if a user enters a prompt such as, "Add a lunch date with a friend this Sunday," the system will register the date in the schedule and notify them. It also has a function to translate and present meeting agendas in response to prompts such as, "Summarize the contents of next Wednesday's meeting in English."
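Registering prompts such as "this Sunday" or "next Wednesday" requires resolving a relative weekday to a calendar date. The sketch below shows one way this might be done. The function `resolve_weekday` and its interpretation of "this" and "next" are assumptions, since the disclosure does not define how such phrases are resolved.

```python
import datetime as dt
import re

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def resolve_weekday(phrase: str, today: dt.date) -> dt.date:
    """Resolve 'this <weekday>' or 'next <weekday>' relative to `today`.

    Assumed convention: 'this' means the nearest such day on or after today;
    'next' means that day in the following Monday-based calendar week.
    """
    pattern = r"\b(this|next)\s+(" + "|".join(WEEKDAYS) + r")\b"
    m = re.search(pattern, phrase.lower())
    if m is None:
        raise ValueError(f"no relative weekday in: {phrase!r}")
    qualifier, target = m.group(1), WEEKDAYS.index(m.group(2))
    if qualifier == "this":
        return today + dt.timedelta(days=(target - today.weekday()) % 7)
    # 'next': jump to Monday of the following week, then add the weekday offset.
    next_monday = today + dt.timedelta(days=7 - today.weekday())
    return next_monday + dt.timedelta(days=target)


today = dt.date(2024, 12, 10)  # a Tuesday
lunch = resolve_weekday("Add a lunch date with a friend this Sunday", today)
meeting = resolve_weekday("Summarize the contents of next Wednesday's meeting in English", today)
print(lunch, meeting)
```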
[0078] Thus, this system enables users to effectively manage their schedules and tasks, and to smoothly obtain information and carry out activities even in different language environments.
[0079] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0080] Step 1:
[0081] The server acquires user communication data and voice data. The inputs are emails and voice logs, collected with the user's permission. The server uses the "data acquisition means" to collect this data. The output is a database of communication data and voice data. As a specific operation, the server accesses the email service via an "API" to retrieve new emails and attachments.
[0082] Step 2:
[0083] The server converts acquired audio data into text data. Audio data is fed to the server's speech recognition system as input. The server uses its "speech recognition engine" to perform data processing that converts the audio into text information. The output is the audio data converted into text format. Specific operations include analyzing the audio sample and identifying phonemes, resulting in the generation of text data.
[0084] Step 3:
[0085] The server analyzes text data and extracts important information. As input, data converted to text format is fed into the information analysis system. The server uses a "natural language processing library" to analyze the text elements and extract important keywords, dates, and task names. The output is provided as a list of these extracted elements. Specific operations include noun phrase analysis and relationship identification processes.
[0086] Step 4:
[0087] The server determines the priority of tasks and appointments based on the extracted information. Important keywords and schedule information are sent to the prioritization algorithm as input. The server uses a "decision engine" to analyze the data and automatically set priorities. The output is a list of tasks and appointments organized according to their priority. Specific actions include assigning the most appropriate priority by considering historical data and current importance.
[0088] Step 5:
[0089] The server generates notification information to provide users with the most relevant information and sends it to their devices. Inputs include a prioritized task list and schedule information. The server uses a "notification generation system" to send important tasks and appointments to the user's device. Output is a schedule notification displayed on the user's screen. Specifically, this is displayed as a pop-up notification on the device, making it easy for the user to check.
[0090] Step 6:
[0091] The server performs automatic translation between multiple languages as needed. Text data requiring translation is sent to the server as input. The server uses a "translation engine" to process the text and convert it into another language. The output is the translated text data. In practice, meeting notes are translated in real time and displayed on the terminal.
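Steps 4 and 5 above (priority determination and notification generation) can be sketched as a scoring function followed by a sort. The disclosure does not specify the algorithm, so the `Item` fields, the weights in `score`, and the notification format are all hypothetical choices made for illustration.

```python
import datetime as dt
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    title: str
    due: dt.date
    importance: int       # 1 (low) to 3 (high); assumed scale
    past_delays: int = 0  # hypothetical history signal: times the user was late on similar items


def score(item: Item, today: dt.date) -> float:
    """Higher score means handle sooner. The weights are illustrative only."""
    days_left = max((item.due - today).days, 0)
    urgency = 1.0 / (1 + days_left)  # closer deadlines score higher
    return 2.0 * urgency + 1.0 * item.importance + 0.5 * item.past_delays


def prioritize(items: List[Item], today: dt.date) -> List[Item]:
    """Return the items ordered from highest to lowest score."""
    return sorted(items, key=lambda i: score(i, today), reverse=True)


def to_notifications(items: List[Item], today: dt.date) -> List[str]:
    """Format the prioritized items as numbered notification lines."""
    ranked = prioritize(items, today)
    return [f"{n}. {i.title} (due {i.due.isoformat()})" for n, i in enumerate(ranked, start=1)]


today = dt.date(2025, 1, 10)
items = [
    Item("Prepare slides", dt.date(2025, 1, 11), importance=2),
    Item("File expenses", dt.date(2025, 1, 20), importance=1, past_delays=1),
    Item("Board meeting", dt.date(2025, 1, 10), importance=3),
]
notifications = to_notifications(items, today)
print("\n".join(notifications))
```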
[0092] (Application Example 1)
[0093] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0094] In modern society, many people face a massive amount of information and multinational communication. This makes effective time management and smooth communication across multiple languages difficult. Furthermore, there is a need for efficient schedule management using voice instructions and information-based two-way translation. Conventional systems have been unable to adequately support these needs, leaving challenges in improving individual time efficiency and promoting international understanding.
[0095] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0096] In this invention, the server includes a processing device for acquiring user data, a speech conversion device for converting acquired speech data into text information, an information analysis device for analyzing text information and extracting important information, a priority determination device for determining the priority of schedules and work instructions, an information presentation device for providing optimal information to the user, a translation device for performing automatic translation between multiple languages, and a two-way communication device for managing schedules based on speech instructions and translating speech data into other languages. This enables users to efficiently and effectively manage their schedules and communicate across multiple languages.
[0097] "Processing device for acquiring user data" refers to hardware or software for collecting and analyzing voice and communication data with the user's permission.
[0098] A "speech conversion device" is a technical device that has the function of accurately converting acquired speech data into text information.
[0099] An "information analysis device" refers to a system or program for analyzing converted text information and extracting important information.
[0100] A "priority determination device" is a device or mechanism that determines the order in which schedules and tasks are to be performed based on analyzed information.
[0101] An "information presentation device" refers to equipment or modules for displaying optimized information to users or notifying them of it.
[0102] A "translation device" is a system that provides the function of performing automatic translation between different languages.
[0103] A "two-way communication device" is a device that manages schedules based on voice instructions and transmits and receives information translated into other languages in real time.
[0104] The system implementing this invention operates efficiently primarily through cooperation between a server, a terminal, and a user. The server is responsible for collecting communication data and voice data based on the user's permission. Portable devices such as smart glasses and smartphones are used for data collection, and applications running on these devices cooperate with the server to acquire the necessary data.
[0105] The server first uses a software library called speech_recognition to convert speech data into text. This converted text is then analyzed using libraries such as googletrans to extract important information. This analysis clarifies the necessary information and prepares the basic data used for managing schedules and tasks.
[0106] The server then automatically determines the priority of appointments and tasks based on the analyzed information. This prioritization process takes into account the user's past behavior history and current situation, resulting in a schedule optimized for each individual user.
[0107] The terminal displays information provided by the server to the user and notifies them of important appointments and tasks. This allows the user to take immediate action. In addition, a translation function provides real-time translation between different languages, helping users understand information in their native language, even in meetings conducted in a foreign language.
[0108] For example, if a user says, "There is a meeting tomorrow at 10 AM," the system recognizes this as voice input and immediately updates the schedule. Furthermore, this information can be translated into a foreign language and communicated to foreign colleagues as needed. This system accurately processes user voice instructions, manages schedules via voice, and translates the content into other languages such as English.
[0109] An example of a prompt is: "Manage my schedule by voice and translate the content into English."
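Once an utterance such as "There is a meeting tomorrow at 10 AM" has been transcribed, the schedule update amounts to parsing a day and a time from the text. The following sketch shows one possible parser. The function `parse_schedule`, the pattern `TIME_RE`, and the restriction to "today" and "tomorrow" with 12-hour times are assumptions made for this example, not part of the disclosure.

```python
import datetime as dt
import re
from typing import Optional

# Matches times such as "10 AM", "2pm", or "9:30 am".
TIME_RE = re.compile(r"\b(\d{1,2})(?::(\d{2}))?\s*(am|pm)\b")


def parse_schedule(utterance: str, now: dt.datetime) -> Optional[dt.datetime]:
    """Return the start time for '[tomorrow] at <time>' utterances, or None."""
    text = utterance.lower()
    m = TIME_RE.search(text)
    if m is None:
        return None
    # 12 AM maps to hour 0 and 12 PM to hour 12.
    hour, minute = int(m.group(1)) % 12, int(m.group(2) or 0)
    if m.group(3) == "pm":
        hour += 12
    day = now.date() + dt.timedelta(days=1 if "tomorrow" in text else 0)
    return dt.datetime.combine(day, dt.time(hour, minute))


now = dt.datetime(2024, 12, 10, 9, 0)
morning = parse_schedule("There is a meeting tomorrow at 10 AM", now)
afternoon = parse_schedule("I have a meeting tomorrow at 2pm", now)
print(morning, afternoon)
```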
[0110] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0111] Step 1:
[0112] The terminal receives voice input from the user. When the user speaks into a smartphone or smart glasses, saying, "I have a meeting tomorrow at 2pm," the voice is recorded on the terminal. The input here is voice data, which is then prepared for subsequent processing.
[0113] Step 2:
[0114] The server receives audio data from the terminal and converts the audio into text using the speech_recognition library. In this step, the input audio data is analyzed and the resulting text information is output. Audio analysis techniques are used to ensure accurate language recognition.
[0115] Step 3:
[0116] The server analyzes the converted text information using a natural language processing algorithm and extracts important information related to the schedule. The input is text information, from which keywords such as date, time, and event name are identified, and an organized information structure is generated as output.
[0117] Step 4:
[0118] The server sets priorities based on the analyzed information and, as needed, performs translation using the Google Translate library. This step handles cases where the user requires foreign-language interpretation: the input is the analyzed information, and the output is that information converted into another language.
[0119] Step 5:
[0120] The server sends prioritized schedule information and translated information to the terminal. The terminal receives this information and displays a notification on the user's screen. The input here is the information sent from the server, and the output is a visual or audible notification to the user.
[0121] Step 6:
[0122] The user acts based on the information provided by the server. They review the presented schedule and translation content, and make necessary changes to their plans or communicate accordingly. In this step, the user's input is the provided information, and they take specific actions based on this information.
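The exchange in Step 5, where the server sends prioritized schedule information and translations to the terminal, could be carried by a simple serialized message. The sketch below uses JSON for this purpose. The message schema, the field names, and the functions `build_notification` and `render_notification` are hypothetical, as the disclosure does not define a wire format.

```python
import json
from typing import Any, Dict, List


def build_notification(entries: List[Dict[str, Any]], translations: Dict[str, str]) -> str:
    """Server side: serialize prioritized entries together with translated titles."""
    ordered = sorted(entries, key=lambda e: e["priority"], reverse=True)
    payload = {
        "entries": [
            # Fall back to the original title when no translation is available.
            {**e, "title_translated": translations.get(e["title"], e["title"])}
            for e in ordered
        ]
    }
    return json.dumps(payload, ensure_ascii=False)


def render_notification(message: str) -> List[str]:
    """Terminal side: turn the received message into display lines."""
    payload = json.loads(message)
    return [
        f'{e["start"]} {e["title"]} / {e["title_translated"]}'
        for e in payload["entries"]
    ]


entries = [
    {"title": "昼食", "start": "2024-12-11T12:00", "priority": 1},
    {"title": "会議", "start": "2024-12-11T14:00", "priority": 2},
]
message = build_notification(entries, {"会議": "Meeting"})
lines = render_notification(message)
print("\n".join(lines))
```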
[0123] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0124] This invention is an advanced AI system that recognizes the emotional state of a user and provides information accordingly. The system is configured as follows:
[0125] First, the server obtains the user's permission to collect emails and voice data. This data is then converted into text data using speech recognition technology. The server then applies natural language processing to the converted text data to extract important information. This allows the server to obtain information about tasks and schedules from the user's conversations and email content.
[0126] Next, the server uses an emotion recognition engine to analyze the user's voice data. This analysis identifies emotions from the tone of voice and speaking tempo. For example, if the user is feeling stressed, that emotional state will be identified.
[0127] Subsequently, the server uses a prioritization mechanism to determine the priority of tasks and schedules based on the extracted information and emotional state, and makes adjustments to ensure appropriate action is taken. This adjustment takes the user's emotional state into consideration and incorporates measures to avoid further stress.
[0128] The terminal receives the results sent from the server and provides information optimized for the user. For example, if the user is feeling stressed, the terminal encourages them to start with simpler tasks to reduce the burden. Furthermore, a multilingual translation function is activated as needed to support smoother conversations in foreign languages.
[0129] In this way, the system according to this invention detects the user's emotions and, by providing information and communication support tailored to those emotions, creates an environment in which users can live their daily lives efficiently and comfortably. Specific examples include providing relaxing information before a tense meeting and choosing emotionally positive expressions when translating conversations.
[0130] The following describes the processing flow.
[0131] Step 1:
[0132] The server, with the user's permission, securely retrieves the user's email and voice data. This allows information related to the user's daily activities to be incorporated into the system.
[0133] Step 2:
[0134] The server converts the acquired audio data into text data using speech recognition technology. This is necessary to make everyday conversations and meeting content into an analyzable format.
[0135] Step 3:
[0136] The server applies natural language processing techniques to the converted text data to extract important information, tasks, and schedules. This process involves understanding the context and determining what the user is looking for.
[0137] Step 4:
[0138] The server uses an emotion recognition engine to analyze the user's emotional state from the voice data. This analysis takes into account the tone, pitch, and speed of the voice to identify, for example, whether the user is feeling stressed.
[0139] Step 5:
[0140] The server integrates task and schedule information obtained from information analysis with emotional states and determines their priorities using a prioritization mechanism. It takes emotional states into consideration, for example, setting an order that reduces the burden if stress levels are high.
[0141] Step 6:
[0142] The server performs necessary translations using translation tools to support smooth communication between multiple languages. In this process, consideration is given to selecting emotionally positive expressions for translation.
[0143] Step 7:
[0144] The terminal notifies the user of the most relevant information and prioritized tasks sent from the server. This allows the user to understand the situation in real time and take appropriate action.
[0145] Step 8:
[0146] The user accesses information provided through the terminal and performs tasks in the order suggested by the system. They also utilize translated information as needed to facilitate communication in foreign languages.
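The emotion-aware ordering described in Step 5 above can be illustrated with a minimal Python sketch. The `Task` fields, the 1-to-5 scales, the `stress_level` value in the range 0 to 1, and the 0.7 threshold are all assumptions made for illustration; the disclosure does not specify how the prioritization mechanism is implemented.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    urgency: int  # hypothetical scale: 1 (low) to 5 (high)
    effort: int   # hypothetical scale: 1 (light) to 5 (heavy)


def order_tasks(tasks, stress_level, stress_threshold=0.7):
    """Return tasks in a suggested order.

    stress_level is assumed to be a score in [0, 1] produced by some
    emotion recognition engine; the threshold is an arbitrary choice.
    """
    if stress_level >= stress_threshold:
        # High stress: lighter tasks first, urgency only breaks ties.
        return sorted(tasks, key=lambda t: (t.effort, -t.urgency))
    # Otherwise: most urgent first, lighter tasks break ties.
    return sorted(tasks, key=lambda t: (-t.urgency, t.effort))


tasks = [
    Task("Prepare quarterly report", urgency=5, effort=5),
    Task("Reply to email", urgency=3, effort=1),
    Task("Book meeting room", urgency=4, effort=2),
]
calm_order = [t.name for t in order_tasks(tasks, stress_level=0.2)]
stressed_order = [t.name for t in order_tasks(tasks, stress_level=0.9)]
```

With these sample values, a calm user is shown the most urgent task first, while a stressed user is shown the lightest task first, which corresponds to "setting an order that reduces the burden if stress levels are high."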
[0147] (Example 2)
[0148] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0149] Conventional information delivery systems have not adequately considered the emotional state of users, resulting in an excessive burden on users. Furthermore, smooth communication across language barriers was sometimes difficult. Therefore, there is a need for a system that accurately recognizes users' emotions, appropriately prioritizes tasks and schedules based on those emotions, and enables smooth information delivery across multiple languages.
[0150] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0151] In this invention, the server includes data collection means for acquiring user communication data and voice data, speech recognition means for converting the acquired voice data into text data, and natural language processing means for analyzing the text data and extracting important information. This enables the provision of information that takes into account the user's emotional state and smooth communication between multiple languages.
[0152] "Data collection means" refers to methods for acquiring communication data and voice data with the user's permission.
[0153] "Speech recognition means" refers to a technology or device for converting acquired speech data into text data.
[0154] "Natural language processing means" refers to technologies or devices for analyzing text data and extracting important information from it.
[0155] "Emotion recognition means" refers to a technology or device that analyzes a user's voice data to identify their emotional state.
[0156] "Decision-making means" refers to a technology or device used to determine the priority of tasks or appointments based on extracted information and emotional states.
[0157] "Information provision means" refers to the means of providing information optimized for the user.
[0158] "Translation support means" refers to technologies or devices for performing automatic translation between multiple languages.
[0159] This invention is an advanced system that provides information according to the user's emotional state. The system is configured as follows:
[0160] Data collection
[0161] The server collects communication data and voice data with the user's permission. Users grant access to the server by providing data from their devices. Voice data is converted into text data using speech recognition technology. This technology utilizes commonly used cloud-based speech recognition APIs.
[0162] Data Analysis
[0163] The server applies natural language processing to the converted text data. The software libraries used include widely used natural language processing tools. This processing extracts important information from conversations and communications.
[0164] Emotion recognition and decision-making
[0165] The server analyzes the voice data to determine the user's emotional state. At this stage, emotion recognition technology is used to evaluate the tone, speed, and intonation of the voice. Furthermore, based on the extracted information and emotional data, a generative AI model is used to determine the priority of tasks and schedules.
[0166] Information provision and multilingual support
[0167] The terminal receives optimization information sent from the server. The user receives this information directly through the terminal. The information provision includes a multilingual translation function to support smooth communication between foreign languages.
[0168] For example, if a user is in a highly stressful situation, the system can adjust priority tasks to reduce the burden and suggest relaxing media.
[0169] An example of a prompt message might be: "Recognize the user's emotions and suggest relaxing content to reduce stress. Example: Present relaxing videos or music before a high-stress conversation."
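As a rough illustration of how such a prompt message might be assembled before it is sent to a generative AI model, the following Python sketch combines the instruction quoted above with the extracted tasks and the recognized emotional state. The function name `build_priority_prompt`, the line layout, and the closing instruction are assumptions; the disclosure does not define a prompt format.

```python
def build_priority_prompt(tasks, emotion):
    """Assemble a text prompt from extracted tasks and an emotion label.

    The layout below is a hypothetical format, not one taken from the
    disclosure.
    """
    lines = [
        "Recognize the user's emotions and suggest relaxing content to reduce stress.",
        f"Current emotional state: {emotion}",
        "Tasks and schedules:",
    ]
    lines += [f"- {task}" for task in tasks]
    lines.append("Return the tasks in priority order, one per line.")
    return "\n".join(lines)


prompt = build_priority_prompt(
    ["Submit expense report", "Call the supplier"],
    emotion="stressed",
)
```

The resulting string would then be passed to whichever generative model the server uses; that call is deliberately left out because the disclosure does not name a specific API here.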
[0170] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0171] Step 1:
[0172] The server collects voice and communication data from the terminal with the user's permission. During this process, the server performs explicit permission checks to securely collect the user's voice and text information. It takes voice and text data as input and stores this data as output in a format usable for subsequent processing. Specifically, the server receives the voice stream via the specified API and prepares it for real-time processing.
[0173] Step 2:
[0174] The server uses speech recognition to convert the acquired audio data into text data. At this stage, the audio data is used as input and converted into text format by cloud-based speech recognition technology. The output is the converted text data. Specifically, the server converts the information extracted from the audio into a text string and stores it in a database.
[0175] Step 3:
[0176] The server applies natural language processing to the converted text data to extract important information. It uses text data as input and analyzes the information using natural language processing libraries. The output is extracted task and schedule information. Specifically, the server performs keyword extraction and semantic analysis of sentences.
[0177] Step 4:
[0178] The server uses emotion recognition tools to analyze voice data and identify the user's emotional state. It uses the original voice data as input and estimates emotions by analyzing voice tone and speed. The output is the recognized emotional status. Specifically, the server analyzes the voice profile to identify emotions such as stress and relaxation.
[0179] Step 5:
[0180] The server uses a generative AI model to prioritize tasks and schedules based on the extracted information and emotional state. The input is the extracted task information and the emotional status, and the output is an optimal priority list produced by the generative model. Specifically, the server sends prompts to the generative AI and receives the optimized schedule.
[0181] Step 6:
[0182] The terminal receives optimized information sent from the server and provides it to the user. It receives data from the server as input and formats it for user notification as output. Specifically, the terminal displays a priority task list in the application UI and performs multilingual translation as needed.
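The six steps above can be sketched as a single server-side function whose stages are passed in as callables. This is only a structural sketch under stated assumptions: `run_pipeline`, `PipelineResult`, and the four stage parameters are hypothetical names, and real speech recognition, natural language processing, emotion recognition, and generative AI services would replace the stand-in lambdas.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PipelineResult:
    transcript: str
    tasks: List[str]
    emotion: str
    prioritized: List[str]


def run_pipeline(
    audio: bytes,
    consent: bool,
    transcribe: Callable[[bytes], str],
    extract_tasks: Callable[[str], List[str]],
    recognize_emotion: Callable[[bytes], str],
    prioritize: Callable[[List[str], str], List[str]],
) -> PipelineResult:
    # Step 1: data is only processed with the user's explicit permission.
    if not consent:
        raise PermissionError("user permission is required before collecting data")
    transcript = transcribe(audio)            # Step 2: speech to text
    tasks = extract_tasks(transcript)         # Step 3: extract tasks and schedules
    emotion = recognize_emotion(audio)        # Step 4: uses the original voice data
    prioritized = prioritize(tasks, emotion)  # Step 5: emotion-aware priority list
    # Step 6: the result is what the server would send to the terminal.
    return PipelineResult(transcript, tasks, emotion, prioritized)


# Stand-in stages for illustration only.
result = run_pipeline(
    b"raw-audio-bytes",
    consent=True,
    transcribe=lambda a: "Send the report. Book a room.",
    extract_tasks=lambda text: [s.strip() for s in text.split(".") if s.strip()],
    recognize_emotion=lambda a: "stressed",
    prioritize=lambda tasks, emotion: sorted(tasks, key=len),
)
```

Passing the stages as parameters keeps the flow independent of any particular vendor service, which matches the way the steps are described in terms of inputs and outputs.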
[0183] (Application Example 2)
[0184] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0185] There is a need to accurately recognize the emotional state of users, provide information accordingly, and achieve optimal management of tasks and schedules. However, conventional systems have insufficient coordination between emotional analysis and information provision, making it difficult to respond flexibly to the emotional state of users. Furthermore, smooth communication support in multiple languages is not adequately provided.
[0186] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0187] In this invention, the server includes data acquisition means for acquiring user communication data and voice data, information analysis means for analyzing text data and extracting important information, and priority determination means for determining the priority of tasks and schedules based on emotional state. This enables optimal information provision and task management in accordance with the user's emotions, and also realizes smooth communication between multiple languages.
[0188] "Data acquisition means" refers to a function that collects user voice data and communication data and makes that information available for subsequent processing.
[0189] A "speech conversion means" is a function that converts acquired speech data into text data, and is a method that makes speech information analyzable as text information.
[0190] "Information analysis means" refers to a function that performs processing to extract important information from text data, and can utilize natural language processing technology.
[0191] "Emotional analysis means" refers to a function that identifies the user's emotional state from voice data, and is a technology that makes judgments based on the tone and tempo of the voice.
[0192] A "prioritization mechanism" is a function that determines the priority of tasks and schedules based on extracted information and emotional states, and provides information efficiently.
[0193] "Information provision means" refers to a function that provides users with optimized information and suggestions, and supports communication while taking into account the user's emotional state.
[0194] "Translation tools" refer to functions that perform automatic translation between multiple languages, facilitating communication between users who speak different languages.
[0195] The "sound medium selection means" is a function that selects an appropriate sound medium based on the user's emotional changes and provides that sound medium to the user.
[0196] To implement this invention, the system mainly consists of a server and a user terminal. The server uses data acquisition means to collect communication and voice data with the user's permission. The voice data is converted into text data by speech conversion means, which is then analyzed by information analysis means to extract important information. The implementation is a Python-based system: the Google® Cloud Speech-to-Text API is used for speech conversion, and the Hugging Face Transformers library is used for natural language processing.
[0197] The server uses emotion analysis tools, specifically Azure Cognitive Services, to identify the user's emotional state from the tone and tempo of their voice. This provides specific emotional information, such as whether the user is stressed. This information is processed by a prioritization tool, and task and schedule priorities are determined based on the emotional state.
[0198] The information provision system is connected to the terminal and provides optimal information and suggestions tailored to the user's emotions. Specifically, when a user is feeling stressed, it prioritizes tasks that have a stress-relieving effect and encourages the playback of relaxing music. The translation system supports multiple languages to facilitate communication between users who speak different languages, and the audio media selection system selects and plays audio media appropriate to the user's emotions.
[0199] For example, if a user is tired after work, the system will play relaxing music along with a message saying, "You must be tired. Please take it easy today." Furthermore, when foreign visitors arrive, the system will provide real-time translation to facilitate conversation and support communication with visitors.
[0200] An example of a prompt message would be: "If the user indicates stress in the sentiment analysis, please come up with suggestions or options to help them relax."
[0201] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0202] Step 1:
[0203] The server uses data acquisition methods to collect communication data and voice data with the user's permission. The input for this step is the user's real-time voice data, which the server captures as an audio file. The output is the captured audio data file.
[0204] Step 2:
[0205] The server uses a speech-to-text conversion method to convert the acquired audio data into text data. The input for this step is an audio data file, and the server performs the conversion using the Google Cloud Speech-to-Text API. The output is the converted text data, which can be processed as natural language.
[0206] Step 3:
[0207] The server extracts important information from text data using information analysis tools. The input is text data, and the server analyzes the information using natural language processing techniques to identify details of tasks and schedules. The output is the extracted information.
[0208] Step 4:
[0209] The server uses sentiment analysis tools to identify the user's emotional state from the tone and tempo of the voice data. The input for this step is voice data, and the server performs the analysis using Azure Cognitive Services. The output is the identified emotional state.
[0210] Step 5:
[0211] The server uses a prioritization mechanism to determine the priority of tasks and schedules based on extracted information and emotional states. The input consists of the extracted information and emotional states, which the server analyzes to calculate priorities. The output is a list of prioritized tasks.
[0212] Step 6:
[0213] The terminal provides optimized information to the user through the information provision means and makes suggestions based on their emotional state. The input consists of prioritized tasks and emotional states; the terminal generates suggestions based on this information and notifies the user. The output is specific suggestions to the user.
[0214] Step 7:
[0215] The terminal uses translation tools to automatically translate between multiple languages, facilitating smooth communication. The input is the user's spoken text, which the terminal translates into another language. The output is the translated text.
[0216] Step 8:
[0217] The terminal uses an audio medium selection mechanism to choose and play audio media that matches the user's emotions. The input is the user's emotional state, and the terminal performs the operation to select the optimal audio media. The output is the audio media being played.
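Step 8 above can be sketched as a simple lookup from the identified emotional state to an audio medium and an accompanying message. The table contents are assumptions for illustration: only the "tired" message is taken from the description, while the media identifiers and the "stressed" message are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical mapping from emotion label to (audio medium, message).
SUGGESTIONS = {
    "tired": ("relaxing_music", "You must be tired. Please take it easy today."),
    "stressed": ("relaxing_music", "Let's start with something simple."),
}


def select_audio_media(emotion: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (audio medium, message) for an emotion label.

    Unknown emotions return (None, None) so that the terminal plays
    nothing rather than guessing.
    """
    return SUGGESTIONS.get(emotion.strip().lower(), (None, None))


media, message = select_audio_media("Tired")
```

A production system would presumably choose from a richer catalog and take the user's preferences into account; the lookup table only shows where the emotional state enters the selection.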
[0218] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0219] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions are input to the data generation model 58, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0220] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.
[0221] [Second Embodiment]
[0222] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.
[0223] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.
[0224] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0225] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.
[0226] The microphone 238 accepts instructions from the user 20 by receiving the voice of the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0227] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0228] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0229] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0230] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0231] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0232] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0233] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0234] This invention is an AI system designed to streamline schedule management, task management, information retrieval, and multilingual communication in daily life. Specifically, it operates through the collaboration of a server, a terminal, and a user. First, the server, with the user's permission, collects the user's email and voice data. This allows the server to obtain necessary information from the vast amount of daily communication data.
[0235] Next, the server uses speech recognition technology to convert the audio data into text. This converted text is then analyzed by the server's information analysis tools to extract important keywords, scheduled dates, tasks, and other relevant information. This clarifies the necessary information and provides the foundational data for schedule and task management.
[0236] Furthermore, the server uses a prioritization mechanism to automatically determine the priority of tasks and schedules based on the acquired information. This decision-making process takes into account the user's past behavior history and current situation, resulting in the creation of individually optimized plans.
[0237] Subsequently, the server provides users with the most relevant information through its information delivery system. For example, notifications regarding important meeting schedules or tasks requiring prior preparation are displayed on the user's device as needed. This allows users to take immediate action as required.
[0238] Furthermore, the server uses translation tools to automatically translate text and audio data to support smooth communication between languages. The terminal receives the translated text and audio data and presents it to the user. For example, it can be used to provide real-time translation during meetings conducted in foreign languages, allowing users to understand the content in their own language.
[0239] The introduction of this system will enable users to efficiently manage their schedules and tasks, and to obtain necessary information without being overwhelmed by information overload. It will also facilitate communication that transcends language barriers.
[0240] The following describes the processing flow.
[0241] Step 1:
[0242] The server, with the user's permission, accesses the user's mail server to collect the latest email data. It also retrieves meeting audio data from cloud storage.
[0243] Step 2:
[0244] The server applies speech recognition technology to the collected audio data and converts it into accurate text data. During this process, noise is filtered to improve conversion accuracy.
[0245] Step 3:
[0246] The server analyzes the converted text data using a natural language processing engine to extract important keywords, deadlines, and priority tasks. This clarifies the key points of the information.
[0247] Step 4:
[0248] Based on the extracted information, the server prioritizes tasks and schedules, taking into account the user's past behavior patterns and current plans.
[0249] Step 5:
[0250] The server searches for external information related to the user and retrieves the most relevant information regarding schedules and tasks. This information is intended to support the user's actions.
[0251] Step 6:
[0252] The server translates conversations and text data into a language the user can understand to facilitate multilingual communication. This is done using natural language processing technology.
[0253] Step 7:
[0254] The terminal receives notifications from the server and informs the user of high-priority tasks and schedules. For example, it prompts the user to take action by sending notifications about urgent tasks.
[0255] Step 8:
[0256] The terminal displays the translation results received from the server and plays them back as audio if necessary. This allows users to communicate more smoothly in foreign languages.
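To make the extraction in Step 3 more concrete, the following Python sketch pulls task phrases and deadlines out of text. It is a deliberately simple stand-in: it assumes deadlines are written as "by YYYY-MM-DD", which the disclosure does not state, and a regular expression takes the place of the natural language processing engine. The names `DEADLINE_RE` and `extract_deadlines` are hypothetical.

```python
import re
from datetime import date

# Assumed pattern: "<task phrase> by YYYY-MM-DD" within one sentence.
DEADLINE_RE = re.compile(r"(?P<task>[^.]*?)\s+by\s+(?P<due>\d{4}-\d{2}-\d{2})")


def extract_deadlines(text):
    """Return a list of (task phrase, due date) pairs found in the text."""
    results = []
    # Split on sentence-ending punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        match = DEADLINE_RE.search(sentence)
        if match:
            due = date.fromisoformat(match.group("due"))
            results.append((match.group("task").strip(), due))
    return results


text = (
    "Please send the budget draft by 2025-01-15. "
    "Lunch was great. "
    "Review the contract by 2025-01-10."
)
deadlines = extract_deadlines(text)
```

Sentences without a recognizable deadline are skipped, so only the two task sentences in the sample text produce entries.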
[0257] (Example 1)
[0258] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0259] In modern society, users are surrounded by vast amounts of communication and audio data, facing information overload. Furthermore, language barriers exist in communication with people who speak different languages. These factors complicate schedule and task management in daily life and work, hindering efficient action. This invention aims to solve these problems and enable users to appropriately manage and acquire the information they need.
[0260] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0261] In this invention, the server includes data acquisition means for acquiring user communication data and voice data, voice conversion means for converting acquired voice data into text data, information analysis means for analyzing text data and extracting important information, priority determination means for determining the priority of tasks and schedules based on the analyzed information and considering the user's past and present behavioral history, information provision means for providing optimal information to the user through notifications, and translation means for performing automatic text conversion between multiple languages. As a result, users can efficiently manage their schedules and tasks, and freely acquire and use information across language barriers.
[0262] "Users" refers to individuals or groups who use the system for information management and communication.
[0263] "Communication data" refers to digital data used by users to exchange information with others, such as emails, messages, and voice communications.
[0264] "Audio data" refers to digital audio data that includes information recorded by the user in voice.
[0265] "Data acquisition means" refers to the function by which the system collects communication data and voice data from users.
[0266] "Voice conversion means" refers to technology that converts acquired voice data into text format.
[0267] "Text data" refers to information in text format that has been converted by the voice conversion means.
[0268] "Information analysis means" refers to a function that analyzes text data and extracts important information and keywords from it.
[0269] "Priority determination means" refers to a function that sets priorities based on analyzed information, taking into account the importance and urgency of tasks and appointments.
[0270] "Information provision means" refers to a function that notifies users of necessary information at the appropriate time.
[0271] "Translation means" refers to a function that performs automatic text or audio translation between multiple languages.
[0272] An "information processing system" refers to the overall mechanism that combines these means to support users' information management and communication.
[0273] This invention uses an information processing system to streamline user schedule management, task management, and multilingual communication. At the core of the system is a server, which collects communication and voice data with the user's permission. The server has the capability to retrieve data from the user's email service and voice assistant service via an internet connection.
[0274] The collected audio data is converted into text format by the server using "speech recognition technology." Specifically, a "speech recognition API" is used in this process, and the information in the audio file is stored on the server as text data.
[0275] The text data is then analyzed using "natural language processing technology." The server uses a "text analysis library" to automatically extract important information, keywords, due dates, and task details. Based on this information, the server determines the priority of tasks and appointments using a "priority determination mechanism." This uses an algorithm that takes into account past data and current usage.
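One way to picture an algorithm that "takes into account past data and current usage" is a score that combines deadline urgency with a per-category weight derived from the user's history. The Python sketch below is an assumption-laden illustration: the `Item` fields, the urgency formula, and the `history_weight` mapping are all hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Item:
    title: str
    due: date
    category: str


def priority_score(item, today, history_weight):
    """Higher score means higher priority.

    Urgency grows as the due date approaches; history_weight is an
    assumed per-category bonus learned from past behavior.
    """
    days_left = max((item.due - today).days, 0)
    urgency = 1.0 / (1 + days_left)
    return urgency + history_weight.get(item.category, 0.0)


def prioritize(items, today, history_weight):
    return sorted(
        items,
        key=lambda i: priority_score(i, today, history_weight),
        reverse=True,
    )


today = date(2025, 1, 10)
items = [
    Item("File expenses", date(2025, 1, 11), "admin"),
    Item("Client proposal", date(2025, 1, 20), "client"),
    Item("Archive old mail", date(2025, 1, 30), "admin"),
]
by_deadline = [i.title for i in prioritize(items, today, {})]
with_history = [i.title for i in prioritize(items, today, {"client": 0.5})]
```

With an empty history the order follows the deadlines; once the assumed history gives client work a bonus of 0.5, the client proposal moves ahead of the nearer administrative task.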
[0276] The server then provides information to the user's device through a "notification system," ensuring that important schedules and tasks are immediately recognized by the user. The device displays the received notifications on its screen, helping the user take relevant actions quickly.
[0277] Furthermore, the system utilizes "automatic translation technology" to support multilingual communication. The server uses a "translation API" to convert text and voice messages into the required language, and the terminal presents it to the user as audio or text display.
[0278] For example, if a user enters a prompt such as, "Add a lunch date with a friend this Sunday," the system will register the date in the schedule and notify them. It also has a function to translate and present meeting agendas in response to prompts such as, "Summarize the contents of next Wednesday's meeting in English."
[0279] In this way, this system enables users to effectively manage schedules and tasks, and smoothly obtain information and conduct activities even in different language environments.
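Registering a prompt such as "Add a lunch date with a friend this Sunday" requires turning the relative expression "this Sunday" into a calendar date. A minimal Python sketch of that one step is shown below; the function name `resolve_this_weekday` and the rule that a weekday falling on the current day resolves to the current day are assumptions, since the disclosure does not describe how dates are resolved.

```python
from datetime import date, timedelta

WEEKDAYS = [
    "monday", "tuesday", "wednesday", "thursday",
    "friday", "saturday", "sunday",
]


def resolve_this_weekday(name, today):
    """Return the next occurrence of the named weekday, counting today.

    date.weekday() numbers Monday as 0 and Sunday as 6, which matches
    the order of WEEKDAYS.
    """
    target = WEEKDAYS.index(name.strip().lower())
    days_ahead = (target - today.weekday()) % 7
    return today + timedelta(days=days_ahead)


# 2025-01-08 is a Wednesday, so "this Sunday" resolves to 2025-01-12.
lunch_date = resolve_this_weekday("Sunday", date(2025, 1, 8))
```

The resolved date would then be stored in the schedule together with the event title extracted from the prompt.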
[0280] The flow of the specific processing in Example 1 will be described using Figure 11.
[0281] Step 1:
[0282] The server acquires the user's communication data and voice data. The inputs are e-mails and voice logs for which the user's permission has been obtained. The server uses the "data acquisition method" to consolidate these data on the server. The output is a database of communication data and voice data. Specific operations include the server accessing e-mail via an "API" and retrieving new e-mails and attached files.
[0283] Step 2:
[0284] The server converts the acquired voice data into text data. The input is the voice data, which is supplied to the server's speech recognition system. The server uses the "speech recognition engine" to convert the voice into text. The output is the voice data converted into text format. Specific operations include analyzing voice samples and identifying phonemes, which yields the text data.
[0285] Step 3:
[0286] The server analyzes the text data and extracts important information. The input is the data converted into text format, which is supplied to the information analysis system. The server uses the "natural language processing library" to analyze the elements of the text and extract important keywords, dates, and task names. The output is a list of these extracted elements. Specific operations include noun phrase analysis and the identification of relationships.
[0287] Step 4:
[0288] The server determines the priority of tasks and appointments based on the extracted information. Important keywords and schedule information are sent to the prioritization algorithm as input. The server uses a "decision engine" to analyze the data and automatically set priorities. The output is a list of tasks and appointments organized according to their priority. Specific actions include assigning the most appropriate priority by considering historical data and current importance.
[0289] Step 5:
[0290] The server generates notification information to provide users with the most relevant information and sends it to their devices. Inputs include a prioritized task list and schedule information. The server uses a "notification generation system" to send important tasks and appointments to the user's device. Output is a schedule notification displayed on the user's screen. Specifically, this is displayed as a pop-up notification on the device, making it easy for the user to check.
[0291] Step 6:
[0292] The server performs automatic translation between multiple languages as needed. Text data requiring translation is sent to the server as input. The server uses a "translation engine" to process the text and convert it into another language. The output is the translated text data. In practice, meeting notes are translated in real time and displayed on the terminal.
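As a non-limiting illustration, Steps 3 and 4 above could be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the names `Task`, `extract_tasks`, and `prioritize`, the keyword weights, the ISO date format, and the scoring formula are all hypothetical, and a real "natural language processing library" or "decision engine" would replace the simple rules used here.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional

# Hypothetical keyword weights; the source does not define a scoring rule.
KEYWORD_WEIGHTS = {"urgent": 3.0, "deadline": 2.0, "important": 1.5}
# Assumption: due dates appear in ISO form (YYYY-MM-DD) in the text.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


@dataclass
class Task:
    text: str
    due: Optional[date] = None
    score: float = 0.0


def extract_tasks(text: str) -> List[Task]:
    """Step 3 (sketch): treat each sentence as a task and pick up an ISO date."""
    tasks = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not sentence:
            continue
        match = ISO_DATE.search(sentence)
        due = date(*map(int, match.groups())) if match else None
        tasks.append(Task(text=sentence, due=due))
    return tasks


def prioritize(tasks: List[Task], today: date,
               history: Optional[Dict[str, float]] = None) -> List[Task]:
    """Step 4 (sketch): score = keyword weight + due-date urgency + history weight."""
    history = history or {}
    for task in tasks:
        lower = task.text.lower()
        score = sum(w for key, w in KEYWORD_WEIGHTS.items() if key in lower)
        if task.due is not None:
            days_left = max((task.due - today).days, 0)
            score += 10.0 / (1 + days_left)  # nearer due dates score higher
        # "Past data": extra weight for terms the user acted on before.
        score += sum(w for key, w in history.items() if key in lower)
        task.score = score
    return sorted(tasks, key=lambda t: t.score, reverse=True)
```

The optional `history` argument stands in for the "historical data" mentioned in Step 4; passing, for example, `{"coffee": 5.0}` raises the rank of tasks that mention a term the user has prioritized before.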
[0293] (Application Example 1)
[0294] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0295] In modern society, many people face a massive amount of information and multinational communication. This makes effective time management and smooth communication across multiple languages difficult. Furthermore, there is a need for efficient schedule management using voice instructions and information-based two-way translation. Conventional systems have been unable to adequately support these needs, leaving challenges in improving individual time efficiency and promoting international understanding.
[0296] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0297] In this invention, the server includes a processing device for acquiring user data, a speech conversion device for converting acquired speech data into text information, an information analysis device for analyzing text information and extracting important information, a priority determination device for determining the priority of schedules and work instructions, an information presentation device for providing optimal information to the user, a translation device for performing automatic translation between multiple languages, and a two-way communication device for managing schedules based on speech instructions and translating speech data into other languages. This enables users to efficiently and effectively manage their schedules and communicate across multiple languages.
[0298] A "processing device for acquiring user data" refers to hardware or software for collecting and analyzing voice and communication data with the user's permission.
[0299] A "speech conversion device" is a device that accurately converts acquired speech data into text information.
[0300] An "information analysis device" refers to a system or program for analyzing the converted text information and extracting important information.
[0301] A "priority determination device" is a device or mechanism that determines the order in which schedules and tasks are to be performed based on the analyzed information.
[0302] An "information presentation device" is a device or module for presenting or notifying optimized information to the user.
[0303] A "translation device" is a system that provides automatic translation between different languages.
[0304] A "two-way communication device" is a device for managing schedules based on voice instructions and for transmitting and receiving information translated into other languages in real time.
[0305] The system for implementing this invention mainly operates efficiently through the cooperation among a server, a terminal, and a user. The server is responsible for collecting communication data and voice data with the permission of the user. For data collection, portable devices such as smart glasses and smartphones are used, and the applications operating on these devices cooperate with the server to obtain the necessary data.
[0306] First, the server uses a software library called speech_recognition to convert voice data into text information. The converted text information is analyzed using libraries such as the googletrans library, and important information is extracted. This analysis identifies the necessary information and prepares the basic data used for schedule and task management.
[0307] After that, based on the analyzed information, the server automatically determines the priorities of schedules and tasks. In this prioritization, the user's past behavior history and current situation are also considered, so a schedule optimized for each user is formulated.
[0308] The terminal displays the information provided by the server to the user and notifies important schedules and tasks. As a result, the user can immediately execute the necessary actions. Also, with the translation function, real-time translation between different languages is performed, providing support for the user to understand information in their native language even in meetings conducted in a foreign language.
[0309] For example, if a user says, "There is a meeting tomorrow at 10 AM," the system recognizes this as voice input and immediately updates the schedule. Furthermore, this information can be translated into a foreign language and communicated to foreign colleagues as needed. This system accurately processes user voice instructions, manages schedules via voice, and translates the content into other languages such as English.
[0310] An example of a prompt is: "Manage my schedule by voice and translate the content into English."
[0311] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0312] Step 1:
[0313] The terminal receives voice input from the user. When the user speaks into their smartphone or smart glasses, saying, "I have a meeting tomorrow at 2pm," the voice is recorded on the terminal. The input here is voice data, which is then prepared for subsequent processing.
[0314] Step 2:
[0315] The server receives audio data from the terminal and converts the audio into text using the speech_recognition library. In this step, the input audio data is analyzed and the resulting text information is output. Audio analysis techniques are used to ensure accurate language recognition.
[0316] Step 3:
[0317] The server analyzes the converted text information using a natural language processing algorithm and extracts important information related to the schedule. The input is text information, from which keywords such as date, time, and event name are identified, and an organized information structure is generated as output.
[0318] Step 4:
[0319] The server prioritizes based on the analyzed information and performs translations as needed using the Google Translate library. Here, it handles cases where the user requires foreign language interpretation, outputting the results of converting the input analyzed information into another language.
[0320] Step 5:
[0321] The server sends prioritized schedule information and translated information to the terminal. The terminal receives this information and displays a notification on the user's screen. The input here is the information sent from the server, and the output is a visual or audible notification to the user.
[0322] Step 6:
[0323] The user acts based on the information provided by the server. They review the presented schedule and translation content, and make necessary changes to their plans or communicate accordingly. In this step, the user's input is the provided information, and they take specific actions based on this information.
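As a non-limiting illustration, the extraction in Step 3 above could be sketched for utterances such as "I have a meeting tomorrow at 2pm." This is a minimal sketch under stated assumptions: the function name `parse_schedule`, the fixed list of event words, and the regular-expression grammar are hypothetical, and the source does not specify which natural language processing algorithm is used.

```python
import re
from datetime import date, datetime, timedelta
from typing import Optional

# Assumed grammar: "<event> (today|tomorrow) at <hour>[:<minute>] (am|pm)".
UTTERANCE = re.compile(
    r"(?P<event>meeting|lunch|call|appointment)\s+"
    r"(?P<day>today|tomorrow)\s+at\s+"
    r"(?P<hour>\d{1,2})(?::(?P<minute>\d{2}))?\s*(?P<ampm>am|pm)",
    re.IGNORECASE,
)


def parse_schedule(text: str, today: date) -> Optional[dict]:
    """Return {'event', 'start'} for a recognized utterance, else None."""
    match = UTTERANCE.search(text)
    if match is None:
        return None
    hour = int(match["hour"])
    minute = int(match["minute"] or 0)
    if not 1 <= hour <= 12 or minute > 59:
        return None  # not a valid 12-hour clock time
    hour %= 12  # 12am -> 0, 12pm -> 12 after the adjustment below
    if match["ampm"].lower() == "pm":
        hour += 12
    offset = 1 if match["day"].lower() == "tomorrow" else 0
    day = today + timedelta(days=offset)
    return {
        "event": match["event"].lower(),
        "start": datetime(day.year, day.month, day.day, hour, minute),
    }
```

Passing the reference date explicitly (`today`) keeps the function deterministic, so the same recognized text always yields the same schedule entry regardless of when the server processes it.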
[0324] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0325] This invention is an advanced AI system that recognizes the emotional state of a user and provides information accordingly. The system is configured as follows:
[0326] First, the server obtains the user's permission to collect emails and voice data. This data is then converted into text data using speech recognition technology. The server then applies natural language processing to the converted text data to extract important information. This allows the server to obtain information about tasks and schedules from the user's conversations and email content.
[0327] Next, the server uses an emotion recognition engine to analyze the user's voice data. This analysis identifies emotions from the tone of voice and speaking tempo. For example, if the user is feeling stressed, that emotional state will be identified.
[0328] Subsequently, the server uses a prioritization mechanism to determine the priority of tasks and schedules based on the extracted information and emotional state, and makes adjustments to ensure appropriate action is taken. This adjustment takes the user's emotional state into consideration and incorporates measures to avoid further stress.
[0329] The terminal receives the results sent from the server and provides information optimized for the user. For example, if the user is feeling stressed, it will encourage them to start with simpler tasks to reduce the burden. Furthermore, a multilingual translation function will be activated as needed to support smoother conversations in foreign languages.
[0330] In this way, the system according to this invention detects the user's emotions and provides information and communication support tailored to those emotions, thereby creating an environment in which users can go about their daily lives efficiently and comfortably. Specific examples include providing relaxing information before a tense meeting and offering translations phrased to ease tension during conversations.
[0331] The following describes the processing flow.
[0332] Step 1:
[0333] The server, with the user's permission, securely retrieves the user's email and voice data. This allows information related to the user's daily activities to be incorporated into the system.
[0334] Step 2:
[0335] The server converts the acquired audio data into text data using speech recognition technology. This is necessary to convert everyday conversations and meeting content into an analyzable format.
[0336] Step 3:
[0337] The server applies natural language processing techniques to the converted text data to extract important information, tasks, and schedules. This process involves understanding the context and determining what the user is looking for.
[0338] Step 4:
[0339] The server uses an emotion recognition engine to analyze the user's emotional state from the voice data. This analysis takes into account the tone, pitch, and speed of the voice to identify, for example, whether the user is feeling stressed.
[0340] Step 5:
[0341] The server integrates task and schedule information obtained from information analysis with emotional states and determines their priorities using a prioritization mechanism. It takes emotional states into consideration, for example, setting an order that reduces the burden if stress levels are high.
[0342] Step 6:
[0343] The server performs necessary translations using translation tools to support smooth communication between multiple languages. In this process, consideration is given to selecting emotionally positive expressions for translation.
[0344] Step 7:
[0345] The terminal notifies the user of the most relevant information and prioritized tasks sent from the server. This allows the user to understand the situation in real time and take appropriate action.
[0346] Step 8:
[0347] The user accesses information provided through the terminal and performs tasks in the order suggested by the system. They also utilize translated information as needed to facilitate communication in foreign languages.
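As a non-limiting illustration, the emotion-aware ordering in Step 5 above could be sketched as follows. This is a minimal sketch under stated assumptions: the `Task` fields, the `effort` estimate, the stress score in the range 0.0 to 1.0, and the threshold of 0.7 are hypothetical, since the source only states that an order which reduces the burden is set when stress levels are high.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    priority: float  # higher means more important (from the analysis step)
    effort: float    # assumed burden estimate: 0.0 (light) to 1.0 (heavy)


def order_tasks(tasks: List[Task], stress: float,
                threshold: float = 0.7) -> List[Task]:
    """Order tasks by priority, or lightest-first when stress is high."""
    if stress >= threshold:
        # High stress: start with low-effort tasks; break ties by priority.
        return sorted(tasks, key=lambda t: (t.effort, -t.priority))
    # Otherwise: plain descending priority.
    return sorted(tasks, key=lambda t: -t.priority)
```

For instance, with a heavy high-priority report and a light low-priority email, a calm user would see the report first, whereas a user whose estimated stress exceeds the threshold would see the email first.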
[0348] (Example 2)
[0349] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0350] Conventional information delivery systems have not adequately considered the emotional state of users, resulting in an excessive burden on users. Furthermore, smooth communication across language barriers was sometimes difficult. Therefore, there is a need for a system that accurately recognizes users' emotions, appropriately prioritizes tasks and schedules based on those emotions, and enables smooth information delivery across multiple languages.
[0351] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0352] In this invention, the server includes data collection means for acquiring user communication data and voice data, speech recognition means for converting the acquired voice data into text data, and natural language processing means for analyzing the text data and extracting important information. This enables the provision of information that takes into account the user's emotional state and smooth communication between multiple languages.
[0353] "Data collection means" refers to methods for acquiring communication data and voice data with the user's permission.
[0354] "Speech recognition means" refers to a technology or device for converting acquired speech data into text data.
[0355] "Natural language processing means" refers to technologies or devices for analyzing text data and extracting important information from it.
[0356] "Emotion recognition means" refers to a technology or device that analyzes a user's voice data to identify their emotional state.
[0357] A "decision-making tool" is a technology or device used to determine the priority of tasks or appointments based on extracted information and emotional states.
[0358] "Information provision means" refers to the means of providing information optimized for the user.
[0359] "Translation support means" refers to technologies or devices for performing automatic translation between multiple languages.
[0360] This invention is an advanced system that provides information according to the user's emotional state. The system is configured as follows:
[0361] Data collection
[0362] The server collects communication data and voice data with the user's permission. Users grant access to the server by providing data from their devices. Voice data is converted into text data using speech recognition technology. This technology utilizes commonly used cloud-based speech recognition APIs.
[0363] Data Analysis
[0364] The server applies natural language processing to the converted text data. The software libraries used include widely used natural language processing tools. This extracts important information from conversations and communications.
[0365] Emotion recognition and decision-making
[0366] The server analyzes the voice data to determine the user's emotional state. At this stage, emotion recognition technology is used to evaluate the tone, speed, and intonation of the voice. Furthermore, based on the extracted information and emotional data, a generative AI model is used to determine the priority of tasks and schedules.
[0367] Information provision and multilingual support
[0368] The terminal receives the optimized information sent from the server. The user receives this information directly through the terminal. The information provision includes a multilingual translation function to support smooth communication across languages.
[0369] For example, if a user is in a highly stressful situation, the system can adjust priority tasks to reduce the burden and suggest relaxing media.
[0370] An example of a prompt message might be: "Recognize the user's emotions and suggest relaxing content to reduce stress. Example: Present relaxing videos or music before a high-stress conversation."
[0371] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0372] Step 1:
[0373] The server collects voice and communication data from the terminal with the user's permission. During this process, the server performs explicit permission checks to securely collect the user's voice and text information. It takes voice and text data as input and stores this data as output in a format usable for subsequent processing. Specifically, the server receives the voice stream via the specified API and prepares it for real-time processing.
[0374] Step 2:
[0375] The server uses speech recognition to convert the acquired audio data into text data. At this stage, the audio data is used as input and converted into text format by cloud-based speech recognition technology. The output is the converted text data. Specifically, the server converts the information extracted from the audio into a text string and stores it in a database.
[0376] Step 3:
[0377] The server applies natural language processing to the converted text data to extract important information. It uses text data as input and analyzes the information using natural language processing libraries. The output is extracted task and schedule information. Specifically, the server performs keyword extraction and semantic analysis of sentences.
[0378] Step 4:
[0379] The server uses emotion recognition tools to analyze voice data and identify the user's emotional state. It uses the original voice data as input and estimates emotions by analyzing voice tone and speed. The output is the recognized emotional status. Specifically, the server analyzes the voice profile to identify emotions such as stress and relaxation.
[0380] Step 5:
[0381] The server uses a generative AI model to prioritize tasks and schedules based on the extracted information and emotional states. The extracted task information and emotional status are provided as input to the generative model, which outputs an optimal priority list. Specifically, the server sends prompts to the generative AI and receives the optimized schedule.
[0382] Step 6:
[0383] The terminal receives optimized information sent from the server and provides it to the user. It receives data from the server as input and formats it for user notification as output. Specifically, the terminal displays a priority task list in the application UI and performs multilingual translation as needed.
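As a non-limiting illustration, the prompt exchange in Step 5 above could be sketched as follows. This is a minimal sketch under stated assumptions: `build_prompt`, `prioritize_with_model`, the prompt wording, and the JSON reply format are hypothetical, and the generative AI is represented only by a caller-supplied function `generate` that maps a prompt string to a reply string, because the source does not name a specific model interface.

```python
import json
from typing import Callable, List


def build_prompt(tasks: List[dict], emotion: str) -> str:
    """Compose a prompt from extracted tasks and the recognized emotion."""
    return (
        "You are a scheduling assistant.\n"
        f"The user's estimated emotional state is: {emotion}.\n"
        "Reorder the tasks below so that the user's burden is reduced, and "
        "reply with a JSON array of task names only.\n"
        f"Tasks: {json.dumps(tasks, ensure_ascii=False)}"
    )


def prioritize_with_model(tasks: List[dict], emotion: str,
                          generate: Callable[[str], str]) -> List[str]:
    """Ask the model for an order; fall back to the input order if needed."""
    known = [task["name"] for task in tasks]
    try:
        reply = json.loads(generate(build_prompt(tasks, emotion)))
    except json.JSONDecodeError:
        reply = []  # unparseable reply: keep the original order
    ranked: List[str] = []
    if isinstance(reply, list):
        for name in reply:
            # Ignore names the model invented and drop duplicates.
            if name in known and name not in ranked:
                ranked.append(name)
    # Append any task the model omitted, in its original position order.
    ranked += [name for name in known if name not in ranked]
    return ranked
```

Validating the reply against the known task names means that a malformed or incomplete model response can reorder tasks but can never add or lose one.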
[0384] (Application Example 2)
[0385] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0386] There is a need to accurately recognize the emotional state of users, provide information accordingly, and achieve optimal management of tasks and schedules. However, conventional systems have insufficient coordination between emotional analysis and information provision, making it difficult to respond flexibly to the emotional state of users. Furthermore, smooth communication support in multiple languages is not adequately provided.
[0387] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0388] In this invention, the server includes data acquisition means for acquiring user communication data and voice data, information analysis means for analyzing text data and extracting important information, and priority determination means for determining the priority of tasks and schedules based on emotional state. This enables optimal information provision and task management in accordance with the user's emotions, and also realizes smooth communication between multiple languages.
[0389] A "data acquisition method" is a function that collects user voice data and communication data and uses that information for subsequent processing.
[0390] A "speech conversion means" is a function that converts acquired speech data into text data, and is a method that makes speech information analyzable as text information.
[0391] "Information analysis means" refers to a function that performs processing to extract important information from text data, and can utilize natural language processing technology.
[0392] "Emotional analysis means" refers to a function that identifies the user's emotional state from voice data, and is a technology that makes judgments based on the tone and tempo of the voice.
[0393] A "prioritization mechanism" is a function that determines the priority of tasks and schedules based on extracted information and emotional states, and provides information efficiently.
[0394] "Information provision means" refers to a function that provides users with optimized information and suggestions, and supports communication while taking into account the user's emotional state.
[0395] "Translation tools" refer to functions that perform automatic translation between multiple languages, facilitating communication between users who speak different languages.
[0396] The "sound medium selection means" is a function that selects an appropriate sound medium based on the user's emotional changes and provides that sound medium to the user.
[0397] To implement this invention, the system mainly consists of a server and a user terminal. The server uses data acquisition means to collect communication and voice data with the user's permission. The voice data is converted into text data by speech conversion means, which is then analyzed by information analysis means to extract important information. This analysis uses a Python-based system, employing the Google Cloud Speech-to-Text API, and the Hugging Face Transformers library for natural language processing.
[0398] The server uses emotion analysis tools, along with Azure Cognitive Services, to identify the user's emotional state from the tone and tempo of their voice. This provides specific emotional information, such as whether the user is stressed. This information is then processed by a prioritization tool, and task and schedule priorities are determined based on the emotional state.
[0399] The information provision system is connected to the terminal and provides optimal information and suggestions tailored to the user's emotions. Specifically, when a user is feeling stressed, it prioritizes tasks that have a stress-relieving effect and encourages the playback of relaxing music. The translation system supports multiple languages to facilitate communication between users who speak different languages, and the audio media selection system selects and plays audio media appropriate to the user's emotions.
[0400] For example, if a user is tired after work, the system will play relaxing music along with a message saying, "You must be tired. Please take it easy today." Furthermore, when foreign visitors arrive, the system will provide real-time translation to facilitate conversation and support communication with visitors.
[0401] An example of a prompt message would be: "If the user indicates stress in the sentiment analysis, please come up with suggestions or options to help them relax."
[0402] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0403] Step 1:
[0404] The server uses data acquisition methods to collect communication data and voice data with the user's permission. The input for this step is the user's real-time voice data, which the server captures as an audio file. The output is the captured audio data file.
[0405] Step 2:
[0406] The server uses a speech-to-text conversion method to convert the acquired audio data into text data. The input for this step is an audio data file, and the server performs the conversion using the Google Cloud Speech-to-Text API. The output is the converted text data, which can be processed as natural language.
[0407] Step 3:
[0408] The server extracts important information from text data using information analysis tools. The input is text data, and the server analyzes the information using natural language processing techniques to identify details of tasks and schedules. The output is the extracted information.
[0409] Step 4:
[0410] The server uses sentiment analysis tools to identify the user's emotional state from the tone and tempo of the voice data. The input for this step is voice data, and the server performs the analysis using Azure Cognitive Services. The output is the identified emotional state.
[0411] Step 5:
[0412] The server uses a prioritization mechanism to determine the priority of tasks and schedules based on the extracted information and emotional states. The input consists of the extracted information and emotional states, which the server analyzes to calculate priorities. The output is a list of prioritized tasks.
[0413] Step 6:
[0414] The terminal provides optimized information to the user through an information delivery system and makes suggestions based on their emotional state. The input consists of prioritized tasks and emotional states; the terminal generates suggestions based on this information and notifies the user. The output is specific suggestions to the user.
[0415] Step 7:
[0416] The terminal uses translation tools to automatically translate between multiple languages, facilitating smooth communication. The input is the user's spoken text, which the terminal translates into another language and outputs. The output is the translated text.
[0417] Step 8:
[0418] The terminal uses an audio medium selection mechanism to choose and play audio media that matches the user's emotions. The input is the user's emotional state, and the terminal performs the operation to select the optimal audio media. The output is the audio media being played.
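As a non-limiting illustration, the suggestion and audio media selection in Steps 6 and 8 above could be sketched as follows. This is a minimal sketch under stated assumptions: the emotion labels, the file names in `MEDIA_BY_EMOTION`, and the functions `select_audio` and `build_notification` are hypothetical; only the "tired" message is taken from the example given in this disclosure.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical catalogue mapping an emotional state to candidate tracks.
MEDIA_BY_EMOTION: Dict[str, List[str]] = {
    "stressed": ["ambient_piano.mp3", "rain_sounds.mp3"],
    "tired": ["soft_jazz.mp3"],
}

# Message for "tired" follows the example in this disclosure.
MESSAGE_BY_EMOTION: Dict[str, str] = {
    "tired": "You must be tired. Please take it easy today.",
}


def select_audio(emotion: str,
                 already_played: Iterable[str] = ()) -> Optional[str]:
    """Return the first matching track not yet played, or None."""
    played = set(already_played)
    for track in MEDIA_BY_EMOTION.get(emotion, []):
        if track not in played:
            return track
    return None


def build_notification(emotion: str,
                       already_played: Iterable[str] = ()) -> dict:
    """Bundle the suggestion message and the selected audio for the terminal."""
    return {
        "message": MESSAGE_BY_EMOTION.get(emotion),
        "audio": select_audio(emotion, already_played),
    }
```

Returning `None` for an unknown emotion or an exhausted catalogue lets the terminal simply skip playback rather than play an unsuitable track.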
[0419] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0420] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.
[0421] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.
[0422] [Third Embodiment]
[0423] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.
[0424] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.
[0425] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0426] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.
[0427] The microphone 238 receives instructions and other input from the user 20 by picking up the voice uttered by the user 20. The microphone 238 captures the voice uttered by the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0428] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0429] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0430] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0431] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0432] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0433] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0434] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".
[0435] This invention is an AI system designed to streamline schedule management, task management, information retrieval, and multilingual communication in daily life. Specifically, it operates through the collaboration of a server, a terminal, and a user. First, the server, with the user's permission, collects the user's email and voice data. This allows the server to obtain necessary information from the vast amount of daily communication data.
[0436] Next, the server uses speech recognition technology to convert the audio data into text. This converted text is then analyzed by the server's information analysis tools to extract important keywords, scheduled dates, tasks, and other relevant information. This clarifies the necessary information and provides the foundational data for schedule and task management.
[0437] Furthermore, the server uses a prioritization mechanism to automatically determine the priority of tasks and schedules based on the acquired information. This decision-making process takes into account the user's past behavior history and current situation, resulting in the creation of individually optimized plans.
[0438] Subsequently, the server provides users with the most relevant information through its information delivery system. For example, notifications regarding important meeting schedules or tasks requiring prior preparation are displayed on the user's device as needed. This allows users to take immediate action as required.
[0439] Furthermore, the server uses translation tools to automatically translate text and audio data to support smooth communication between languages. The terminal receives the translated text and audio data and presents it to the user. For example, it can be used to provide real-time translation during meetings conducted in foreign languages, allowing users to understand the content in their own language.
[0440] The introduction of this system will enable users to efficiently manage their schedules and tasks, and to obtain necessary information without being overwhelmed by information overload. It will also facilitate communication that transcends language barriers.
[0441] The following describes the processing flow.
[0442] Step 1:
[0443] The server, with the user's permission, accesses the user's mail server to collect the latest email data. It also retrieves meeting audio data from cloud storage.
[0444] Step 2:
[0445] The server applies speech recognition technology to the collected audio data and converts it into accurate text data. During this process, noise is filtered to improve conversion accuracy.
[0446] Step 3:
[0447] The server analyzes the converted text data using a natural language processing engine to extract important keywords, deadlines, and priority tasks. This clarifies the key points of the information.
[0448] Step 4:
[0449] Based on the extracted information, the server prioritizes tasks and schedules, taking into account the user's past behavior patterns and current plans.
[0450] Step 5:
[0451] The server searches for external information related to the user and retrieves the most relevant information regarding schedules and tasks. This information is intended to support the user's actions.
[0452] Step 6:
[0453] The server translates conversations and text data into a language the user can understand to facilitate multilingual communication. This is done using natural language processing technology.
[0454] Step 7:
[0455] The terminal receives notifications from the server and informs the user of high-priority tasks and schedules. For example, it prompts the user to act by presenting notifications about urgent tasks.
[0456] Step 8:
[0457] The terminal displays the translation results received from the server and plays them back as audio if necessary. This allows users to communicate more smoothly in foreign languages.
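As a non-limiting illustration, the steps above can be read as a linear pipeline running from collected data to the notification presented on the terminal. The following Python sketch makes that data flow explicit under stated assumptions: the names `run_pipeline` and `Notification` and the four stage functions passed in by the caller are hypothetical and are not components named in this disclosure, and the external information search of Step 5 is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Notification:
    """Hypothetical shape of what the terminal receives in Steps 7 and 8."""
    tasks: List[str] = field(default_factory=list)       # highest priority first
    translated: List[str] = field(default_factory=list)  # same order as tasks


def run_pipeline(
    emails: List[str],
    audio_clips: List[bytes],
    transcribe: Callable[[bytes], str],
    extract: Callable[[str], List[str]],
    prioritize: Callable[[List[str]], List[str]],
    translate: Callable[[str], str],
) -> Notification:
    # Step 1 is assumed done: emails and audio clips are already collected.
    # Step 2: convert every audio clip into text.
    texts = list(emails) + [transcribe(clip) for clip in audio_clips]
    # Step 3: extract task-like items from each text.
    items = [item for text in texts for item in extract(text)]
    # Step 4: order the extracted items by priority.
    ordered = prioritize(items)
    # Step 6: translate each item into the user's language.
    return Notification(tasks=ordered, translated=[translate(t) for t in ordered])
```

Passing the stages in as functions keeps the sketch independent of any particular speech recognition, analysis, or translation service; each stage can be replaced without changing the flow.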
[0458] (Example 1)
[0459] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0460] In modern society, users are surrounded by vast amounts of communication and audio data, facing information overload. Furthermore, language barriers exist in communication with people who speak different languages. These factors complicate schedule and task management in daily life and work, hindering efficient action. This invention aims to solve these problems and enable users to appropriately manage and acquire the information they need.
[0461] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0462] In this invention, the server includes data acquisition means for acquiring user communication data and voice data, voice conversion means for converting acquired voice data into text data, information analysis means for analyzing text data and extracting important information, priority determination means for determining the priority of tasks and schedules based on the analyzed information and considering the user's past and present behavioral history, information provision means for providing optimal information to the user through notifications, and translation means for performing automatic text conversion between multiple languages. As a result, users can efficiently manage their schedules and tasks, and freely acquire and use information across language barriers.
[0463] "Users" refers to individuals or groups who use the system for information management and communication.
[0464] "Communication data" refers to digital data used by users to exchange information with others, such as emails, messages, and voice communications.
[0465] "Audio data" refers to digital audio data that includes information recorded by the user in voice.
[0466] "Data acquisition means" refers to the function by which the system collects communication data and voice data from users.
[0467] "Voice conversion means" refers to technology that converts acquired voice data into text format.
[0468] "Text data" refers to information in text format that has been converted by a speech-to-text conversion method.
[0469] "Information analysis means" refers to a function that analyzes text data and extracts important information and keywords from it.
[0470] "Priority determination means" refers to a function that sets priorities based on the analyzed information, taking into account the importance and urgency of tasks and appointments.
[0471] "Information provision means" refers to a function that notifies users of necessary information at the appropriate time.
[0472] "Translation means" refers to a function that performs automatic text or audio translation between multiple languages.
[0473] An "information processing system" refers to the overall mechanism that combines these means to support users' information management and communication.
[0474] This invention uses an information processing system to streamline user schedule management, task management, and multilingual communication. At the core of the system is a server, which collects communication and voice data with the user's permission. The server has the capability to retrieve data from the user's email service and voice assistant service via an internet connection.
[0475] The collected audio data is converted into text format by the server using "speech recognition technology." Specifically, a "speech recognition API" is used in this process, and the information in the audio file is stored on the server as text data.
[0476] The text data is then analyzed using "natural language processing technology." The server uses a "text analysis library" to automatically extract important information, keywords, due dates, and task details. Based on this information, the server determines the priority of tasks and appointments using a "priority determination mechanism." This uses an algorithm that takes into account past data and current usage.
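As a non-limiting illustration of the extraction described above, the following Python sketch pulls frequent keywords and ISO-formatted dates out of a text. It is a deliberately simple stand-in that uses only the standard library. The function names, the stop-word list, and the assumption that dates appear as YYYY-MM-DD are all hypothetical, and the sketch does not represent the "text analysis library" referred to in this disclosure.

```python
import re
from collections import Counter
from typing import List

# Small illustrative stop-word list (assumption, not exhaustive).
STOPWORDS = {"the", "and", "for", "before", "please", "with", "will", "that", "this"}


def extract_keywords(text: str, top_n: int = 3) -> List[str]:
    """Return the top_n most frequent words, ignoring stop words and short words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


def extract_dates(text: str) -> List[str]:
    """Return every substring that looks like an ISO date (YYYY-MM-DD)."""
    return re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)
```

A production system would more plausibly rely on a trained language model for this step; frequency counting is shown only because it makes the input and output of the step concrete.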
[0477] The server then provides information to the user's device through a "notification system," ensuring that important schedules and tasks are immediately recognized by the user. The device displays the received notifications on its screen, helping the user take relevant actions quickly.
[0478] Furthermore, the system utilizes "automatic translation technology" to support multilingual communication. The server uses a "translation API" to convert text and voice messages into the required language, and the terminal presents it to the user as audio or text display.
[0479] For example, if a user enters a prompt such as, "Add a lunch date with a friend this Sunday," the system will register the date in the schedule and notify them. It also has a function to translate and present meeting agendas in response to prompts such as, "Summarize the contents of next Wednesday's meeting in English."
[0480] Thus, this system enables users to effectively manage their schedules and tasks, and to smoothly obtain information and carry out activities even in different language environments.
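As a non-limiting illustration, registering a prompt such as "Add a lunch date with a friend this Sunday" requires resolving the weekday expression to a calendar date. The following Python sketch shows one way to do so. The names `ScheduleEntry` and `parse_add_prompt` are hypothetical, only the phrasing "Add ... this <weekday>" is handled, and "this <weekday>" is assumed to mean the nearest occurrence on or after the current day.

```python
import datetime as dt
import re
from typing import NamedTuple, Optional

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

# Matches e.g. "Add a lunch date with a friend this Sunday".
ADD_RE = re.compile(
    r"add\s+(?:an?\s+)?(?P<title>.+?)\s+this\s+(?P<day>" + "|".join(WEEKDAYS) + r")\b",
    re.IGNORECASE,
)


class ScheduleEntry(NamedTuple):
    title: str
    date: dt.date


def parse_add_prompt(prompt: str, today: dt.date) -> Optional[ScheduleEntry]:
    """Turn an 'Add ... this <weekday>' prompt into a dated schedule entry."""
    match = ADD_RE.search(prompt)
    if match is None:
        return None
    target = WEEKDAYS.index(match.group("day").lower())
    # date.weekday() uses Monday = 0, matching the order of WEEKDAYS.
    days_ahead = (target - today.weekday()) % 7
    return ScheduleEntry(match.group("title"), today + dt.timedelta(days=days_ahead))
```

The current date is passed in as a parameter rather than read from the system clock so that the result is reproducible.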
[0481] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0482] Step 1:
[0483] The server acquires user communication data and voice data. Inputs include emails and voice logs obtained with the user's permission. The server uses the "data acquisition means" to collect this data. The output is a database holding the communication data and the voice data. Specific operations include the server accessing the email service via an "API" to retrieve new emails and attachments.
[0484] Step 2:
[0485] The server converts acquired audio data into text data. Audio data is fed to the server's speech recognition system as input. The server uses its "speech recognition engine" to perform data processing that converts the audio into text information. The output is the audio data converted into text format. Specific operations include analyzing the audio sample and identifying phonemes, resulting in the generation of text data.
[0486] Step 3:
[0487] The server analyzes text data and extracts important information. As input, data converted to text format is fed into the information analysis system. The server uses a "natural language processing library" to analyze the text elements and extract important keywords, dates, and task names. The output is provided as a list of these extracted elements. Specific operations include noun phrase analysis and relationship identification processes.
[0488] Step 4:
[0489] The server determines the priority of tasks and appointments based on the extracted information. Key keywords and schedule information are sent to the prioritization algorithm as input. The server uses a "decision engine" to analyze the data and automatically set priorities. The output is a list of tasks and appointments organized according to their priority. Specific actions include assigning the most appropriate priority by considering historical data and current importance.
[0490] Step 5:
[0491] The server generates notification information to provide users with the most relevant information and sends it to their devices. Inputs include a prioritized task list and schedule information. The server uses a "notification generation system" to send important tasks and appointments to the user's device. Output is a schedule notification displayed on the user's screen. Specifically, this is displayed as a pop-up notification on the device, making it easy for the user to check.
[0492] Step 6:
[0493] The server performs automatic translation between multiple languages as needed. Text data requiring translation is sent to the server as input. The server uses a "translation engine" to process the text and convert it into another language. The output is the translated text data. In practice, meeting notes are translated in real time and displayed on the terminal.
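As a non-limiting illustration of the priority determination in Step 4, the following Python sketch ranks tasks by a score that combines importance and urgency. The `Task` fields, the 1-to-5 importance scale, and the scoring formula are hypothetical choices made for this sketch. The user's past behavior history, which this disclosure also takes into account, is not modeled here.

```python
import datetime as dt
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    due: dt.date
    importance: int  # 1 (low) to 5 (high); hypothetical scale


def urgency(task: Task, today: dt.date) -> float:
    """Map days remaining to a value in (0, 5]; overdue or due today gives 5."""
    days_left = max((task.due - today).days, 0)
    return 5.0 / (1 + days_left)


def prioritize(tasks: List[Task], today: dt.date) -> List[Task]:
    """Sort tasks by importance plus urgency, highest score first."""
    return sorted(tasks, key=lambda t: t.importance + urgency(t, today), reverse=True)
```

With this weighting, a moderately important task due today can outrank a highly important task due in ten days, which reflects the stated aim of weighing urgency together with importance.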
[0494] (Application Example 1)
[0495] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0496] In modern society, many people face a massive amount of information and multinational communication. This makes effective time management and smooth communication across multiple languages difficult. Furthermore, there is a need for efficient schedule management using voice instructions and information-based two-way translation. Conventional systems have been unable to adequately support these needs, leaving challenges in improving individual time efficiency and promoting international understanding.
[0497] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0498] In this invention, the server includes a processing device for acquiring user data, a speech conversion device for converting acquired speech data into text information, an information analysis device for analyzing text information and extracting important information, a priority determination device for determining the priority of schedules and work instructions, an information presentation device for providing optimal information to the user, a translation device for performing automatic translation between multiple languages, and a two-way communication device for managing schedules based on speech instructions and translating speech data into other languages. This enables users to efficiently and effectively manage their schedules and communicate across multiple languages.
[0499] "Processing device for acquiring user data" refers to hardware or software for collecting and analyzing voice and communication data with the user's permission.
[0500] A "speech conversion device" is a technical device that has the function of accurately converting acquired speech data into text information.
[0501] An "information analysis device" refers to a system or program for analyzing the converted text information and extracting important information.
[0502] A "priority determination device" is a device or mechanism that determines the order in which schedules and tasks are to be performed based on analyzed information.
[0503] An "information presentation device" refers to equipment or modules for displaying optimized information to users or notifying users of it.
[0504] A "translation device" is a system that provides the function of performing automatic translation between different languages.
[0505] A "two-way communication device" is a device that manages schedules based on voice instructions and transmits and receives information translated into other languages in real time.
[0506] The system implementing this invention operates efficiently primarily through cooperation between a server, a terminal, and a user. The server is responsible for collecting communication data and voice data based on the user's permission. Portable devices such as smart glasses and smartphones are used for data collection, and applications running on these devices cooperate with the server to acquire the necessary data.
[0507] The server first uses a software library called speech_recognition to convert speech data into text. The converted text is then analyzed to extract important information, and libraries such as googletrans are used where translation is required. This analysis clarifies the necessary information and prepares the basic data used for managing schedules and tasks.
[0508] The server then automatically determines the priority of appointments and tasks based on the analyzed information. This prioritization process takes into account the user's past behavior history and current situation, resulting in a schedule optimized for each individual user.
[0509] The terminal displays information provided by the server to the user and notifies them of important appointments and tasks. This allows the user to take immediate action. In addition, a translation function provides real-time translation between different languages, helping users understand information in their native language, even in meetings conducted in a foreign language.
[0510] For example, if a user says, "There is a meeting tomorrow at 10 AM," the system recognizes this as voice input and immediately updates the schedule. Furthermore, this information can be translated into a foreign language and communicated to foreign colleagues as needed. This system accurately processes user voice instructions, manages schedules via voice, and translates the content into other languages such as English.
[0511] An example of a prompt is: "Manage my schedule by voice and translate the content into English."
[0512] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0513] Step 1:
[0514] The device receives voice input from the user. When the user speaks into their smartphone or smart glasses, saying, "I have a meeting tomorrow at 2pm," the voice is recorded on the device. The input here is voice data, which is then prepared for subsequent processing.
[0515] Step 2:
[0516] The server receives audio data from the terminal and converts the audio into text using the speech_recognition library. In this step, the input audio data is analyzed and the resulting text information is output. Audio analysis techniques are used to ensure accurate language recognition.
[0517] Step 3:
[0518] The server analyzes the converted text information using a natural language processing algorithm and extracts important information related to the schedule. The input is text information, from which keywords such as date, time, and event name are identified, and an organized information structure is generated as output.
[0519] Step 4:
[0520] The server prioritizes based on the analyzed information and performs translations as needed using the Google Translate library. Here, it handles cases where the user requires foreign language interpretation, outputting the results of converting the input analyzed information into another language.
[0521] Step 5:
[0522] The server sends prioritized schedule information and translated information to the terminal. The terminal receives this information and displays a notification on the user's screen. The input here is the information sent from the server, and the output is a visual or audible notification to the user.
[0523] Step 6:
[0524] The user acts based on the information provided by the server. They review the presented schedule and translation content, and make necessary changes to their plans or communicate accordingly. In this step, the user's input is the provided information, and they take specific actions based on this information.
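As a non-limiting illustration of Step 3, the following Python sketch converts a recognized sentence such as "I have a meeting tomorrow at 2pm" into a structured schedule entry. It uses regular expressions as a simple stand-in for the natural language processing algorithm. The function name `parse_utterance`, the returned dictionary keys, and the narrow set of supported phrasings (a 12-hour clock time and an optional "tomorrow") are assumptions of this sketch.

```python
import datetime as dt
import re
from typing import Optional

# 12-hour clock times such as "2pm", "10 AM", or "9:30 am".
TIME_RE = re.compile(r"\b(\d{1,2})(?::(\d{2}))?\s*(am|pm)\b", re.IGNORECASE)


def parse_utterance(text: str, now: dt.datetime) -> Optional[dict]:
    """Extract an event name and start time; return None if no time is found."""
    time_match = TIME_RE.search(text)
    if time_match is None:
        return None
    hour = int(time_match.group(1)) % 12          # 12am -> 0, 12pm -> 0 (+12 below)
    if time_match.group(3).lower() == "pm":
        hour += 12
    minute = int(time_match.group(2) or 0)
    day = now.date()
    if re.search(r"\btomorrow\b", text, re.IGNORECASE):
        day += dt.timedelta(days=1)
    # Take the first word after an article as the event name (rough heuristic).
    event = re.search(r"\b(?:a|an|the)\s+(\w+)", text, re.IGNORECASE)
    return {
        "event": event.group(1) if event else "event",
        "start": dt.datetime.combine(day, dt.time(hour, minute)),
    }
```

The resulting dictionary corresponds to the organized information structure that Step 3 outputs and that Step 4 then prioritizes and, where needed, translates.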
[0525] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0526] This invention is an advanced AI system that recognizes the emotional state of a user and provides information accordingly. The system is configured as follows:
[0527] First, the server obtains the user's permission to collect emails and voice data. This data is then converted into text data using speech recognition technology. The server then applies natural language processing to the converted text data to extract important information. This allows the server to obtain information about tasks and schedules from the user's conversations and email content.
[0528] Next, the server uses an emotion recognition engine to analyze the user's voice data. This analysis identifies emotions from the tone of voice and speaking tempo. For example, if the user is feeling stressed, that emotional state will be identified.
[0529] Subsequently, the server uses a prioritization mechanism to determine the priority of tasks and schedules based on the extracted information and emotional state, and makes adjustments to ensure appropriate action is taken. This adjustment takes the user's emotional state into consideration and incorporates measures to avoid further stress.
[0530] The device receives results sent from the server and provides information optimized for the user. For example, if the user is feeling stressed, it will encourage them to start with simpler tasks to reduce the burden. Furthermore, a multilingual translation function will be activated as needed to support smoother conversations in foreign languages.
[0531] In this way, the system according to this invention detects the user's emotions and, by providing information and communication support tailored to those emotions, creates an environment in which users can go about their daily lives efficiently and comfortably. Specific examples include providing relaxing information before a tense meeting and offering translations phrased to ease tension during conversations.
[0532] The following describes the processing flow.
[0533] Step 1:
[0534] The server, with the user's permission, securely retrieves the user's email and voice data. This allows information related to the user's daily activities to be incorporated into the system.
[0535] Step 2:
[0536] The server converts the acquired audio data into text data using speech recognition technology. This is necessary to make everyday conversations and meeting content into an analyzable format.
[0537] Step 3:
[0538] The server applies natural language processing techniques to the converted text data to extract important information, tasks, and schedules. This process involves understanding the context and determining what the user is looking for.
[0539] Step 4:
[0540] The server uses an emotion recognition engine to analyze the user's emotional state from the voice data. This analysis takes into account the tone, pitch, and speed of the voice to identify, for example, whether the user is feeling stressed.
[0541] Step 5:
[0542] The server integrates task and schedule information obtained from information analysis with emotional states and determines their priorities using a prioritization mechanism. It takes emotional states into consideration, for example, setting an order that reduces the burden if stress levels are high.
[0543] Step 6:
[0544] The server performs necessary translations using translation tools to support smooth communication between multiple languages. In this process, consideration is given to selecting emotionally positive expressions for translation.
[0545] Step 7:
[0546] The terminal notifies the user of the most relevant information and prioritized tasks sent from the server. This allows the user to understand the situation in real time and take appropriate action.
[0547] Step 8:
[0548] The user accesses information provided through the terminal and performs tasks in the order suggested by the system. They also utilize translated information as needed to facilitate communication in foreign languages.
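As a non-limiting illustration of Step 5, the following Python sketch changes the suggested task order depending on whether the user has been identified as stressed. The `Task` fields, the 1-to-5 effort scale, and the rule "simplest task first when stressed" are hypothetical simplifications of the burden-reducing ordering described above, and the emotional state is assumed to have already been determined in Step 4.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    priority: int  # higher means more important
    effort: int    # 1 (simple) to 5 (demanding); hypothetical scale


def order_tasks(tasks: List[Task], stressed: bool) -> List[Task]:
    """Return the tasks in the order in which they are suggested to the user.

    Normally the highest-priority task comes first. When the user is stressed,
    the simplest tasks come first, with ties broken by priority.
    """
    if stressed:
        return sorted(tasks, key=lambda t: (t.effort, -t.priority))
    return sorted(tasks, key=lambda t: -t.priority)
```

A binary stressed flag is used only to keep the sketch short; a graded stress level could be handled by blending the two sort keys.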
[0549] (Example 2)
[0550] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0551] Conventional information delivery systems have not adequately considered the emotional state of users, resulting in an excessive burden on users. Furthermore, smooth communication across language barriers was sometimes difficult. Therefore, there is a need for a system that accurately recognizes users' emotions, appropriately prioritizes tasks and schedules based on those emotions, and enables smooth information delivery across multiple languages.
[0552] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0553] In this invention, the server includes data collection means for acquiring user communication data and voice data, speech recognition means for converting the acquired voice data into text data, and natural language processing means for analyzing the text data and extracting important information. This enables the provision of information that takes into account the user's emotional state and smooth communication between multiple languages.
[0554] "Data collection means" refers to methods for acquiring communication data and voice data with the user's permission.
[0555] "Speech recognition means" refers to a technology or device for converting acquired speech data into text data.
[0556] "Natural language processing means" refers to technologies or devices for analyzing text data and extracting important information from it.
[0557] "Emotion recognition means" refers to a technology or device that analyzes a user's voice data to identify their emotional state.
[0558] A "decision-making tool" is a technology or device used to determine the priority of tasks or appointments based on extracted information and emotional states.
[0559] "Information provision means" refers to the means of providing information optimized for the user.
[0560] "Translation support means" refers to technologies or devices for performing automatic translation between multiple languages.
[0561] This invention is an advanced system that provides information according to the user's emotional state. The system is configured as follows:
[0562] Data collection
[0563] The server collects communication data and voice data with the user's permission. Users grant access to the server by providing data from their devices. Voice data is converted into text data using speech recognition technology. This technology utilizes commonly used cloud-based speech recognition APIs.
[0564] Data Analysis
[0565] The server applies natural language processing to the converted text data. The software libraries used here include widely used natural language processing tools. This extracts important information from conversations and communications.
[0566] Emotion recognition and decision-making
[0567] The server analyzes the voice data to determine the user's emotional state. At this stage, emotion recognition technology is used to evaluate the tone, speed, and intonation of the voice. Furthermore, based on the extracted information and emotional data, a generative AI model is used to determine the priority of tasks and schedules.
[0568] Information provision and multilingual support
[0569] The terminal receives the optimized information sent from the server. The user receives this information directly through the terminal. The information provision includes a multilingual translation function to support smooth communication across languages.
[0570] For example, if a user is in a highly stressful situation, the system can adjust priority tasks to reduce the burden and suggest relaxing media.
[0571] An example of a prompt message might be: "Recognize the user's emotions and suggest relaxing content to reduce stress. Example: Present relaxing videos or music before a high-stress conversation."
[0572] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0573] Step 1:
[0574] The server collects voice and communication data from the terminal with the user's permission. During this process, the server performs explicit permission checks to securely collect the user's voice and text information. It takes voice and text data as input and stores this data as output in a format usable for subsequent processing. Specifically, the server receives the voice stream via the specified API and prepares it for real-time processing.
[0575] Step 2:
[0576] The server uses speech recognition to convert the acquired audio data into text data. At this stage, the audio data is used as input and converted into text format by cloud-based speech recognition technology. The output is the converted text data. Specifically, the server converts the information extracted from the audio into a text string and stores it in a database.
[0577] Step 3:
[0578] The server applies natural language processing to the converted text data to extract important information. It uses text data as input and analyzes the information using natural language processing libraries. The output is extracted task and schedule information. Specifically, the server performs keyword extraction and semantic analysis of sentences.
[0579] Step 4:
[0580] The server uses emotion recognition tools to analyze voice data and identify the user's emotional state. It uses the original voice data as input and estimates emotions by analyzing voice tone and speed. The output is the recognized emotional status. Specifically, the server analyzes the voice profile to identify emotions such as stress and relaxation.
[0581] Step 5:
[0582] The server uses a generative AI model to prioritize tasks and schedules based on the extracted information and the emotional state. The input is the extracted task information and the emotional status, and the output is an optimal priority list produced by the generative model. Specifically, the server sends prompts to the generative AI and receives the optimized schedule.
[0583] Step 6:
[0584] The terminal receives optimized information sent from the server and provides it to the user. It receives data from the server as input and formats it for user notification as output. Specifically, the terminal displays a priority task list in the application UI and performs multilingual translation as needed.
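As a non-limiting illustration of Step 5, the following Python sketch assembles the prompt that the server might send to the generative AI from the extracted tasks and the recognized emotional status. The function name `build_priority_prompt` and the prompt wording are hypothetical, and the call to the generative model itself is omitted because this disclosure does not specify a particular model or interface.

```python
from typing import List


def build_priority_prompt(tasks: List[str], emotion: str) -> str:
    """Assemble a text prompt asking a generative model to order the tasks."""
    lines = [
        "You are a scheduling assistant.",
        f"The user's current emotional state is: {emotion}.",
        "Order the following tasks by priority, reducing the user's burden "
        "if the state indicates stress:",
    ]
    # Number the tasks so the model's answer can be matched back to them.
    lines += [f"{i}. {task}" for i, task in enumerate(tasks, start=1)]
    lines.append("Return one task per line, highest priority first.")
    return "\n".join(lines)
```

Requesting one task per line keeps the model's reply easy to parse into the priority list that is then sent to the terminal in Step 6.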
[0585] (Application Example 2)
[0586] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0587] There is a need to accurately recognize the emotional state of users, provide information accordingly, and achieve optimal management of tasks and schedules. However, conventional systems have insufficient coordination between emotional analysis and information provision, making it difficult to respond flexibly to the emotional state of users. Furthermore, smooth communication support in multiple languages is not adequately provided.
[0588] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0589] In this invention, the server includes data acquisition means for acquiring user communication data and voice data, information analysis means for analyzing text data and extracting important information, and priority determination means for determining the priority of tasks and schedules based on emotional state. This enables optimal information provision and task management in accordance with the user's emotions, and also realizes smooth communication between multiple languages.
[0590] "Data acquisition means" refers to a function that collects user voice data and communication data and makes that information available for subsequent processing.
[0591] A "speech conversion means" is a function that converts acquired speech data into text data, and is a method that makes speech information analyzable as text information.
[0592] "Information analysis means" refers to a function that performs processing to extract important information from text data, and can utilize natural language processing technology.
[0593] "Emotional analysis means" refers to a function that identifies the user's emotional state from voice data, and is a technology that makes judgments based on the tone and tempo of the voice.
[0594] "Priority determination means" refers to a function that determines the priority of tasks and schedules based on the extracted information and the emotional state, so that information can be provided efficiently.
[0595] "Information provision means" refers to a function that provides users with optimized information and suggestions, and supports communication while taking into account the user's emotional state.
[0596] "Translation means" refers to a function that performs automatic translation between multiple languages, facilitating communication between users who speak different languages.
[0597] The "sound medium selection means" is a function that selects an appropriate sound medium based on the user's emotional changes and provides that sound medium to the user.
[0598] To implement this invention, the system mainly consists of a server and a user terminal. The server uses data acquisition means to collect communication and voice data with the user's permission. The voice data is converted into text data by speech conversion means, which is then analyzed by information analysis means to extract important information. This analysis uses a Python-based system, employing the Google Cloud Speech-to-Text API, and the Hugging Face Transformers library for natural language processing.
[0599] The server uses emotion analysis tools, along with Azure Cognitive Services, to identify the user's emotional state from the tone and tempo of their voice. This provides specific emotional information, such as whether the user is stressed. This information is then processed by a prioritization tool, and task and schedule priorities are determined based on the emotional state.
[0600] The information provision system is connected to the terminal and provides optimal information and suggestions tailored to the user's emotions. Specifically, when a user is feeling stressed, it prioritizes tasks that have a stress-relieving effect and encourages the playback of relaxing music. The translation system supports multiple languages to facilitate communication between users who speak different languages, and the audio media selection system selects and plays audio media appropriate to the user's emotions.
[0601] For example, if a user is tired after work, the system will play relaxing music along with a message saying, "You must be tired. Please take it easy today." Furthermore, when foreign visitors arrive, the system will provide real-time translation to facilitate conversation and support communication with visitors.
[0602] An example of a prompt message would be: "If the user indicates stress in the sentiment analysis, please come up with suggestions or options to help them relax."
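The way such a prompt might be assembled from the emotion analysis result can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `EmotionResult` container, the `build_prompt` function, the label set, and the sample tasks are all hypothetical names introduced here, and only the instruction sentence itself is taken from the example prompt above.

```python
from dataclasses import dataclass


@dataclass
class EmotionResult:
    """Hypothetical container for the output of the emotion analysis step."""
    label: str         # e.g. "stressed" or "calm" (assumed label set)
    confidence: float  # assumed range: 0.0 to 1.0


# Instruction sentence quoted from the example prompt in the description.
RELAX_INSTRUCTION = (
    "If the user indicates stress in the sentiment analysis, "
    "please come up with suggestions or options to help them relax."
)


def build_prompt(emotion: EmotionResult, tasks: list[str]) -> str:
    """Combine the instruction, the emotion result, and the task list into one prompt."""
    lines = [
        RELAX_INSTRUCTION,
        "",
        f"Sentiment analysis result: {emotion.label} "
        f"(confidence {emotion.confidence:.2f})",
        "Pending tasks:",
    ]
    lines.extend(f"- {task}" for task in tasks)
    return "\n".join(lines)


# Sample values are illustrative only.
prompt = build_prompt(
    EmotionResult("stressed", 0.87),
    ["Prepare slides", "Reply to email"],
)
print(prompt)
```

The resulting string would then be sent to whichever generative model the server uses; that call is deliberately left out because the description does not specify it.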
[0603] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0604] Step 1:
[0605] The server uses data acquisition methods to collect communication data and voice data with the user's permission. The input for this step is the user's real-time voice data, which the server captures as an audio file. The output is the captured audio data file.
[0606] Step 2:
[0607] The server uses a speech-to-text conversion method to convert the acquired audio data into text data. The input for this step is an audio data file, and the server performs the conversion using the Google Cloud Speech-to-Text API. The output is the converted text data, which can be processed as natural language.
[0608] Step 3:
[0609] The server extracts important information from text data using information analysis tools. The input is text data, and the server analyzes the information using natural language processing techniques to identify details of tasks and schedules. The output is the extracted information.
[0610] Step 4:
[0611] The server uses sentiment analysis tools to identify the user's emotional state from the tone and tempo of the voice data. The input for this step is voice data, and the server performs the analysis using Azure Cognitive Services. The output is the identified emotional state.
[0612] Step 5:
[0613] The server uses a prioritization mechanism to determine the priority of tasks and schedules based on the extracted information and the emotional state. The input consists of the extracted information and the emotional state, which the server analyzes to calculate priorities. The output is a list of prioritized tasks.
[0614] Step 6:
[0615] The terminal provides optimized information to the user through an information delivery system and makes suggestions based on the user's emotional state. The input consists of prioritized tasks and emotional states; the terminal generates suggestions based on this information and notifies the user. The output is specific suggestions to the user.
[0616] Step 7:
[0617] The terminal uses translation tools to automatically translate between multiple languages, facilitating smooth communication. The input is the user's spoken text, which the terminal translates into another language. The output is the translated text.
[0618] Step 8:
[0619] The terminal uses an audio medium selection mechanism to choose and play audio media that matches the user's emotions. The input is the user's emotional state, and the terminal performs the operation to select the optimal audio media. The output is the audio media being played.
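The data flow from voice input (Steps 2 and 4) to audio media selection (Step 8) can be sketched as below. This is an assumption-laden outline: the speech-to-text and emotion analysis services are replaced by injected placeholder functions rather than real API calls, and the emotion labels, media identifiers, and the names `select_audio` and `run_flow` are hypothetical. Only the "tired" message is taken from the description.

```python
from typing import Callable, Optional

# Assumed mapping from emotion label to an audio media identifier (Step 8).
AUDIO_BY_EMOTION = {
    "stressed": "relaxing_music",
    "tired": "relaxing_music",
    "happy": "upbeat_music",
}

# Message played along with the audio; the "tired" text follows the description.
MESSAGE_BY_EMOTION = {
    "tired": "You must be tired. Please take it easy today.",
}


def select_audio(emotion: str) -> Optional[str]:
    """Return the audio media for an emotion, or None if no rule applies."""
    return AUDIO_BY_EMOTION.get(emotion)


def run_flow(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    detect_emotion: Callable[[bytes], str],
) -> dict:
    """Chain the steps; the two callables stand in for external services."""
    text = transcribe(audio)          # Step 2: audio data -> text data
    emotion = detect_emotion(audio)   # Step 4: audio data -> emotional state
    return {
        "text": text,
        "emotion": emotion,
        "audio_media": select_audio(emotion),        # Step 8
        "message": MESSAGE_BY_EMOTION.get(emotion),
    }


# Placeholder services returning fixed values, for illustration only.
result = run_flow(
    b"\x00\x01",
    transcribe=lambda a: "I finally finished the report",
    detect_emotion=lambda a: "tired",
)
print(result["audio_media"], "|", result["message"])
```

Passing the services in as callables keeps the sketch independent of any particular vendor, so a real speech recognition or emotion analysis client could be substituted without changing the flow.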
[0620] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0621] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0622] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.
[0623] [Fourth Embodiment]
[0624] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.
[0625] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.
[0626] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0627] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.
[0628] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0629] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0630] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0631] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
[0632] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0633] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0634] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0635] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0636] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0637] This invention is an AI system designed to streamline schedule management, task management, information retrieval, and multilingual communication in daily life. Specifically, it operates through the collaboration of a server, a terminal, and a user. First, the server, with the user's permission, collects the user's email and voice data. This allows the server to obtain necessary information from the vast amount of daily communication data.
[0638] Next, the server uses speech recognition technology to convert the audio data into text. This converted text is then analyzed by the server's information analysis tools to extract important keywords, scheduled dates, tasks, and other relevant information. This clarifies the necessary information and provides the foundational data for schedule and task management.
[0639] Furthermore, the server uses a prioritization mechanism to automatically determine the priority of tasks and schedules based on the acquired information. This decision-making process takes into account the user's past behavior history and current situation, resulting in the creation of individually optimized plans.
[0640] Subsequently, the server provides users with the most relevant information through its information delivery system. For example, notifications regarding important meeting schedules or tasks requiring prior preparation are displayed on the user's device as needed. This allows users to take immediate action as required.
[0641] Furthermore, the server uses translation tools to automatically translate text and audio data to support smooth communication between languages. The terminal receives the translated text and audio data and presents it to the user. For example, it can be used to provide real-time translation during meetings conducted in foreign languages, allowing users to understand the content in their own language.
[0642] The introduction of this system will enable users to efficiently manage their schedules and tasks, and to obtain necessary information without being overwhelmed by information overload. It will also facilitate communication that transcends language barriers.
[0643] The following describes the processing flow.
[0644] Step 1:
[0645] The server, with the user's permission, accesses the user's mail server to collect the latest email data. It also retrieves meeting audio data from cloud storage.
[0646] Step 2:
[0647] The server applies speech recognition technology to the collected audio data and converts it into accurate text data. During this process, noise is filtered to improve conversion accuracy.
[0648] Step 3:
[0649] The server analyzes the converted text data using a natural language processing engine to extract important keywords, deadlines, and priority tasks. This clarifies the key points of the information.
[0650] Step 4:
[0651] Based on the extracted information, the server prioritizes tasks and schedules, taking into account the user's past behavior patterns and current plans.
[0652] Step 5:
[0653] The server searches for external information related to the user and retrieves the most relevant information regarding schedules and tasks. This information is intended to support the user's actions.
[0654] Step 6:
[0655] The server translates conversations and text data into a language the user can understand to facilitate multilingual communication. This is done using natural language processing technology.
[0656] Step 7:
[0657] The terminal receives notifications from the server and informs the user of high-priority tasks and schedules. For example, it prompts the user to take action by presenting notifications about urgent tasks.
[0658] Step 8:
[0659] The terminal displays the translation results received from the server and plays them back as audio if necessary. This allows users to communicate more smoothly in foreign languages.
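The prioritization in Step 4, which takes past behavior into account, can be sketched as follows. The description does not define a scoring formula, so everything here is an assumption made for illustration: the `Task` fields, the `priority_score` function, the use of a per-category "late rate" as the behavior history, and the sample data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Task:
    name: str
    deadline: datetime
    category: str


def priority_score(task: Task, now: datetime, late_rate: dict[str, float]) -> float:
    """Assumed scoring rule: closer deadlines score higher, and categories the
    user has often finished late in the past receive a boost."""
    hours_left = max((task.deadline - now).total_seconds() / 3600.0, 1.0)
    urgency = 1.0 / hours_left
    history_boost = 1.0 + late_rate.get(task.category, 0.0)
    return urgency * history_boost


def prioritize(tasks: list[Task], now: datetime, late_rate: dict[str, float]) -> list[Task]:
    """Return tasks ordered from highest to lowest priority."""
    return sorted(tasks, key=lambda t: priority_score(t, now, late_rate), reverse=True)


# Illustrative data only.
now = datetime(2025, 1, 6, 9, 0)
tasks = [
    Task("Submit expense report", now + timedelta(hours=48), "admin"),
    Task("Prepare meeting agenda", now + timedelta(hours=4), "meeting"),
    Task("Review contract", now + timedelta(hours=40), "legal"),
]
late_rate = {"admin": 0.5}  # hypothetical history: admin tasks were late half the time

ranked = prioritize(tasks, now, late_rate)
for task in ranked:
    print(task.name)
```

In this example the expense report is ranked above the contract review even though its deadline is later, because the assumed history boost outweighs the small difference in urgency.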
[0660] (Example 1)
[0661] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0662] In modern society, users are surrounded by vast amounts of communication and audio data, facing information overload. Furthermore, language barriers exist in communication with people who speak different languages. These factors complicate schedule and task management in daily life and work, hindering efficient action. This invention aims to solve these problems and enable users to appropriately manage and acquire the information they need.
[0663] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0664] In this invention, the server includes data acquisition means for acquiring user communication data and voice data, voice conversion means for converting acquired voice data into text data, information analysis means for analyzing text data and extracting important information, priority determination means for determining the priority of tasks and schedules based on the analyzed information and considering the user's past and present behavioral history, information provision means for providing optimal information to the user through notifications, and translation means for performing automatic text conversion between multiple languages. As a result, users can efficiently manage their schedules and tasks, and freely acquire and use information across language barriers.
[0665] "Users" refers to individuals or groups who use the system for information management and communication.
[0666] "Communication data" refers to digital data used by users to exchange information with others, such as emails, messages, and voice communications.
[0667] "Audio data" refers to digital audio data that includes information recorded by the user in voice.
[0668] "Data acquisition means" refers to the function by which the system collects communication data and voice data from users.
[0669] "Voice conversion means" refers to technology that converts acquired voice data into text format.
[0670] "Text data" refers to information in text format that has been converted by a speech-to-text conversion method.
[0671] "Information analysis means" refers to a function that analyzes text data and extracts important information and keywords from it.
[0672] "Priority determination method" refers to a technique that sets priorities based on analyzed information, taking into account the importance and urgency of tasks and appointments.
[0673] "Information provision means" refers to a function that notifies users of necessary information at the appropriate time.
[0674] "Translation means" refers to a function that performs automatic text or audio translation between multiple languages.
[0675] An "information processing system" refers to the overall mechanism that combines these means to support users' information management and communication.
[0676] This invention uses an information processing system to streamline user schedule management, task management, and multilingual communication. At the core of the system is a server, which collects communication and voice data with the user's permission. The server has the capability to retrieve data from the user's email service and voice assistant service via an internet connection.
[0677] The collected audio data is converted into text format by the server using "speech recognition technology." Specifically, a "speech recognition API" is used in this process, and the information in the audio file is stored on the server as text data.
[0678] The text data is then analyzed using "natural language processing technology." The server uses a "text analysis library" to automatically extract important information, keywords, due dates, and task details. Based on this information, the server determines the priority of tasks and appointments using a "priority determination mechanism." This uses an algorithm that takes into account past data and current usage.
[0679] The server then provides information to the user's device through a "notification system," ensuring that important schedules and tasks are immediately recognized by the user. The device displays the received notifications on its screen, helping the user take relevant actions quickly.
[0680] Furthermore, the system utilizes "automatic translation technology" to support multilingual communication. The server uses a "translation API" to convert text and voice messages into the required language, and the terminal presents it to the user as audio or text display.
[0681] For example, if a user enters a prompt such as, "Add a lunch date with a friend this Sunday," the system will register the date in the schedule and notify them. It also has a function to translate and present meeting agendas in response to prompts such as, "Summarize the contents of next Wednesday's meeting in English."
[0682] Thus, this system enables users to effectively manage their schedules and tasks, and to smoothly obtain information and carry out activities even in different language environments.
[0683] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0684] Step 1:
[0685] The server acquires user communication data and voice data. Inputs include emails and voice logs collected with the user's permission. The server uses the "data acquisition means" to collect this data. The output is a database of the collected communication data and voice data. Specific operations include the server accessing the email service via an "API" to retrieve new emails and attachments.
[0686] Step 2:
[0687] The server converts acquired audio data into text data. Audio data is fed to the server's speech recognition system as input. The server uses its "speech recognition engine" to perform data processing that converts the audio into text information. The output is the audio data converted into text format. Specific operations include analyzing the audio sample and identifying phonemes, resulting in the generation of text data.
[0688] Step 3:
[0689] The server analyzes text data and extracts important information. As input, data converted to text format is fed into the information analysis system. The server uses a "natural language processing library" to analyze the text elements and extract important keywords, dates, and task names. The output is provided as a list of these extracted elements. Specific operations include noun phrase analysis and relationship identification processes.
[0690] Step 4:
[0691] The server determines the priority of tasks and appointments based on the extracted information. Key keywords and schedule information are sent to the prioritization algorithm as input. The server uses a "decision engine" to analyze the data and automatically set priorities. The output is a list of tasks and appointments organized according to their priority. Specific actions include assigning the most appropriate priority by considering historical data and current importance.
[0692] Step 5:
[0693] The server generates notification information to provide users with the most relevant information and sends it to their devices. Inputs include a prioritized task list and schedule information. The server uses a "notification generation system" to send important tasks and appointments to the user's device. Output is a schedule notification displayed on the user's screen. Specifically, this is displayed as a pop-up notification on the device, making it easy for the user to check.
[0694] Step 6:
[0695] The server performs automatic translation between multiple languages as needed. Text data requiring translation is sent to the server as input. The server uses a "translation engine" to process the text and convert it into another language. The output is the translated text data. In practice, meeting notes are translated in real time and displayed on the terminal.
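The extraction in Step 3 (keywords, dates, and task names) can be illustrated with a small rule-based sketch. The description refers only to a "natural language processing library" without naming one, so regular expressions are substituted here as a stand-in technique; the patterns, the stopword list, the `extract_info` function, and the sample text are all assumptions.

```python
import re
from collections import Counter

# Assumed formats: ISO dates, and tasks introduced by a few trigger phrases.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
TASK_PATTERN = re.compile(r"\b(?:please|need to|must)\s+([^.;!?]+)", re.IGNORECASE)
STOPWORDS = {"please", "need", "must", "with", "that", "this", "from"}


def extract_info(text: str, top_k: int = 3) -> dict:
    """Return dates, task phrases, and the most frequent keywords in the text."""
    dates = DATE_PATTERN.findall(text)
    tasks = [phrase.strip() for phrase in TASK_PATTERN.findall(text)]
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) > 3 and w not in STOPWORDS)
    keywords = [word for word, _ in counts.most_common(top_k)]
    return {"dates": dates, "tasks": tasks, "keywords": keywords}


# Illustrative input, e.g. text produced by the speech recognition step.
sample = (
    "Please send the budget report by 2025-03-14. "
    "We need to book a room for the budget review on 2025-03-20."
)
info = extract_info(sample)
print(info["dates"])
print(info["tasks"])
print(info["keywords"])
```

A production system would more likely rely on a trained language model for noun phrase analysis and relationship identification, as the step describes; the sketch only shows the shape of the input and output.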
[0696] (Application Example 1)
[0697] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0698] In modern society, many people face a massive amount of information and multinational communication. This makes effective time management and smooth communication across multiple languages difficult. Furthermore, there is a need for efficient schedule management using voice instructions and information-based two-way translation. Conventional systems have been unable to adequately support these needs, leaving challenges in improving individual time efficiency and promoting international understanding.
[0699] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0700] In this invention, the server includes a processing device for acquiring user data, a speech conversion device for converting acquired speech data into text information, an information analysis device for analyzing text information and extracting important information, a priority determination device for determining the priority of schedules and work instructions, an information presentation device for providing optimal information to the user, a translation device for performing automatic translation between multiple languages, and a two-way communication device for managing schedules based on speech instructions and translating speech data into other languages. This enables users to efficiently and effectively manage their schedules and communicate across multiple languages.
[0701] "Processing device for acquiring user data" refers to hardware or software for collecting and analyzing voice and communication data with the user's permission.
[0702] A "speech conversion device means" is a technical device that has the function of accurately converting acquired speech data into text information.
[0703] "Information analysis device means" refers to a system or program for analyzing converted character information and extracting important information.
[0704] A "priority determination device" is a device or mechanism that determines the order in which schedules and tasks are to be performed based on analyzed information.
[0705] "Information presentation device means" refers to equipment or modules for displaying or notifying users of optimized information.
[0706] A "translation device means" is a system that provides the function of performing automatic translation between different languages.
[0707] A "two-way communication device means" is a device that manages schedules based on voice instructions and transmits and receives information translated into other languages in real time.
[0708] The system implementing this invention operates efficiently primarily through cooperation between a server, a terminal, and a user. The server is responsible for collecting communication data and voice data based on the user's permission. Portable devices such as smart glasses and smartphones are used for data collection, and applications running on these devices cooperate with the server to acquire the necessary data.
[0709] The server first uses a software library called speech_recognition to convert speech data into text. This converted text is then analyzed using libraries such as googletrans to extract important information. This analysis clarifies the necessary information and prepares the basic data used for managing schedules and tasks.
[0710] The server then automatically determines the priority of appointments and tasks based on the analyzed information. This prioritization process takes into account the user's past behavior history and current situation, resulting in a schedule optimized for each individual user.
[0711] The terminal displays information provided by the server to the user and notifies them of important appointments and tasks. This allows the user to take immediate action. In addition, a translation function provides real-time translation between different languages, helping users understand information in their native language, even in meetings conducted in a foreign language.
[0712] For example, if a user says, "There is a meeting tomorrow at 10 AM," the system recognizes this as voice input and immediately updates the schedule. Furthermore, this information can be translated into a foreign language and communicated to foreign colleagues as needed. This system accurately processes user voice instructions, manages schedules via voice, and translates the content into other languages such as English.
[0713] An example of a prompt is: "Manage my schedule by voice and translate the content into English."
[0714] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0715] Step 1:
[0716] The device receives voice input from the user. When the user speaks into their smartphone or smart glasses, saying, "I have a meeting tomorrow at 2pm," the voice is recorded on the device. The input here is voice data, which is then prepared for subsequent processing.
[0717] Step 2:
[0718] The server receives audio data from the terminal and converts the audio into text using the speech_recognition library. In this step, the input audio data is analyzed and the resulting text information is output. Audio analysis techniques are used to ensure accurate language recognition.
[0719] Step 3:
[0720] The server analyzes the converted text information using a natural language processing algorithm and extracts important information related to the schedule. The input is text information, from which keywords such as date, time, and event name are identified, and an organized information structure is generated as output.
[0721] Step 4:
[0722] The server prioritizes based on the analyzed information and performs translations as needed using the Google Translate library. Here, it handles cases where the user requires foreign language interpretation, outputting the results of converting the input analyzed information into another language.
[0723] Step 5:
[0724] The server sends prioritized schedule information and translated information to the terminal. The terminal receives this information and displays a notification on the user's screen. The input here is the information sent from the server, and the output is a visual or audible notification to the user.
[0725] Step 6:
[0726] The user acts based on the information provided by the server. They review the presented schedule and translation content, and make necessary changes to their plans or communicate accordingly. In this step, the user's input is the provided information, and they take specific actions based on this information.
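Step 3 of this flow, turning recognized text such as "I have a meeting tomorrow at 2pm" into an organized schedule entry, can be sketched as follows. This is a simplified illustration under stated assumptions: the recognized text is taken as already available, only a handful of event words and the expressions "today" and "tomorrow" are handled, and the `UTTERANCE` pattern and `parse_schedule` function are hypothetical names.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Assumed grammar: "<event> today|tomorrow at <hour>[:<minute>] am|pm".
UTTERANCE = re.compile(
    r"(?P<event>meeting|lunch|call)\s+(?P<day>today|tomorrow)\s+at\s+"
    r"(?P<hour>\d{1,2})(?::(?P<minute>\d{2}))?\s*(?P<ampm>am|pm)",
    re.IGNORECASE,
)


def parse_schedule(text: str, now: datetime) -> Optional[dict]:
    """Extract the event name and start time from recognized speech text."""
    match = UTTERANCE.search(text)
    if match is None:
        return None
    hour = int(match["hour"]) % 12          # 12 AM -> 0, 12 PM -> 0 (+12 below)
    if match["ampm"].lower() == "pm":
        hour += 12
    minute = int(match["minute"] or 0)
    offset = 1 if match["day"].lower() == "tomorrow" else 0
    day = now.date() + timedelta(days=offset)
    start = datetime(day.year, day.month, day.day, hour, minute)
    return {"event": match["event"].lower(), "start": start}


# The reference time is fixed so that the example is reproducible.
now = datetime(2025, 1, 6, 9, 0)
entry = parse_schedule("I have a meeting tomorrow at 2pm", now)
print(entry)
```

The same function also handles the earlier example utterance "There is a meeting tomorrow at 10 AM"; utterances outside the assumed grammar return `None` and would need a more general language analysis step.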
[0727] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform specific processing using the estimated emotions.
[0728] This invention is an advanced AI system that recognizes the emotional state of a user and provides information accordingly. The system is configured as follows:
[0729] First, the server obtains the user's permission to collect emails and voice data. This data is then converted into text data using speech recognition technology. The server then applies natural language processing to the converted text data to extract important information. This allows the server to obtain information about tasks and schedules from the user's conversations and email content.
[0730] Next, the server uses an emotion recognition engine to analyze the user's voice data. This analysis identifies emotions from the tone of voice and speaking tempo. For example, if the user is feeling stressed, that emotional state will be identified.
[0731] Subsequently, the server uses a prioritization mechanism to determine the priority of tasks and schedules based on the extracted information and emotional state, and makes adjustments to ensure appropriate action is taken. This adjustment takes the user's emotional state into consideration and incorporates measures to avoid further stress.
[0732] The terminal receives the results sent from the server and provides information optimized for the user. For example, if the user is feeling stressed, it encourages them to start with simpler tasks to reduce the burden. Furthermore, a multilingual translation function is activated as needed to support smoother conversations in foreign languages.
[0733] In this way, the system according to this invention detects the user's emotions and, by providing information and communication support tailored to those emotions, creates an environment in which users can go about their daily lives efficiently and comfortably. Specific examples include providing relaxing information before a tense meeting and offering translations with softened wording during conversations.
[0734] The following describes the processing flow.
[0735] Step 1:
[0736] The server, with the user's permission, securely retrieves the user's email and voice data. This allows information related to the user's daily activities to be incorporated into the system.
[0737] Step 2:
[0738] The server converts the acquired audio data into text data using speech recognition technology. This is necessary to make everyday conversations and meeting content into an analyzable format.
[0739] Step 3:
[0740] The server applies natural language processing techniques to the converted text data to extract important information, tasks, and schedules. This process involves understanding the context and determining what the user is looking for.
[0741] Step 4:
[0742] The server uses an emotion recognition engine to analyze the user's emotional state from the voice data. This analysis takes into account the tone, pitch, and speed of the voice to identify, for example, whether the user is feeling stressed.
[0743] Step 5:
[0744] The server integrates task and schedule information obtained from information analysis with emotional states and determines their priorities using a prioritization mechanism. It takes emotional states into consideration, for example, setting an order that reduces the burden if stress levels are high.
[0745] Step 6:
[0746] The server performs necessary translations using translation tools to support smooth communication between multiple languages. In this process, consideration is given to selecting emotionally positive expressions for translation.
[0747] Step 7:
[0748] The terminal notifies the user of the most relevant information and prioritized tasks sent from the server. This allows the user to understand the situation in real time and take appropriate action.
[0749] Step 8:
[0750] The user accesses information provided through the terminal and performs tasks in the order suggested by the system. They also utilize translated information as needed to facilitate communication in foreign languages.
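The emotion-aware ordering in Step 5 can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the `Task` fields (`urgency`, `effort`), the `stress` score in the range 0 to 1, and the 0.7 threshold are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    title: str
    urgency: int  # hypothetical scale: higher means more urgent
    effort: int   # hypothetical scale: higher means more demanding


def prioritize(tasks: list[Task], stress: float, threshold: float = 0.7) -> list[Task]:
    """Order tasks; when stress is high, put low-effort tasks first."""
    if stress >= threshold:
        # High stress: start with simpler tasks, break ties by urgency.
        return sorted(tasks, key=lambda t: (t.effort, -t.urgency))
    # Otherwise: most urgent first, break ties by lower effort.
    return sorted(tasks, key=lambda t: (-t.urgency, t.effort))


tasks = [
    Task("Prepare quarterly report", urgency=3, effort=5),
    Task("Reply to client email", urgency=2, effort=1),
    Task("Book meeting room", urgency=1, effort=1),
]
calm_order = [t.title for t in prioritize(tasks, stress=0.2)]
stressed_order = [t.title for t in prioritize(tasks, stress=0.9)]
```

With a low stress score the demanding but urgent report comes first; with a high stress score the two low-effort tasks are moved ahead of it, which corresponds to "setting an order that reduces the burden".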
[0751] (Example 2)
[0752] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0753] Conventional information delivery systems have not adequately considered the emotional state of users, resulting in an excessive burden on users. Furthermore, smooth communication across language barriers was sometimes difficult. Therefore, there is a need for a system that accurately recognizes users' emotions, appropriately prioritizes tasks and schedules based on those emotions, and enables smooth information delivery across multiple languages.
[0754] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0755] In this invention, the server includes data collection means for acquiring user communication data and voice data, speech recognition means for converting the acquired voice data into text data, and natural language processing means for analyzing the text data and extracting important information. This enables the provision of information that takes into account the user's emotional state and smooth communication between multiple languages.
[0756] "Data collection means" refers to methods for acquiring communication data and voice data with the user's permission.
[0757] "Speech recognition means" refers to a technology or device for converting acquired speech data into text data.
[0758] "Natural language processing means" refers to technologies or devices for analyzing text data and extracting important information from it.
[0759] "Emotion recognition means" refers to a technology or device that analyzes a user's voice data to identify their emotional state.
[0760] A "decision-making tool" is a technology or device used to determine the priority of tasks or appointments based on extracted information and emotional states.
[0761] "Information provision means" refers to the means of providing information optimized for the user.
[0762] "Translation support means" refers to technologies or devices for performing automatic translation between multiple languages.
[0763] This invention is an advanced system that provides information according to the user's emotional state. The system is configured as follows:
[0764] Data collection
[0765] The server collects communication data and voice data with the user's permission. Users grant access to the server by providing data from their devices. Voice data is converted into text data using speech recognition technology. This technology utilizes commonly used cloud-based speech recognition APIs.
[0766] Data Analysis
[0767] The server applies natural language processing to the converted text data. The software libraries used include widely used natural language processing tools. This extracts important information from conversations and communications.
[0768] Emotion recognition and decision-making
[0769] The server analyzes the voice data to determine the user's emotional state. At this stage, emotion recognition technology is used to evaluate the tone, speed, and intonation of the voice. Furthermore, based on the extracted information and emotional data, a generative AI model is used to determine the priority of tasks and schedules.
[0770] Information provision and multilingual support
[0771] The terminal receives optimization information sent from the server. The user receives this information directly through the terminal. The information provision includes a multilingual translation function to support smooth communication between foreign languages.
[0772] For example, if a user is in a highly stressful situation, the system can adjust priority tasks to reduce the burden and suggest relaxing media.
[0773] An example of a prompt message might be: "Recognize the user's emotions and suggest relaxing content to reduce stress. Example: Present relaxing videos or music before a high-stress conversation."
[0774] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0775] Step 1:
[0776] The server collects voice and communication data from the terminal with the user's permission. During this process, the server performs explicit permission checks to securely collect the user's voice and text information. It takes voice and text data as input and stores this data as output in a format usable for subsequent processing. Specifically, the server receives the voice stream via the specified API and prepares it for real-time processing.
[0777] Step 2:
[0778] The server uses speech recognition to convert the acquired audio data into text data. At this stage, the audio data is used as input and converted into text format by cloud-based speech recognition technology. The output is the converted text data. Specifically, the server converts the information extracted from the audio into a text string and stores it in a database.
[0779] Step 3:
[0780] The server applies natural language processing to the converted text data to extract important information. It uses text data as input and analyzes the information using natural language processing libraries. The output is extracted task and schedule information. Specifically, the server performs keyword extraction and semantic analysis of sentences.
[0781] Step 4:
[0782] The server uses emotion recognition tools to analyze voice data and identify the user's emotional state. It uses the original voice data as input and estimates emotions by analyzing voice tone and speed. The output is the recognized emotional status. Specifically, the server analyzes the voice profile to identify emotions such as stress and relaxation.
[0783] Step 5:
[0784] The server uses a generative AI model to prioritize tasks and schedules based on the extracted information and emotional state. The input is the extracted task information and emotional status, and the output is an optimal priority list produced by the generative model. Specifically, the server sends prompts to the generative AI and receives the optimized schedule.
[0785] Step 6:
[0786] The terminal receives optimized information sent from the server and provides it to the user. It receives data from the server as input and formats it for user notification as output. Specifically, the terminal displays a priority task list in the application UI and performs multilingual translation as needed.
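The six steps above form a linear pipeline in which each stage's output feeds the next. The sketch below shows only that data flow. Every stage function is a hypothetical placeholder supplied by the caller (no particular speech recognition, natural language processing, emotion recognition, or generative AI service is assumed), and the `PipelineResult` field names are illustrative.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineResult:
    text: str
    tasks: list[str]
    emotion: str
    prioritized: list[str]


def run_pipeline(
    audio: bytes,
    consent_given: bool,
    transcribe: Callable[[bytes], str],
    extract_tasks: Callable[[str], list[str]],
    recognize_emotion: Callable[[bytes], str],
    prioritize: Callable[[list[str], str], list[str]],
) -> PipelineResult:
    # Step 1: collect data only with explicit user permission.
    if not consent_given:
        raise PermissionError("user has not granted permission")
    text = transcribe(audio)                  # Step 2: speech -> text
    tasks = extract_tasks(text)               # Step 3: text -> tasks/schedule
    emotion = recognize_emotion(audio)        # Step 4: original audio -> emotion
    prioritized = prioritize(tasks, emotion)  # Step 5: tasks + emotion -> order
    # Step 6 (terminal side) would display `prioritized` to the user.
    return PipelineResult(text, tasks, emotion, prioritized)


# Usage with trivial stand-in stages (all hypothetical).
result = run_pipeline(
    audio=b"\x00\x01",
    consent_given=True,
    transcribe=lambda a: "write quarterly report; call Tanaka",
    extract_tasks=lambda t: [s.strip() for s in t.split(";")],
    recognize_emotion=lambda a: "stressed",
    prioritize=lambda tasks, e: sorted(tasks, key=len) if e == "stressed" else tasks,
)
```

Passing the stages in as callables keeps the sketch independent of any vendor API; note that Step 4 receives the original audio rather than the transcript, as the step description states.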
[0787] (Application Example 2)
[0788] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0789] There is a need to accurately recognize the emotional state of users, provide information accordingly, and achieve optimal management of tasks and schedules. However, conventional systems have insufficient coordination between emotional analysis and information provision, making it difficult to respond flexibly to the emotional state of users. Furthermore, smooth communication support in multiple languages is not adequately provided.
[0790] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0791] In this invention, the server includes data acquisition means for acquiring user communication data and voice data, information analysis means for analyzing text data and extracting important information, and priority determination means for determining the priority of tasks and schedules based on emotional state. This enables optimal information provision and task management in accordance with the user's emotions, and also realizes smooth communication between multiple languages.
[0792] "Data acquisition means" refers to a function that collects the user's voice data and communication data and makes that information available for subsequent processing.
[0793] A "speech conversion means" is a function that converts acquired speech data into text data, and is a method that makes speech information analyzable as text information.
[0794] "Information analysis means" refers to a function that performs processing to extract important information from text data, and can utilize natural language processing technology.
[0795] "Emotional analysis means" refers to a function that identifies the user's emotional state from voice data, and is a technology that makes judgments based on the tone and tempo of the voice.
[0796] A "prioritization mechanism" is a function that determines the priority of tasks and schedules based on extracted information and emotional states, and provides information efficiently.
[0797] "Information provision means" refers to a function that provides users with optimized information and suggestions, and supports communication while taking into account the user's emotional state.
[0798] "Translation tools" refer to functions that perform automatic translation between multiple languages, facilitating communication between users who speak different languages.
[0799] "Audio medium selection means" refers to a function that selects an appropriate audio medium based on changes in the user's emotions and provides that audio medium to the user.
[0800] To implement this invention, the system mainly consists of a server and a user terminal. The server uses data acquisition means to collect communication and voice data with the user's permission. The voice data is converted into text data by speech conversion means, which is then analyzed by information analysis means to extract important information. The implementation is Python-based, using the Google Cloud Speech-to-Text API for speech conversion and the Hugging Face Transformers library for natural language processing.
[0801] The server uses emotion analysis tools, along with Azure Cognitive Services, to identify the user's emotional state from the tone and tempo of their voice. This provides specific emotional information, such as whether the user is stressed. This information is then processed by a prioritization tool, and task and schedule priorities are determined based on the emotional state.
[0802] The information provision system is connected to the terminal and provides optimal information and suggestions tailored to the user's emotions. Specifically, when a user is feeling stressed, it prioritizes tasks that have a stress-relieving effect and encourages the playback of relaxing music. The translation system supports multiple languages to facilitate communication between users who speak different languages, and the audio media selection system selects and plays audio media appropriate to the user's emotions.
[0803] For example, if a user is tired after work, the system will play relaxing music along with a message saying, "You must be tired. Please take it easy today." Furthermore, when foreign visitors arrive, the system will provide real-time translation to facilitate conversation and support communication with visitors.
[0804] An example of a prompt message would be: "If the user indicates stress in the sentiment analysis, please come up with suggestions or options to help them relax."
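One way such a prompt could be assembled before it is sent to the generative AI is sketched below. The function name `build_prompt`, the emotion label `"stressed"`, and the layout of the prompt are assumptions for illustration; only the instruction sentence is taken from the example above.

```python
INSTRUCTION = (
    "If the user indicates stress in the sentiment analysis, "
    "please come up with suggestions or options to help them relax."
)


def build_prompt(emotion: str, tasks: list[str]) -> str:
    """Combine the instruction, detected emotion, and task list into one prompt."""
    task_lines = "\n".join(f"- {t}" for t in tasks) or "- (none)"
    return (
        f"{INSTRUCTION}\n\n"
        f"Detected emotional state: {emotion}\n"
        f"Tasks and schedule items:\n{task_lines}"
    )


prompt = build_prompt("stressed", ["Finish slides", "Reply to client"])
```

Keeping the instruction text separate from the per-user data makes it easy to change the wording of the instruction without touching the code that formats the emotional state and tasks.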
[0805] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0806] Step 1:
[0807] The server uses data acquisition methods to collect communication data and voice data with the user's permission. The input for this step is the user's real-time voice data, which the server captures as an audio file. The output is the captured audio data file.
[0808] Step 2:
[0809] The server uses a speech-to-text conversion method to convert the acquired audio data into text data. The input for this step is an audio data file, and the server performs the conversion using the Google Cloud Speech-to-Text API. The output is the converted text data, which can be processed as natural language.
[0810] Step 3:
[0811] The server extracts important information from text data using information analysis tools. The input is text data, and the server analyzes the information using natural language processing techniques to identify details of tasks and schedules. The output is the extracted information.
[0812] Step 4:
[0813] The server uses sentiment analysis tools to identify the user's emotional state from the tone and tempo of the voice data. The input for this step is voice data, and the server performs the analysis using Azure Cognitive Services. The output is the identified emotional state.
[0814] Step 5:
[0815] The server uses a prioritization mechanism to determine the priority of tasks and schedules based on the extracted information and emotional state. The input consists of the extracted information and emotional state, which the server analyzes to calculate priorities. The output is a list of prioritized tasks.
[0816] Step 6:
[0817] The terminal provides optimized information to the user through the information provision means and makes suggestions based on the user's emotional state. The input consists of the prioritized tasks and the emotional state; the terminal generates suggestions based on this information and notifies the user. The output is a set of specific suggestions for the user.
[0818] Step 7:
[0819] The terminal uses translation tools to automatically translate between multiple languages, facilitating smooth communication. The input is the text of the user's speech, which the terminal translates into another language. The output is the translated text.
[0820] Step 8:
[0821] The terminal uses an audio medium selection mechanism to choose and play audio media that matches the user's emotions. The input is the user's emotional state, and the terminal performs the operation to select the optimal audio media. The output is the audio media being played.
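The audio medium selection in Step 8 amounts to a lookup from emotional state to a media category. A minimal sketch follows; the emotion labels, the category names, and the fallback value are hypothetical and are not specified in this disclosure.

```python
# Hypothetical mapping from emotional state to an audio category.
MEDIA_BY_EMOTION = {
    "stressed": "relaxing_music",
    "tired": "relaxing_music",
    "relaxed": "ambient_background",
}
DEFAULT_MEDIA = "no_playback"


def select_audio_medium(emotion: str) -> str:
    """Return the audio category for an emotional state, or a safe default."""
    return MEDIA_BY_EMOTION.get(emotion.lower(), DEFAULT_MEDIA)
```

Falling back to no playback for unrecognized states is a design choice made for this sketch so that an unexpected label never triggers audio output.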
[0822] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0823] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0824] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.
[0825] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.
[0826] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
[0827] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.
[0828] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the Emotion Map 400 an emotion lies, the more visible it becomes (that is, the more it is expressed in actions).
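The layout of the emotion map described above (left and right halves, upper and lower sides, inner and outer areas) can be read as a simple classification of a point relative to the map center. The following sketch is an illustrative reading only: the coordinate convention, the function name `describe_position`, the half-radius boundary between "inner" and "action", and the labels used for points lying exactly on an axis are all assumptions.

```python
import math


def describe_position(x: float, y: float, max_radius: float = 1.0) -> dict[str, str]:
    """Classify a point on the emotion map; the origin is the map center."""
    radius = math.hypot(x, y)
    return {
        # Left half: reaction-driven; right half: situation-driven.
        "origin": "reaction" if x < 0 else "situation" if x > 0 else "mixed",
        # Upper side: pleasure; lower side: displeasure.
        "valence": "pleasure" if y > 0 else "displeasure" if y < 0 else "neutral",
        # Inner area: inner thoughts; outer area: expressed in actions.
        "expression": "inner" if radius < max_radius / 2 else "action",
    }
```

For instance, a point slightly left of and above the center is classified as a reaction-driven, pleasant, inner emotion, while a point near the rim on the lower right is a situation-driven, unpleasant emotion expressed in actions.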
[0829] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
[0830] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."
[0831] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
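The final step, turning the per-emotion values returned by the neural network into a single determined emotion, can be sketched as follows. Choosing the emotion with the highest value is an assumption made for this example (the disclosure does not state the selection rule), and the numeric values are invented purely for illustration.

```python
def determine_emotion(emotion_values: dict[str, float]) -> str:
    """Return the emotion label with the highest emotion value."""
    if not emotion_values:
        raise ValueError("emotion_values must not be empty")
    return max(emotion_values, key=emotion_values.get)


# Neighboring emotions on the map carry similar values; a distant one does not.
values = {"reassured": 0.82, "calm": 0.79, "confident": 0.77, "anxious": 0.10}
dominant = determine_emotion(values)
```

Because emotions located close together are trained to have similar values, the runner-up labels in such an output tend to be neighbors of the determined emotion rather than unrelated ones.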
[0832] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.
[0833] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.
[0834] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.
[0835] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.
[0836] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.
[0837] The following types of processors can be used as hardware resources to perform specific processing. One example is a CPU, which is a general-purpose processor that functions as a hardware resource performing specific processing by executing software, i.e., a program. Other examples include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.
[0838] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.
[0839] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.
[0840] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.
[0841] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of those aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements added, or elements replaced in the descriptions and illustrations presented above, as long as this does not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.
[0842] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.
[0843] The following is further disclosed regarding the embodiments described above.
[0844] (Claim 1)
[0845] A data acquisition means for acquiring user communication data and voice data,
[0846] A speech conversion means for converting acquired speech data into text data,
[0846] An information analysis means that analyzes text data and extracts important information,
[0847] A priority determination means that determines the priority of tasks and schedules based on the analyzed information,
[0848] An information provision means that provides optimal information to users,
[0849] A translation means that performs automatic translation between multiple languages,
[0850] A system comprising the above.
[0852] (Claim 2)
[0853] The system according to claim 1, characterized in that the data acquisition means accesses communication data with the user's permission.
[0854] (Claim 3)
[0855] The system according to claim 1, characterized in that the information analysis means uses natural language processing technology.
[0856] "Example 1"
[0857] (Claim 1)
[0857] A data acquisition means for acquiring user communication data and voice data,
[0858] A speech conversion means for converting acquired speech data into text data,
[0859] An information analysis means that analyzes text data and extracts important information,
[0860] A priority determination means that determines the priority of tasks and appointments based on the analyzed information, taking into account the user's past and present behavioral history,
[0861] An information provision means that provides users with the most relevant information through notifications,
[0862] A translation means that performs automatic text conversion between multiple languages,
[0863] An information processing system comprising the above.
[0865] (Claim 2)
[0866] The information processing system according to claim 1, characterized in that the data acquisition means accesses communication data with the user's permission.
[0867] (Claim 3)
[0868] The information processing system according to claim 1, characterized in that the information analysis means uses natural language processing technology.
[0869] "Application Example 1"
[0870] (Claim 1)
[0871] A processing device for acquiring user data,
[0872] A speech conversion means that converts acquired speech data into text information,
[0873] An information analysis device means for analyzing textual information and extracting important information,
[0874] A priority determination device means that determines the priority of schedules and work instructions based on the analyzed information,
[0874] An information presentation device means that provides optimal information to the user,
[0875] A translation device means for performing automatic translation between multiple languages,
[0876] A two-way communication device means that manages schedules based on voice instructions and translates voice data into other languages,
[0877] A system comprising the above.
[0879] (Claim 2)
[0880] The system according to claim 1, characterized in that the device accesses data with the user's permission.
[0881] (Claim 3)
[0882] The system according to claim 1, characterized in that the analysis device means uses natural language processing technology and performs bidirectional communication based on voice instructions.
[0883] "Example 2 of combining an emotion engine"
[0884] (Claim 1)
[0885] A data collection means for acquiring user communication data and voice data,
[0886] A speech recognition means that converts the acquired voice data into text data,
[0887] A natural language processing means that analyzes the text data and extracts important information,
[0888] An emotion recognition means that analyzes the user's voice data and identifies their emotional state,
[0889] A priority determination means that determines the priority of tasks and appointments based on the extracted information and the emotional state,
[0890] An information provision means that provides the user with the most suitable information,
[0891] A translation support means that performs automatic translation between multiple languages,
[0892] A system comprising the above.
[0893] (Claim 2)
[0894] The system according to claim 1, characterized in that the data collection means accesses communication data with the user's permission.
[0895] (Claim 3)
[0896] The system according to claim 1, characterized in that the natural language processing means uses natural language processing technology.
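A minimal sketch of how an identified emotional state might feed into priority determination is shown here. Everything specific in it is an assumption for illustration: the two vocal features (mean pitch and speaking rate), the threshold values, the three emotion labels, and the rule of deferring low-importance tasks are not taken from the disclosure.

```python
from typing import List, Tuple


def classify_emotion(mean_pitch_hz: float, words_per_second: float) -> str:
    """Map two hypothetical vocal features to a coarse emotional state.

    The thresholds are illustrative placeholders, not calibrated values.
    """
    if mean_pitch_hz > 220.0 and words_per_second > 3.0:
        return "stressed"
    if mean_pitch_hz < 140.0 and words_per_second < 1.5:
        return "tired"
    return "neutral"


def prioritize(tasks: List[Tuple[str, int]], emotion: str) -> List[str]:
    """Order task titles by importance (1 to 5), most important first.

    Assumed rule: when the user seems stressed or tired, tasks with
    importance below 4 are deferred and left out of the result.
    """
    if emotion in ("stressed", "tired"):
        tasks = [t for t in tasks if t[1] >= 4]
    ordered = sorted(tasks, key=lambda t: t[1], reverse=True)
    return [title for title, _ in ordered]
```

A rule-based classifier is used here only because it is easy to read. A trained model could take its place behind the same function signature.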
[0897] "Application example 2 when combining with an emotional engine"
[0898] (Claim 1)
[0899] A data acquisition means for acquiring user communication data and voice data,
[0900] A speech conversion means for converting acquired speech data into text data,
[0901] An information analysis means that analyzes the text data and extracts important information,
[0902] An emotion analysis means that identifies the emotional state from the extracted information and vocal characteristics,
[0903] A priority determination means for determining the priority of tasks and schedules based on the emotional state,
[0904] An information provision means that provides information optimized for the user and makes suggestions according to the emotional state,
[0905] A translation means that performs automatic translation between multiple languages to support smooth communication between users,
[0906] An audio medium selection means that selects and plays an audio medium based on emotional changes,
[0907] A system comprising the above.
[0908] (Claim 2)
[0909] The system according to claim 1, characterized in that the data acquisition means accesses communication data with the user's permission.
[0910] (Claim 3)
[0911] The system according to claim 1, characterized in that the information analysis means uses natural language processing technology and the emotion analysis means determines emotion based on the tone and tempo of the voice.
Explanation of Symbols
[0912]
10, 210, 310, 410 Data processing systems
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot
Claims
1. A processing device for acquiring user data,
A speech conversion means that converts acquired speech data into text information,
An information analysis means for analyzing the text information and extracting important information,
A priority determination means that determines the priority of schedules and work instructions based on the analyzed information,
An information presentation means that provides optimal information to the user,
A translation means for performing automatic translation between multiple languages,
A two-way communication means that manages schedules based on voice instructions and translates voice data into other languages,
A system comprising the above.
2. The system according to claim 1, characterized in that the processing device accesses the user data with the user's permission.
3. The system according to claim 1, characterized in that the information analysis means uses natural language processing technology and performs bidirectional communication based on voice instructions.