system

The system addresses inefficiencies in schedule management and multilingual support by converting voice data to text, synchronizing schedules, and dynamically adjusting priorities, enhancing operational efficiency and cultural compatibility.

JP2026101252A · Pending · Publication Date: 2026-06-22 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: SOFTBANK GROUP CORP
Filing Date: 2024-12-10
Publication Date: 2026-06-22

AI Technical Summary

Technical Problem

Existing systems face limitations in efficient schedule management, information analysis, and multilingual support, leading to inefficiencies and potential mistakes, especially in global operations where quick decision-making and optimal priority management are required.

Method used

A system that converts voice data into text information, analyzes it for schedule management, synchronizes across devices, re-evaluates task priorities using data analysis, and provides multilingual support, ensuring consistent and efficient task execution.

Benefits of technology

The system reduces time spent on cumbersome schedule management and task prioritization, enabling efficient task execution and smooth communication across different cultures and language regions.



Abstract

Provides a system. [Solution] A system that includes: means for converting audio data received through an audio input device into text information; means for analyzing the text information, extracting the user's schedule information, and updating it on a recording medium; means for synchronizing the updated schedule information to multiple terminal devices via a communication network; means for re-evaluating the priority of the user's tasks using external information acquired by a data analysis device; and means for analyzing market trends and environmental data based on past data to propose optimal future actions.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] In recent years, businesspeople have been required to manage tasks efficiently and improve productivity across a wide range of operations and information. However, manual schedule management and information analysis have their limits, and mistakes and inefficiencies are likely to occur. In global operations, moreover, the need for multilingual support is increasing, and a system that supports optimal priority management and quick decision-making is required. There is therefore a need for a next-generation secretary system that supports efficient schedule management based on voice recognition and data analysis.

Means for Solving the Problems

[0005] This invention provides means for rapidly converting voice data received via a voice input device into text information. Furthermore, it provides means for analyzing the converted text information to accurately extract the user's schedule information and update it on a recording medium in real time. In addition, by immediately synchronizing the updated schedule information to multiple terminal devices via a communication network, it ensures that consistent information is available on all devices. Moreover, it re-evaluates the priority of the user's tasks using external information acquired by a data analysis device, thereby realizing optimized schedule management. As a result, efficient business support, including multilingual support, becomes possible in a global business environment.

[0006] A "voice input device" is a device that receives a user's voice and converts that voice data into a digital signal for processing.

[0007] "Textual information" refers to information obtained by converting audio data received from a voice input device into text format.

[0008] "Schedule information" refers to detailed information such as date, time, content, and location necessary for managing a user's schedule.

[0009] A "recording medium" is a physical or digital storage device used to store data or information.

[0010] A "communication network" is a general term for a network used to transmit information and data between different devices.

[0011] A "terminal device" refers to a device such as a computer or mobile device that sends and receives information through a communication network.

[0012] A "data analysis device" is a hardware or software system that collects various types of data, analyzes that data, and generates useful information.

[0013] "External information" refers to market trends and other environmental data that influence users' work and decision-making.

[0014] "Task prioritization" refers to assigning a sequence of tasks and activities based on their importance and urgency, according to the objective.

Brief Description of the Drawings

[0015]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map on which multiple emotions are mapped.
[Figure 10] An emotion map on which multiple emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Embodiment 2 when combined with an emotion engine.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2 when combined with an emotion engine.

Mode for Carrying Out the Invention

[0016] Hereinafter, an example of an embodiment of the system according to the technology of the present disclosure will be described according to the attached drawings.

[0017] First, the terms used in the following description will be explained.

[0018] In the following embodiments, a processor with a reference number (hereinafter simply referred to as "processor") may be one arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be one type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.

[0019] In the following embodiments, a RAM (Random Access Memory) with a reference number is a memory in which information is temporarily stored and is used as a work memory by the processor.

[0020] In the following embodiments, a storage with a reference number is one or more non-volatile storage devices that store various programs and various parameters, etc. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), or magnetic tapes, and the like.

[0021] In the following embodiments, a communication interface (I/F) with a reference number is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), or Bluetooth (registered trademark).

[0022] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when expressing three or more things linked by "and/or."

[0023] [First Embodiment]

[0024] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0025] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0026] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0027] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0028] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0029] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0030] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0031] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0032] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0033] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0034] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0035] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0036] The system based on this invention provides users with efficient and personalized work support through the coordinated operation of various functions, primarily speech recognition, schedule management, data analysis, priority optimization, and multilingual support. Details of each function and their implementation examples are described below.

[0037] A terminal equipped with a voice input device captures voice instructions from the user and processes them as digital voice data. The terminal sends this voice data to a server. The server uses voice recognition software to convert the voice data into text information. Based on this text information, the server analyzes the user's instructions and recognizes them as schedule management tasks.

[0038] The server accesses the user's schedule database, checks the specified time slot, and registers the new appointment. The updated schedule information is instantly synchronized across multiple devices via the cloud. Because users can access the same schedule information on every device, work processes stay consistent.
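One way to picture this synchronization is a server that bumps a version number on every update and pushes the full schedule to each registered terminal. The sketch below is only an in-memory illustration under that assumption; `Terminal`, `ScheduleServer`, and the version counter are hypothetical names that do not appear in the specification, and a real deployment would push over the communication network rather than call a method directly.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Terminal:
    """Hypothetical stand-in for a terminal device holding a local copy."""
    name: str
    version: int = 0
    entries: List[str] = field(default_factory=list)

    def apply(self, version: int, entries: List[str]) -> None:
        # Ignore stale or duplicate pushes so every terminal converges
        # on the newest schedule.
        if version > self.version:
            self.version = version
            self.entries = list(entries)


@dataclass
class ScheduleServer:
    """Hypothetical server-side schedule with push-based synchronization."""
    version: int = 0
    entries: List[str] = field(default_factory=list)
    terminals: List[Terminal] = field(default_factory=list)

    def add_entry(self, entry: str) -> None:
        # Update the stored schedule, then push the new state to every terminal.
        self.entries.append(entry)
        self.version += 1
        for terminal in self.terminals:
            terminal.apply(self.version, self.entries)
```

Pushing the whole schedule together with a version keeps the terminals identical even if a push arrives twice or out of order, at the cost of sending more data than a per-entry delta would.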

[0039] Furthermore, the server uses data analysis devices to acquire market trends and historical business data from external sources, and re-evaluates the priority of the user's tasks. Based on these analysis results, the user is presented with the optimal task order.

[0040] To enable multilingual support, the server translates received instructions, even if they are given in different languages, into a language it can internally parse. This allows users to smoothly utilize the system even in an international business environment.

[0041] For example, if a user uses voice input to say, "Add a project meeting tomorrow at 3 PM," the device captures this instruction and sends it to the server. The server analyzes this and updates the schedule, which is then reflected in real time across all of the user's devices. Furthermore, the server reassesses the importance of meetings based on external data and current market trends, enabling users to manage tasks efficiently.
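As a rough illustration of how schedule information might be extracted from the recognized text of this example, the sketch below matches one fixed sentence pattern with a regular expression. The pattern, the `ScheduleInfo` type, and the `extract_schedule` function are assumptions made for illustration; the specification does not state which analysis technique is used, and a practical system would need far broader natural language handling.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ScheduleInfo:
    title: str
    start: datetime


# Hypothetical pattern: "Add <title> today|tomorrow at <hour>[:<minute>] AM|PM"
_PATTERN = re.compile(
    r"add (?:an? )?(?P<title>.+?) (?P<day>today|tomorrow) at "
    r"(?P<hour>\d{1,2})(?::(?P<minute>\d{2}))?\s*(?P<ampm>am|pm)",
    re.IGNORECASE,
)


def extract_schedule(text: str, now: datetime) -> Optional[ScheduleInfo]:
    """Return schedule info parsed from recognized text, or None if no match."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    # Convert the 12-hour clock to 24-hour (12 AM -> 0, 12 PM -> 12).
    hour = int(match["hour"]) % 12
    if match["ampm"].lower() == "pm":
        hour += 12
    minute = int(match["minute"] or 0)
    day_offset = 1 if match["day"].lower() == "tomorrow" else 0
    day = (now + timedelta(days=day_offset)).date()
    return ScheduleInfo(
        title=match["title"].strip(),
        start=datetime(day.year, day.month, day.day, hour, minute),
    )
```

Passing `now` explicitly, instead of reading the clock inside the function, keeps the resolution of "tomorrow" deterministic and easy to test.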

[0042] This system reduces the time users spend on cumbersome schedule management and task prioritization, allowing them to focus on important tasks. Furthermore, its multilingual support facilitates smooth communication across different cultures and language regions.

[0043] The following describes the processing flow.

[0044] Step 1:

[0045] The user gives instructions via a voice input device. The voice input device receives this voice as digital data and transmits it to the terminal.

[0046] Step 2:

[0047] The terminal sends the received audio data to the server. The data is sent using an appropriate communication protocol.

[0048] Step 3:

[0049] The server receives the audio data and uses speech recognition software to convert the audio into text format. In this process, the audio data is converted into analyzable text information.

[0050] Step 4:

[0051] The server analyzes the converted text information. If it recognizes that the instruction relates to schedule management, it accesses the relevant schedule database to check whether the specified appointment already exists.

[0052] Step 5:

[0053] The server checks for available time slots and adds the new appointment to the schedule. This update is instantly synchronized via the cloud to all other devices the user uses.

[0054] Step 6:

[0055] The server uses data analysis equipment to re-evaluate the priority of user tasks based on market trends and business data acquired from external sources.

[0056] Step 7:

[0057] The server sends the re-evaluation results to the device and presents the user with a schedule based on the latest priorities. It also notifies the user of updates as needed by sending notifications to the device.

[0058] Step 8:

[0059] Users can check updated schedule information and task priorities through their devices and adjust their work plans as needed.
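The server-side part of Steps 3 to 7 can be sketched as a small pipeline in which each stage is an injected function. This is a minimal sketch, not the disclosed implementation: the `Server` class and its `recognize`, `extract`, and `prioritize` parameters are hypothetical, and the speech recognizer, text analyzer, and data analysis device are replaced by whatever callables the caller supplies.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Server:
    """Hypothetical server wiring the stages together; each stage is injected."""
    recognize: Callable[[bytes], str]             # Step 3: audio -> text
    extract: Callable[[str], Optional[str]]       # Step 4: text -> appointment or None
    prioritize: Callable[[List[str]], List[str]]  # Step 6: reorder tasks
    schedule: List[str] = field(default_factory=list)

    def handle(self, audio: bytes) -> List[str]:
        """Process one voice command and return the re-prioritized schedule."""
        text = self.recognize(audio)
        appointment = self.extract(text)
        if appointment is not None:
            self.schedule.append(appointment)      # Step 5: update the schedule
        return self.prioritize(self.schedule)      # Steps 6-7: re-evaluate and return
```

For example, a test double can pass `lambda audio: audio.decode("utf-8")` as the recognizer, which makes the flow checkable without any speech recognition service.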

[0060] (Example 1)

[0061] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0062] In today's business environment, users are required to efficiently manage a wide variety of tasks and appropriately adjust their schedules. However, dealing with different language environments and large amounts of information can be burdensome for users. While voice-based scheduling and multilingual automation are advancing, there is a need to pursue even greater accuracy and efficiency.

[0063] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0064] In this invention, the server includes means for converting voice data received through a voice input device into text information, means for analyzing the text information and extracting the user's schedule information to update it on a recording medium, and means for synchronizing the updated schedule information to multiple terminal devices via a communication network. This enables the user to manage their schedule using voice commands.

[0065] A "voice input device" is a device that receives voice commands from a user and has the function of converting them into digital voice data.

[0066] "Means of converting into text information" refers to technology that analyzes audio data and digitizes it as corresponding text information.

[0067] "A means of analyzing, extracting user schedule information, and updating it on a recording medium" refers to the process of identifying the user's instructions based on textual information and recording this information in a database to update the schedule information.

[0068] "Means of synchronizing to multiple terminal devices via a communication network" refers to technology that synchronizes updated information to multiple devices in real time via a network.

[0069] A "data analysis device" is a device that analyzes data acquired from an external source and extracts useful information.

[0070] "Methods for re-evaluating task priorities" refers to a process that re-evaluates the importance of a user's tasks based on acquired external information and determines their order.

[0071] "Means of implementing noise cancellation" refers to technologies used to reduce ambient noise during audio recording and obtain clear audio data.

[0072] "A means of performing conflict checks and updates" refers to a technique that verifies that newly added appointments do not overlap with existing appointments and updates the database as necessary.

[0073] The "function to translate instructions in different languages and convert them into a parseable language" is a technology aimed at accurately analyzing information provided in different languages by converting it into a language that can be processed internally, even in a multilingual environment.

[0074] The system based on this invention utilizes advanced speech recognition technology, schedule management functions, priority optimization, and multilingual support to provide users with personalized work assistance.

[0075] First, the terminal captures voice commands from the user using a voice input device. During this process, noise cancellation technology is implemented to effectively eliminate ambient noise. This voice data is processed digitally by the terminal and transmitted to the server.

[0076] The server converts voice data into text using speech recognition software. Specifically, it uses a general cloud-based speech recognition service, for example "automation software provided by a voice input device manufacturer." This text information is then analyzed to extract the user's instructions, and based on this, schedule management tasks are generated.

[0077] Subsequently, the server adds the updated schedule to the database, which serves as the recording medium, and this information is instantly synchronized to multiple user terminals via the communication network. The schedule information update also includes data conflict checks to prevent inconsistencies with existing schedules.
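The conflict check mentioned here can be illustrated with a standard interval-overlap test. The sketch assumes each appointment has a start and an end time and treats the ranges as half-open; `Appointment`, `find_conflicts`, and `add_if_free` are hypothetical names, and the specification does not say how conflicts are detected or resolved.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Appointment:
    title: str
    start: datetime
    end: datetime


def find_conflicts(new: Appointment, existing: List[Appointment]) -> List[Appointment]:
    """Return existing appointments whose time range overlaps the new one."""
    # Half-open intervals [start, end): back-to-back meetings do not conflict.
    return [a for a in existing if new.start < a.end and a.start < new.end]


def add_if_free(new: Appointment, existing: List[Appointment]) -> bool:
    """Append the appointment only when it has no conflicts; report success."""
    if find_conflicts(new, existing):
        return False
    existing.append(new)
    return True
```

Returning the conflicting appointments from `find_conflicts`, not just a flag, would let the server tell the user which existing entry blocks the new one.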

[0078] Furthermore, data analysis equipment is used to analyze information acquired from external sources and users' work history, dynamically optimizing task priorities. To this end, for example, a "data analysis platform" is utilized to generate insights that maximize operational efficiency.
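A simple way to picture dynamic priority optimization is a score that blends each task's own importance and urgency with a relevance signal derived from external information. The formula, the equal weighting of importance and urgency, the `weight` parameter, and the per-topic `external` dictionary below are all assumptions for illustration; the specification does not define how the data analysis device scores tasks.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    name: str
    importance: float  # 0..1, assumed to be set by the user or inferred from history
    urgency: float     # 0..1, assumed to be derived from, e.g., time to deadline
    topic: str         # key used to look up the external signal


def reprioritize(tasks: List[Task], external: Dict[str, float],
                 weight: float = 0.5) -> List[Task]:
    """Sort tasks by a score that blends base priority with an external signal."""
    def score(task: Task) -> float:
        base = 0.5 * task.importance + 0.5 * task.urgency
        boost = external.get(task.topic, 0.0)  # topics absent from the signal get no boost
        return base + weight * boost
    return sorted(tasks, key=score, reverse=True)
```

With an empty `external` dictionary the order reduces to the base priority, so the effect of the external information is easy to isolate.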

[0079] Furthermore, the system features multilingual support, automatically translating instructions received in different languages into a language that can be analyzed. This feature allows users to seamlessly utilize the system even in international business environments.

[0080] For example, if a user voice-inputs "Add a project meeting tomorrow at 3pm," the device captures this instruction and sends it to the server. The server analyzes this and updates the schedule, which is then reflected in real time across all of the user's devices. This system allows users to reduce the time they spend on complex schedule management and task prioritization, enabling them to focus on important work.

[0081] An example of a prompt in a generative AI model is: "Please describe a scenario for a system where the user gives voice instructions, the system updates the schedule based on those instructions, and performs optimal task management."

[0082] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0083] Step 1:

[0084] The device captures voice commands from the user via a voice input device. The input is the user's voice commands, and the output is digital audio data. After acquiring this data, the device performs noise cancellation processing to improve the quality of the audio data.

[0085] Step 2:

[0086] The terminal transmits processed digital audio data to the server. The input is digital audio data with noise removed, and the output is data sent to the server via synchronized network communication. This transmission is carried out using encryption technology to ensure data security.

[0087] Step 3:

[0088] The server analyzes the received audio data using speech recognition software and converts it into text information. The input is digital audio data, and the output is the corresponding text information. Here, the generated text information is temporarily recorded in a database as intermediate data.

[0089] Step 4:

[0090] The server analyzes textual information and extracts user instructions. The input is textual information, and the output is a dataset for schedule updates. During the analysis process, keyword matching and natural language processing techniques are used to extract the instructions.

[0091] Step 5:

[0092] The server updates the user's schedule database based on the instructions. The input is a dataset for schedule updates, and the output is the updated schedule information. During this process, the system compares the updated data with existing data to check for duplicates or conflicts before recording the information.

[0093] Step 6:

[0094] The server synchronizes updated schedule information to each terminal via the communication network. The input is the updated schedule information, and the output is the sharing of the same schedule information to the group of terminals that receive it. After sharing, users can access that information from any terminal.

[0095] Step 7:

[0096] The server uses a data analysis device to re-evaluate the priority of user tasks based on external information. Inputs are information obtained from external sources and user schedule data, and output is optimized task sequence information. This process compares the data with the latest market data to perform analysis that maximizes operational efficiency.

[0097] Step 8:

[0098] The server uses its multilingual capabilities to translate instructions in different languages into a parseable language as needed. The input is the text information that needs translation, and the output is the translated text information that can be processed internally. This process is designed to facilitate system use in international business environments.
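The translation in Step 8 can be sketched as a normalization step that runs before text analysis and translates only when the input is not already in the internal language. The sketch assumes the language code of the input is already known and that translators are supplied by the caller; `normalize`, `Translator`, and the default internal language `"en"` are hypothetical, and no particular translation service is implied.

```python
from typing import Callable, Dict

# A translator maps text in one source language to the internal language.
Translator = Callable[[str], str]


def normalize(text: str, lang: str, translators: Dict[str, Translator],
              internal_lang: str = "en") -> str:
    """Return text in the internal language, translating only when needed."""
    if lang == internal_lang:
        return text
    try:
        translate = translators[lang]
    except KeyError:
        # Fail loudly instead of passing unparseable text to the analyzer.
        raise ValueError(f"no translator registered for language {lang!r}") from None
    return translate(text)
```

Keeping the translators in a dictionary keyed by language code means a new language can be supported by registering one more callable, without touching the analysis stage.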

[0099] (Application Example 1)

[0100] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0101] In modern society, individuals need to manage numerous schedules and tasks, and there is a demand for efficient organization of these. Furthermore, there is a need for systems that can flexibly adapt to multilingual environments and fluctuating market trends, and present optimal action plans tailored to individual needs. Additionally, there is a need for methods that simplify schedule management within the home, enabling all family members to act efficiently.

[0102] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0103] In this invention, the server includes means for converting voice data received through a voice input device into text information, means for analyzing the text information, extracting the user's schedule information, and updating it on a recording medium, and means for analyzing market trends and environmental data based on past data and proposing the optimal future actions. This makes it possible to effectively manage schedules with simple voice operations and to flexibly propose the optimal actions in response to multilingual environments and market fluctuations.

[0104] A "voice input device" is a device that converts external voice information into electrical signals, making it possible to analyze it as digital data.

[0105] "Means of converting into text information" refers to the process of processing audio data received from an audio input device and converting it into digital data in text format.

[0106] "Means for extracting schedule information and updating it on a recording medium" refers to a function that identifies schedule-related data from analyzed text information and saves it to a recording device as new or modified data.

[0107] "Means for synchronizing with multiple terminal devices" refers to the process of propagating information updated on a recording medium to multiple other electronic devices via a network without delay, so that every device holds identical information.

[0108] A "data analysis device" is a device that possesses the technology to analyze various types of information obtained from external sources and to reveal regularities and trends.

[0109] "A means of re-evaluating task priorities" refers to a function that allows users to review tasks relevant to them based on their importance and urgency, and rearrange them in an appropriate order.

[0110] "A means of analyzing market trends and environmental data to propose optimal actions for the future" refers to a process of analyzing past data and changes in the external environment to derive the optimal actions that users should take.

[0111] This system is activated when the user speaks into a voice input device. The voice input device collects the user's voice instructions and sends them to a server in real time as digital voice data. This voice data is transmitted to the server via the internet. There, the server converts the voice into text information using speech recognition software such as Google® Speech-to-Text API. After obtaining the text information, the server analyzes this data and extracts new tasks based on the user's schedule.

[0112] New schedule information is updated to the user's storage medium via a cloud-based scheduling management system such as the Google Calendar API, instantly synchronizing with all other devices the user owns. This synchronization feature allows users to consistently access the latest schedule information from any device.

[0113] Furthermore, the server utilizes data analysis libraries such as Pandas and NumPy to analyze market trends and historical environmental data obtained from external sources. This analysis allows the server to re-evaluate the optimal priorities for the user's tasks and provide suggestions for future actions. For example, it can make specific suggestions such as recommending a picnic date based on weather trends analyzed from historical data.
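The picnic example can be illustrated by estimating, for each candidate date, how often it rained on that calendar day in past years and choosing the date with the lowest frequency. The specification names Pandas and NumPy; to stay self-contained, this sketch uses only the Python standard library instead. The data layout (a date paired with a rain flag) and the function names are assumptions, and real weather trend analysis would be considerably more involved.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple


def rain_frequency(history: Iterable[Tuple[date, bool]]) -> Dict[Tuple[int, int], float]:
    """Fraction of past observations in which it rained on each (month, day)."""
    counts: Dict[Tuple[int, int], List[int]] = defaultdict(lambda: [0, 0])
    for day, rained in history:
        entry = counts[(day.month, day.day)]
        entry[0] += int(rained)  # rainy observations
        entry[1] += 1            # total observations
    return {key: rainy / total for key, (rainy, total) in counts.items()}


def suggest_picnic_date(candidates: List[date],
                        history: Iterable[Tuple[date, bool]]) -> date:
    """Pick the candidate with the lowest historical rain frequency."""
    freq = rain_frequency(history)
    # Dates with no history are treated as the worst case (frequency 1.0).
    return min(candidates, key=lambda d: freq.get((d.month, d.day), 1.0))
```

Treating dates without history as the worst case is a conservative choice; a different default would change which date wins when data is sparse.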

[0114] To support multilingual environments, the server uses the Google Translate API and other tools to translate voice commands in different languages into a language that can be analyzed, enabling smooth operation.

[0115] This system allows users to effectively manage their schedules with minimal voice commands and offers flexible, optimal action suggestions in diverse environments, enabling efficient work and personal management. As a result, the user's entire life runs smoothly and efficiently.

[0116] An example of a prompt when using a generative AI model might be a command such as, "How can we design a home robot assistant that can manage the schedules of all family members simultaneously and suggest the best course of action based on those schedules?"

[0117] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0118] Step 1:

[0119] The terminal captures the user's voice commands via a voice input device. The input is the user's verbal instructions, and the output is digital voice data. This digital voice data is sent from the terminal to the server.

[0120] Step 2:

[0121] The server converts the received digital audio data into text information using speech recognition software (e.g., Google Speech-to-Text API). The input here is digital audio data, and the output is parseable text information. In this process, the server executes a speech recognition algorithm, converting phonemes into text.

[0122] Step 3:

[0123] The server analyzes textual information and extracts specific schedule-related data. The input is textual information, and the output is a set of extracted schedule information. The server uses natural language processing techniques to identify schedule information such as dates and times from this text data.

[0124] Step 4:

[0125] The server extracts schedule information, updates the storage medium using a cloud-based schedule management system (e.g., Google Calendar API), and synchronizes it across multiple devices. The input is the extracted schedule information, and the output is a list of other devices to which the updated schedule has been synchronized. The server calls the API to update the schedule and immediately synchronizes it between devices.

[0126] Step 5:

[0127] The server uses data analysis libraries (e.g., Pandas, NumPy) to analyze externally obtained market trend and environmental data, and re-evaluates task priorities. The inputs are schedule information and external data, and the output is a task list with re-evaluated priorities. Based on the analysis results, the server reassesses the importance of each task.

[0128] Step 6:

[0129] The server provides users with a re-evaluated task list and suggestions. The input is a task list with re-evaluated priorities, and the output is optimal action suggestions presented to the user. The server uses a notification function to send recommendations to the user.
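
Steps 5 and 6 above can be sketched as a scoring pass over the task list. In the sketch below, the `Task` fields, the signal names (`rain_forecast`, `market_volatility`), and the adjustment weights are all hypothetical; the specification does not state how external data maps to priority changes.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    base_priority: float              # higher means more important
    tags: set = field(default_factory=set)


# Hypothetical adjustment rules: (external signal, task tag) -> priority delta.
ADJUSTMENTS = {
    ("rain_forecast", "outdoor"): -2.0,
    ("market_volatility", "finance"): +3.0,
}


def reevaluate(tasks, active_signals):
    """Return (task, score) pairs sorted by re-evaluated priority, highest first."""
    scored = []
    for task in tasks:
        delta = sum(
            change
            for (signal, tag), change in ADJUSTMENTS.items()
            if signal in active_signals and tag in task.tags
        )
        scored.append((task, task.base_priority + delta))
    # sorted() is stable, so tasks with equal scores keep their original order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


tasks = [
    Task("Site inspection", 5.0, {"outdoor"}),
    Task("Review portfolio", 4.0, {"finance"}),
    Task("Write weekly report", 4.5),
]
ranked = reevaluate(tasks, {"rain_forecast", "market_volatility"})
for task, score in ranked:
    print(f"{score:4.1f}  {task.name}")
```

Keeping the rules in a table separate from the scoring loop means new external signals can be added without touching the ranking logic.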

[0130] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0131] This invention combines an emotion engine with a next-generation system that analyzes voice commands to streamline task management, enabling work support that takes user emotions into consideration. The system configuration and specific examples are described below.

[0132] This system consists primarily of a terminal equipped with a voice input device and a server that processes the data. When a user gives a voice command using the terminal, the terminal converts the voice into digital data and sends it to the server. The server uses speech recognition software to convert the voice into text information and analyzes the schedule requested by the user.

[0133] Notably, the server includes an emotion engine that identifies the user's emotions from voice data. Specifically, it estimates the emotional state from the tone, speed, and word choice of the voice, and stores the result on a recording medium. Based on this emotional information, the server dynamically adjusts task priorities to help users perform their work in a stress-free environment.

[0134] For example, if a user expresses an emotion indicating busyness during their speech, the server will identify that emotion and suggest postponing currently scheduled low-priority tasks. The emotion engine also monitors the user's stress level in real time, and if stress levels are high, it will display a message on the device suggesting a relaxation break.

[0135] This system enables emotion-based work support, allowing users to perform tasks efficiently in a way that suits their psychological state. It therefore goes beyond simply streamlining administrative tasks: it reduces the user's mental burden, making it valuable not only for general office work but also for various business environments. Furthermore, it can appropriately translate and analyze emotional expressions in different languages, helping the system capture nuance even in international business environments.

[0136] The following describes the processing flow.

[0137] Step 1:

[0138] The user issues work instructions by voice via a terminal. The voice input device then captures the user's voice as digital audio data.

[0139] Step 2:

[0140] The terminal transmits the collected audio data to the server using a secure communication protocol. This data is then pre-processed for audio analysis.

[0141] Step 3:

[0142] The server uses a speech recognition engine to analyze the received audio data and convert it into text information. This text information is further analyzed and converted into structured data such as schedule management data.

[0143] Step 4:

[0144] The server activates an emotion engine to identify the user's emotions from the voice data. This process involves analyzing voice tone, speaking speed, and keywords used to estimate the emotional state.

[0145] Step 5:

[0146] The server uses the identified emotion data to re-evaluate the user's schedule and task priorities. For example, if an emotion indicating stress is detected, the server will replan by postponing less important tasks.

[0147] Step 6:

[0148] The server sends sentiment assessments and revised schedule information to the terminal, notifying the user of the latest recommended tasks and the reasons behind them. This also includes suggestions for relaxing breaks as needed.

[0149] Step 7:

[0150] Users review the information displayed on their device and choose whether to accept the new schedule or task priority. This feedback is collected as data and used for future sentiment evaluations.

[0151] Step 8:

[0152] The device updates its schedule based on user selection and continuously synchronizes information across all related devices, ensuring that users can access the latest information from any device.
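
The emotion-based replanning in Steps 4 to 6 above can be illustrated with a toy heuristic. The sketch below estimates stress from speaking rate and word choice and postpones low-priority tasks when the score is high. It is a stand-in for the emotion engine, not a description of it: the keyword list, the thresholds, and the function names `estimate_stress` and `replan` are assumptions, and tone analysis is omitted.

```python
# Assumed keyword list; a real emotion engine would use a trained model.
STRESS_KEYWORDS = {"busy", "overwhelmed", "deadline", "urgent", "exhausted"}


def estimate_stress(transcript, duration_seconds):
    """Return a stress score in [0, 1] from speaking rate and word choice."""
    words = [w.strip(".,!?").lower() for w in transcript.split()]
    if not words or duration_seconds <= 0:
        return 0.0
    rate = len(words) / duration_seconds                  # words per second
    rate_score = min(max((rate - 2.0) / 2.0, 0.0), 1.0)   # 2 w/s -> 0, 4 w/s -> 1
    keyword_hits = sum(1 for w in words if w in STRESS_KEYWORDS)
    keyword_score = min(keyword_hits / 2.0, 1.0)
    return 0.5 * rate_score + 0.5 * keyword_score


def replan(tasks, stress, threshold=0.6, low_priority=2):
    """Split tasks into (keep, postponed) and add a break message when stressed.

    `tasks` is a list of (name, priority) pairs; priority <= low_priority is "low".
    """
    if stress < threshold:
        return list(tasks), [], None
    keep = [t for t in tasks if t[1] > low_priority]
    postponed = [t for t in tasks if t[1] <= low_priority]
    return keep, postponed, "We recommend taking a break."


stress = estimate_stress("I'm so busy, the deadline is urgent!", 2.0)
keep, postponed, message = replan([("Board meeting", 5), ("File receipts", 1)], stress)
print(stress, keep, postponed, message)
```

The user's acceptance or rejection of the proposed plan (Step 7) could then be logged alongside the score to tune the thresholds over time.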

[0153] (Example 2)

[0154] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0155] In recent years, while there has been a growing demand for increased operational efficiency, the importance of task management that also considers the psychological state of users has been increasing. However, conventional task management systems have been unable to take into account users' emotional information, resulting in insufficient support in reducing stress and anxiety. To solve this problem, there is a need for a system that can appropriately analyze users' emotions from voice data and dynamically adjust task priorities based on that analysis.

[0156] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0157] In this invention, the server includes means for converting acoustic data received through a voice input device into text information, means for identifying emotional information from the voice data and dynamically adjusting task priorities using an emotion engine, and means for translating instructions in different languages into a parseable language and appropriately processing emotional information. This enables work support based on the user's emotions, reducing stress and improving work efficiency.

[0158] A "voice input device" is a device that converts sound into electrical signals, making them processable as digital data.

[0159] "Audio data" refers to digital data that includes sound, and is a data format that can be converted into text information using speech recognition technology.

[0160] "Textual information" refers to data in text format obtained by digitizing audio data.

[0161] "Recording medium" refers to any device or medium that stores digital information and can transmit or read it as needed.

[0162] An "emotion engine" refers to a technology that identifies a user's emotional state by analyzing voice data and uses the results to perform dynamic system operations.

[0163] "Communication network" refers to the entire network infrastructure used to send and receive data between multiple devices.

[0164] "Dynamic adjustment" refers to a system that adapts processes and settings in real time in response to user data and circumstances, in order to maintain an optimal state.

[0165] "External information" refers to all types of data obtained from outside the system, including weather information, traffic conditions, and other relevant schedule data.

[0166] Embodiments for carrying out this invention are described below.

[0167] This system primarily consists of terminals equipped with voice input devices and servers, which are computers that process data. Users give voice commands using the terminals. The terminals convert the voice received through the voice input devices into digital data. For example, if a smart device is used as the terminal, it uses its built-in microphone to acquire voice and converts it into data using its voice-to-digital conversion function.

[0168] The server uses speech recognition software to convert this speech data into text. This process can utilize, for example, speech recognition APIs from major technology companies. The computer analyzes the user's instructions from the resulting text and extracts schedule and task information.

[0169] The server further includes an emotion engine that analyzes the user's emotions based on voice data. The engine estimates the emotional state based on the tone, speed, and word choice of the voice, and this information is stored on a recording medium. This emotional information is used by the server to dynamically re-evaluate task priorities. For example, if a user gives a voice command that expresses stress, the emotion engine can identify that emotion and suggest postponing low-priority tasks.

[0170] Furthermore, it utilizes translation functions to convert instructions in different languages into an analyzable language, and also appropriately analyzes emotional states. This makes it possible to support the user's work efficiency even in a multilingual environment.

[0171] A concrete example of a prompt is, "Analyze the user's emotions based on their tone and content, and suggest appropriate changes to the task schedule." By inputting this prompt into the AI model, appropriate responses and suggestions can be obtained. In this way, the system provides emotion-based work support, reducing the user's psychological burden.

[0172] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0173] Step 1:

[0174] The user inputs voice commands using the terminal. The terminal acquires voice data via a voice input device and converts this analog audio into a digital format. Specifically, the terminal's microphone captures the audio, and the built-in DSP (Digital Signal Processing Unit) converts the audio waveform into digital data. The output obtained as a result of this processing is digital audio data.

[0175] Step 2:

[0176] The terminal transfers the converted digital audio data to the server. HTTPS, a secure communication protocol over the internet, is used for data transfer. The digital audio data generated earlier is used as input, and the data reaches the server as output.

[0177] Step 3:

[0178] The server converts received digital audio data into text information using speech recognition software. For example, the server uses AI to analyze speech patterns from the audio data and generate corresponding text. This process utilizes API services to achieve highly accurate speech recognition. The input is digital audio data, and the output is text information.

[0179] Step 4:

[0180] The server analyzes the converted text information and extracts task and schedule information requested by the user. The server classifies the text data by content and updates the schedule and task list. The input is the text information to be analyzed, and the output is the updated schedule information.

[0181] Step 5:

[0182] The server uses an emotion engine to identify the user's emotions based on the voice data. It analyzes the tone and speed of the voice to estimate the user's emotional state. This process employs sophisticated algorithms, and the emotion information is stored on a recording medium. The input is the data after speech recognition is complete, and the output is the identified emotion information.

[0183] Step 6:

[0184] The server dynamically adjusts task priorities based on emotional information. For example, the server checks emotional information and, if the user is in a high-stress state, generates a suggestion to postpone low-priority tasks. The input is emotional information and current task information, and the output is a task list with updated priorities.

[0185] Step 7:

[0186] The device receives feedback from the server and displays suggested schedule adjustments and task priorities to the user. For example, the device displays the message "We recommend taking a break" on the screen. The input is feedback information from the server, and the output is visual feedback to the user.
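
Step 2 of the flow above transfers the digital audio to the server over HTTPS. The sketch below builds such a request with Python's standard `urllib.request` module without sending it. The endpoint URL, the header names `X-User-Id` and `X-Sample-Rate`, and the raw PCM payload are assumptions; the specification only states that HTTPS is used.

```python
import urllib.request

SERVER_URL = "https://example.com/api/voice"  # hypothetical endpoint


def build_upload_request(audio_bytes, user_id, url=SERVER_URL, sample_rate_hz=16000):
    """Build (but do not send) an HTTPS POST carrying raw audio bytes."""
    if not url.startswith("https://"):
        # Refuse plain HTTP so that voice data is never sent unencrypted.
        raise ValueError("audio must be sent over HTTPS")
    return urllib.request.Request(
        url,
        data=audio_bytes,
        method="POST",
        headers={
            "Content-Type": "application/octet-stream",
            "X-User-Id": user_id,                    # illustrative header
            "X-Sample-Rate": str(sample_rate_hz),    # illustrative header
        },
    )


# One second of silent 16-bit mono audio at 16 kHz would be 32000 bytes;
# a shorter dummy payload is used here.
request = build_upload_request(b"\x00\x01" * 8000, user_id="user-123")
print(request.get_method(), request.full_url, len(request.data))
# Sending would be: urllib.request.urlopen(request, timeout=10)
```

Rejecting non-HTTPS URLs at the point where the request is built is one simple way to keep the "secure communication protocol" requirement from being bypassed by configuration mistakes.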

[0187] (Application Example 2)

[0188] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0189] Providing efficient task management and emotional support simultaneously remains a challenging task in modern homes and workplaces. Especially given the stress often experienced in busy daily lives, proper prioritization and emotional support are essential. Furthermore, consistent support across diverse language environments is crucial.

[0190] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0191] In this invention, the server includes means for converting voice data received through a voice input device into text information, means for analyzing the text information to extract the user's schedule information and update it on a recording medium, means for synchronizing the updated schedule information to multiple terminal devices via a communication network, means for re-evaluating the priority of the user's tasks using external information acquired by a data analysis device, and means for estimating the emotional state from the user's voice data and dynamically adjusting work support based on the re-evaluated priority. This reduces the psychological burden on the user and enables efficient work execution.

[0192] A "voice input device" is a device that converts voice data into a digital format and transmits it to a processing unit.

[0193] "Textual information" refers to information that has been digitally converted from audio data and expressed in text format.

[0194] A "recording medium" is a physical or digital storage device that stores data, making it available for later use or analysis.

[0195] A "communication network" is a network infrastructure for transmitting data between multiple terminal devices.

[0196] A "data analysis device" is a computer system that processes diverse information and provides analysis results.

[0197] "External information" refers to supplementary data obtained from outside the user's environment and is used to adjust tasks and schedules.

[0198] "Priority" is an indicator used to determine the order in which to perform multiple tasks or duties.

[0199] "Emotional state" refers to the user's psychological or emotional state estimated based on voice analysis and other data.

[0200] "Business support" refers to assistance and systems designed to help people perform tasks and work efficiently.

[0201] "Dynamic adjustment" means changing the order and content of tasks in real time according to the situation in order to optimize them.

[0202] This invention utilizes a terminal equipped with a voice input device to allow the user to input voice data, which is then digitized as text information. This digitized text information is transmitted to a server via a communication network. The server analyzes the received text information, extracts the user's schedule information, and stores it on a recording medium. During this process, a data analysis device is used to acquire external information and re-evaluate the user's task priorities. The re-evaluated priorities are dynamically adjusted based on the emotional state contained in the user's voice data.

[0203] The server provides work support to the user based on dynamically adjusted task priorities. As part of this work support, if the user's emotional state indicates high stress, the server may generate a message suggesting a break. For example, if the user voice-inputs, "I need to prepare for a meeting after lunch," the server, if it determines the user is stressed, will suggest, "How about taking a short break before preparing for the meeting?"

[0204] This system can also translate instructions in various languages and provide appropriate work support based on the user's emotional state. For speech recognition, it uses, for example, the Google Speech-to-Text API, and for sentiment analysis, it uses IBM Watson® Tone Analyzer. The Asana API can be used to manage the user's schedule and tasks.

[0205] When a generative AI model is used to generate the suggestion message, a prompt such as the following is used:

[0206] "When the system analyzes that the user is experiencing stress, please generate a message suggesting they take a break."

[0207] Through this prompt, the AI can create optimal suggestions for the user and help alleviate stress.
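
One way to assemble such a prompt is sketched below. The instruction text is the prompt quoted in the specification; the surrounding context lines, the `stress_level` labels, and the function name `build_prompt` are assumptions, and the call to the generative AI model itself is deliberately left out.

```python
BREAK_PROMPT = (
    "When the system analyzes that the user is experiencing stress, "
    "please generate a message suggesting they take a break."
)


def build_prompt(user_utterance, stress_level, next_task=None):
    """Return prompt text for a generative model, or None if no break is needed.

    `stress_level` is assumed to be a label such as "low" or "high".
    """
    if stress_level != "high":
        return None
    lines = [
        BREAK_PROMPT,
        f'User said: "{user_utterance}"',
        f"Estimated stress: {stress_level}",
    ]
    if next_task:
        lines.append(f"Next task: {next_task}")
    return "\n".join(lines)


prompt = build_prompt(
    "I need to prepare for a meeting after lunch",
    "high",
    next_task="Prepare for meeting",
)
print(prompt)
```

Passing the user's utterance and the next task as context gives the model what it needs to produce a message like the "short break before preparing for the meeting" example given earlier.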

[0208] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0209] Step 1:

[0210] The device receives voice input from the user. This voice data is captured via the built-in microphone and converted into a digital format that can be sent to the server. The Google Speech-to-Text API is used to convert the voice input into text data. This process extracts text information from the voice.

[0211] Step 2:

[0212] The server analyzes the received text information and extracts the user's schedule and tasks. Natural Language Processing (NLP) technology is used for the analysis. The server receives the text information converted from the audio data as input, organizes schedule information and important tasks based on that data, and creates output data to update the storage medium.

[0213] Step 3:

[0214] The server evaluates the tone of voice data and text to perform sentiment analysis and estimate the user's emotional state. IBM Watson Tone Analyzer is used for this process. The input is voice or text data, and the output is an indicator of the emotional state.

[0215] Step 4:

[0216] The server uses data analysis equipment to collect external information (weather information, news, traffic conditions, etc.) and re-evaluate the user's task priorities. It also takes the user's emotional state into consideration, dynamically adjusting task priorities. Inputs include external information, the user's emotional state, and task information, while output is the updated task priorities.

[0217] Step 5:

[0218] Based on the user's emotional state and task priorities, the server generates messages and suggestions to support work. During periods of high stress, it supports work efficiency through suggestions such as taking breaks. Using a generative AI model, it outputs the most suitable suggestions to the user based on appropriate prompts. In this step, emotional state and task information are used as input, and suggestion messages are obtained as output.

[0219] Step 6:

[0220] The terminal notifies the user of suggested messages and schedule information received from the server. Information is conveyed visually or audibly using a display device. This helps reduce stress and facilitates efficient task management. Input is the instruction output from the server.

[0221] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0222] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0223] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0224] [Second Embodiment]

[0225] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0226] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0227] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0228] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0229] The microphone 238 receives instructions from the user 20 by receiving the voice uttered by the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0230] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0231] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0232] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0233] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0234] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0235] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0236] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0237] The system based on this invention provides users with efficient and personalized work support through the coordinated operation of various functions, primarily speech recognition, schedule management, data analysis, priority optimization, and multilingual support. Details of each function and their implementation examples are described below.

[0238] A terminal equipped with a voice input device captures voice instructions from the user and processes them as digital voice data. The terminal sends this voice data to a server. The server uses voice recognition software to convert the voice data into text information. Based on this text information, the server analyzes the user's instructions and recognizes them as schedule management tasks.

[0239] The server accesses the user's schedule database, checks the specified time slot, and adds the new appointment. This updated schedule information is instantly synchronized across multiple devices via the cloud. This ensures consistency in work processes, as users can access the same schedule information across multiple devices.

[0240] Furthermore, the server uses data analysis devices to acquire market trends and historical business data from external sources, and re-evaluates the priority of the user's tasks. Based on these analysis results, the user is presented with the optimal task order.

[0241] To enable multilingual support, the server translates received instructions, even if they are given in different languages, into a language it can internally parse. This allows users to smoothly utilize the system even in an international business environment.

[0242] For example, if a user uses voice input to say, "Add a project meeting tomorrow at 3 PM," the device captures this instruction and sends it to the server. The server analyzes this and updates the schedule, which is then reflected in real time across all of the user's devices. Furthermore, the server reassesses the importance of meetings based on external data and current market trends, enabling users to manage tasks efficiently.
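
To make the analysis step concrete, the sketch below turns the example command into a structured appointment using a regular expression. It handles only the pattern "Add (a) <title> today/tomorrow at <time> AM/PM"; the pattern, the function name `parse_command`, and the returned dictionary keys are assumptions, and a production system would rely on a fuller natural language processing component.

```python
import re
from datetime import datetime, timedelta

# Assumed command grammar: "Add [a|an] <title> today|tomorrow at H[:MM] am|pm".
COMMAND = re.compile(
    r"add (?:an? )?(?P<title>.+?) (?P<day>today|tomorrow) at "
    r"(?P<hour>\d{1,2})(?::(?P<minute>[0-5]\d))? ?(?P<ampm>am|pm)",
    re.IGNORECASE,
)


def parse_command(text, now):
    """Return {'title', 'start'} for a recognized command, else None."""
    match = COMMAND.search(text)
    if match is None:
        return None
    hour = int(match["hour"]) % 12        # maps 12 to 0 before the PM offset
    if match["ampm"].lower() == "pm":
        hour += 12
    minute = int(match["minute"] or 0)
    offset = 1 if match["day"].lower() == "tomorrow" else 0
    day = now.date() + timedelta(days=offset)
    start = datetime(day.year, day.month, day.day, hour, minute)
    return {"title": match["title"], "start": start}


now = datetime(2025, 1, 31, 9, 0)
event = parse_command("Add a project meeting tomorrow at 3 PM", now)
print(event)
```

Passing `now` explicitly rather than reading the clock inside the function keeps relative words such as "tomorrow" testable and makes month and year rollovers easy to check.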

[0243] This system reduces the time users spend on cumbersome schedule management and task prioritization, allowing them to focus on important tasks. Furthermore, its multilingual support facilitates smooth communication across different cultures and language regions.

[0244] The following describes the processing flow.

[0245] Step 1:

[0246] The user gives instructions via a voice input device. The voice input device receives this voice as digital data and transmits it to the terminal.

[0247] Step 2:

[0248] The terminal sends the received audio data to the server. The data is sent using an appropriate communication protocol.

[0249] Step 3:

[0250] The server receives the audio data and uses speech recognition software to convert the audio into text format. In this process, the audio data is converted into analyzable text information.

[0251] Step 4:

[0252] The server analyzes the converted text information. If it recognizes that the instructions are related to schedule management, it accesses the relevant schedule database to check for the existence of the specified appointment.

[0253] Step 5:

[0254] The server checks for available time slots and adds the new appointment to the schedule. This update is instantly synchronized via the cloud to all other devices the user uses.

[0255] Step 6:

[0256] The server uses data analysis equipment to re-evaluate the priority of user tasks based on market trends and business data acquired from external sources.

[0257] Step 7:

[0258] The server sends the re-evaluation results to the device and presents the user with a schedule based on the latest priorities. It also notifies the user of updates as needed by sending notifications to the device.

[0259] Step 8:

[0260] Users can check updated schedule information and task priorities through their devices and adjust their work plans as needed.
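
The synchronization in Step 5 is described only as happening "via the cloud". One possible arrangement, shown below as an in-memory sketch, is for the server to number every change and for each terminal to pull the changes it has not yet seen. The class names `ScheduleStore` and `Device` and the pull-based design are assumptions; a push-based design would satisfy the description equally well.

```python
class ScheduleStore:
    """Server-side schedule with a monotonically increasing version per change."""

    def __init__(self):
        self.version = 0
        self.changes = []            # list of (version, appointment)

    def add(self, appointment):
        self.version += 1
        self.changes.append((self.version, appointment))
        return self.version

    def changes_since(self, version):
        return [(v, a) for v, a in self.changes if v > version]


class Device:
    """A terminal holding a local copy and the last version it has seen."""

    def __init__(self, name):
        self.name = name
        self.seen_version = 0
        self.appointments = []

    def sync(self, store):
        """Apply unseen changes in order and return the version reached."""
        for version, appointment in store.changes_since(self.seen_version):
            self.appointments.append(appointment)
            self.seen_version = version
        return self.seen_version


store = ScheduleStore()
glasses, phone = Device("glasses"), Device("phone")
store.add("Project meeting, 15:00")
glasses.sync(store)
store.add("Dentist, 17:30")
glasses.sync(store)
phone.sync(store)
print(glasses.appointments == phone.appointments)
```

Because each device tracks the last version it applied, a terminal that was offline catches up on its next sync without receiving duplicates.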

[0261] (Example 1)

[0262] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0263] In today's business environment, users are required to efficiently manage a wide variety of tasks and appropriately adjust their schedules. However, dealing with different language environments and large amounts of information can be burdensome for users. While voice-based scheduling and multilingual automation are advancing, there is a need to pursue even greater accuracy and efficiency.

[0264] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0265] In this invention, the server includes means for converting voice data received through a voice input device into text information, means for analyzing the text information and extracting the user's schedule information to update it on a recording medium, and means for synchronizing the updated schedule information to multiple terminal devices via a communication network. This enables the user to manage their schedule using voice commands.

[0266] A "voice input device" is a device that receives voice commands from a user and has the function of converting them into digital voice data.

[0267] "Means of converting into text information" refers to technology that analyzes audio data and digitizes it as corresponding text information.

[0268] "A means of analyzing, extracting user schedule information, and updating it on a recording medium" refers to the process of identifying the user's instructions based on textual information and recording this information in a database to update the schedule information.

[0269] "Means of synchronizing to multiple terminal devices via a communication network" refers to technology that synchronizes updated information to multiple devices in real time via a network.

[0270] A "data analysis device" is a device that analyzes data acquired from an external source and extracts useful information.

[0271] "Methods for re-evaluating task priorities" refers to a process that re-evaluates the importance of a user's tasks based on acquired external information and determines their order.

[0272] "Means of implementing noise cancellation" refers to technologies used to reduce ambient noise during audio recording and obtain clear audio data.

[0273] "A means of performing conflict checks and updates" refers to a technique that verifies that newly added appointments do not overlap with existing appointments and updates the database as necessary.

[0274] The "function to translate instructions in different languages and convert them into a parseable language" is a technology that converts information provided in different languages into a language that can be processed internally, so that it can be analyzed accurately even in a multilingual environment.

[0275] The system based on this invention utilizes advanced speech recognition technology, schedule management functions, priority optimization, and multilingual support to provide users with personalized work assistance.

[0276] First, the terminal captures voice commands from the user using a voice input device. During this process, noise cancellation technology is implemented to effectively eliminate ambient noise. This voice data is processed digitally by the terminal and transmitted to the server.

[0277] The server converts voice data into text using speech recognition software. Specifically, it uses a general cloud-based speech recognition service; for example, "automation software provided by a voice input device manufacturer" may be applied. This text information is then analyzed to extract the user's instructions, and schedule management tasks are generated based on them.

[0278] Subsequently, the server adds the updated schedule to the database, which serves as the recording medium, and this information is instantly synchronized to multiple user terminals via the communication network. The schedule information update also includes data conflict checks to prevent inconsistencies with existing schedules.
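The conflict check mentioned in [0278] can be illustrated with a minimal sketch. It assumes that each appointment is a half-open time interval [start, end) and that an in-memory list stands in for the database; the names `Appointment`, `overlaps`, and `add_if_free` are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Appointment:
    title: str
    start: datetime
    end: datetime


def overlaps(a: Appointment, b: Appointment) -> bool:
    # Half-open intervals [start, end) overlap when each starts before the other ends.
    return a.start < b.end and b.start < a.end


def add_if_free(schedule: list, new: Appointment) -> list:
    """Return the conflicting appointments; append `new` only if there are none."""
    conflicts = [existing for existing in schedule if overlaps(existing, new)]
    if not conflicts:
        schedule.append(new)
    return conflicts


# Illustrative data only (hypothetical appointments).
schedule = [Appointment("Design review", datetime(2025, 1, 10, 14), datetime(2025, 1, 10, 15))]
meeting = Appointment("Project meeting", datetime(2025, 1, 10, 15), datetime(2025, 1, 10, 16))
clash = Appointment("Client call", datetime(2025, 1, 10, 15, 30), datetime(2025, 1, 10, 16, 30))

print(add_if_free(schedule, meeting))  # back-to-back appointments do not conflict
print(add_if_free(schedule, clash))    # overlaps "Project meeting", so it is not recorded
```

Treating intervals as half-open lets one appointment end at the moment the next begins without being flagged as a conflict.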

[0279] Furthermore, data analysis equipment is used to analyze information acquired from external sources and users' work history, dynamically optimizing task priorities. To this end, for example, a "data analysis platform" is utilized to generate insights that maximize operational efficiency.

[0280] Furthermore, the system features multilingual support, automatically translating instructions received in different languages into a language that can be analyzed. This feature allows users to seamlessly utilize the system even in international business environments.

[0281] For example, if a user voice-inputs "Add a project meeting tomorrow at 3pm," the device captures this instruction and sends it to the server. The server analyzes this and updates the schedule, which is then reflected in real time across all of the user's devices. This system allows users to reduce the time they spend on complex schedule management and task prioritization, enabling them to focus on important work.
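As a rough illustration of how the instruction in [0281] might be turned into a schedule entry once it has been converted to text, the sketch below uses simple keyword matching with a regular expression. The pattern, the `parse_instruction` helper, and the restriction to "today" and "tomorrow" are assumptions made for this example; a practical system would rely on fuller natural language processing.

```python
import re
from datetime import datetime, timedelta

# Hypothetical pattern covering instructions such as "Add a project meeting tomorrow at 3pm".
PATTERN = re.compile(
    r"add (?:an? )?(?P<title>.+?) (?P<day>today|tomorrow) at "
    r"(?P<hour>\d{1,2})(?::(?P<minute>\d{2}))?\s*(?P<ampm>am|pm)",
    re.IGNORECASE,
)


def parse_instruction(text: str, now: datetime):
    """Return (title, start) for a simple 'add ... at ...' instruction, or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    hour = int(match["hour"]) % 12  # maps 12 to 0 before the pm offset is applied
    if match["ampm"].lower() == "pm":
        hour += 12
    minute = int(match["minute"] or 0)
    day_offset = 1 if match["day"].lower() == "tomorrow" else 0
    start = (now + timedelta(days=day_offset)).replace(
        hour=hour, minute=minute, second=0, microsecond=0
    )
    return match["title"], start


now = datetime(2025, 1, 9, 10, 30)  # illustrative "current time"
print(parse_instruction("Add a project meeting tomorrow at 3pm", now))
```

Passing the current time as a parameter keeps the relative expression "tomorrow" deterministic and easy to test.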

[0282] An example of a prompt sentence for use with the generative AI model is "Please describe a scenario of a system that enables updating a schedule and performing optimal task management based on the instructions given by the user by voice."

[0283] The flow of the specific process in Example 1 will be described using Figure 11.

[0284] Step 1:

[0285] The terminal captures the voice instruction from the user via the voice input device. The input is the voice instruction of the user, and the output is digital voice data. After the terminal obtains this data, it performs noise cancellation processing to improve the quality of the voice data.

[0286] Step 2:

[0287] The terminal transmits the processed digital voice data to the server. The input is the digital voice data with noise removed, and the output is the transfer data sent to the server via synchronized network communication. This transmission is carried out in a form that ensures the security of the data using encryption technology.

[0288] Step 3:

[0289] The server analyzes the received voice data using speech recognition software and converts it into text information. The input is digital voice data, and the output is the corresponding text information. The generated text information is temporarily recorded in the database as intermediate data.

[0290] Step 4:

[0291] The server analyzes the text information and extracts the content of the user's instruction. The input is the text information, and the output is a dataset for the schedule update. The instruction content is extracted using keyword matching and natural language processing techniques.

[0292] Step 5:

[0293] The server updates the user's schedule database based on the instructions. The input is a dataset for schedule updates, and the output is the updated schedule information. During this process, the system compares the updated data with existing data to check for duplicates or conflicts before recording the information.

[0294] Step 6:

[0295] The server synchronizes updated schedule information to each terminal via the communication network. The input is the updated schedule information, and the output is the sharing of the same schedule information to the group of terminals that receive it. After sharing, users can access that information from any terminal.

[0296] Step 7:

[0297] The server uses a data analysis device to re-evaluate the priority of user tasks based on external information. Inputs are information obtained from external sources and user schedule data, and output is optimized task sequence information. This process compares the data with the latest market data to perform analysis that maximizes operational efficiency.

[0298] Step 8:

[0299] The server uses its multilingual capabilities to translate instructions in different languages into a parseable language as needed. The input is the text information that needs translation, and the output is the translated text information that can be processed internally. This process is designed to facilitate system use in international business environments.
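The synchronization in Step 6 above can be sketched as a server that pushes a versioned copy of the schedule to every registered terminal. This is a minimal in-process illustration, assuming a monotonically increasing version number; the classes `Terminal` and `ScheduleServer` are hypothetical, and real devices would communicate over the communication network rather than through direct method calls.

```python
from dataclasses import dataclass, field


@dataclass
class Terminal:
    name: str
    version: int = 0
    entries: list = field(default_factory=list)

    def receive(self, version: int, entries: list) -> None:
        # Ignore stale or duplicate updates so that re-delivery is harmless.
        if version > self.version:
            self.version = version
            self.entries = list(entries)


@dataclass
class ScheduleServer:
    terminals: list = field(default_factory=list)
    version: int = 0
    entries: list = field(default_factory=list)

    def update(self, entry: str) -> None:
        # Record the change, bump the version, then push it to every terminal.
        self.entries.append(entry)
        self.version += 1
        for terminal in self.terminals:
            terminal.receive(self.version, self.entries)


glasses = Terminal("smart glasses")
phone = Terminal("phone")
server = ScheduleServer(terminals=[glasses, phone])
server.update("Project meeting, tomorrow 15:00")
print(glasses.entries == phone.entries == server.entries)
```

The version check is one simple way to keep terminals consistent when the same update arrives more than once.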

[0300] (Application Example 1)

[0301] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0302] In modern society, individuals need to manage many schedules and tasks, and are required to organize them efficiently. In addition, there is a need for a system that can flexibly respond to multilingual environments and changing market trends and present an optimal action plan according to individual needs. Furthermore, there is a need for a method that simplifies schedule management within the family and enables all family members to act efficiently.

[0303] The specific processing by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0304] In this invention, the server includes means for converting voice data received through a voice input device into text information, means for analyzing the text information, extracting user schedule information, and updating a recording medium, and means for analyzing market trends and environmental data based on past data and proposing optimal future actions. This makes it possible to manage schedules effectively with simple voice operations and to flexibly propose optimal actions suited to multilingual environments and market changes.

[0305] The "voice input device" is a device that converts external voice information into an electrical signal and makes it analyzable as digital data.

[0306] The "means for converting into text information" is a process that processes voice data received from a voice input device and converts it into digital data in text format.

[0307] The "means for extracting schedule information and updating a recording medium" refers to a function that identifies schedule-related data in the analyzed text information and saves it to a recording device as a new or modified entry.

[0308] The "means for synchronizing with a plurality of terminal devices" is a process that propagates information updated on a recording medium to a plurality of other electronic devices through a network without delay, so that all of them hold the same state.

[0309] A "data analysis device" is a device that possesses the technology to analyze various types of information obtained from external sources and to reveal regularities and trends.

[0310] "A means of re-evaluating task priorities" refers to a function that allows users to review tasks relevant to them based on their importance and urgency, and rearrange them in an appropriate order.

[0311] "A means of analyzing market trends and environmental data to propose optimal actions for the future" refers to a process of analyzing past data and changes in the external environment to derive the optimal actions that users should take.

[0312] This system is activated when the user speaks into a voice input device. The voice input device collects the user's voice instructions and sends them to the server over the internet in real time as digital voice data. The server converts the voice into text information using speech recognition software such as the Google Speech-to-Text API. After obtaining the text information, the server analyzes it and extracts new tasks based on the user's schedule.

[0313] New schedule information is updated to the user's storage medium via a cloud-based scheduling management system such as the Google Calendar API, instantly synchronizing with all other devices the user owns. This synchronization feature allows users to consistently access the latest schedule information from any device.

[0314] Furthermore, the server utilizes data analysis libraries such as Pandas and NumPy to analyze market trends and historical environmental data obtained from external sources. This analysis allows the server to re-evaluate the optimal priorities for the user's tasks and provide suggestions for future actions. For example, it can make specific suggestions such as recommending a picnic date based on weather trends analyzed from historical data.
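The picnic example in [0314] can be illustrated with a small sketch that picks, from several candidate dates, the one with the lowest average historical rainfall. To stay dependency-free it uses only the Python standard library rather than Pandas or NumPy; the rainfall figures, the `recommend_date` helper, and the choice of mean rainfall as the criterion are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical historical observations: (month-day, rainfall in mm) over several years.
history = [
    ("05-03", 0.0), ("05-03", 2.5), ("05-03", 0.0),
    ("05-04", 12.0), ("05-04", 8.0), ("05-04", 0.0),
    ("05-05", 0.0), ("05-05", 0.5), ("05-05", 0.0),
]


def recommend_date(records, candidates):
    """Return (best_date, averages), preferring the lowest mean historical rainfall."""
    by_date = defaultdict(list)
    for day, rainfall in records:
        by_date[day].append(rainfall)
    # Candidates without any history are skipped rather than guessed.
    averages = {day: mean(by_date[day]) for day in candidates if by_date[day]}
    if not averages:
        return None, averages
    return min(averages, key=averages.get), averages


best, averages = recommend_date(history, ["05-03", "05-04", "05-05"])
print(best, averages)
```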

[0315] To support multilingual environments, the server uses the Google Translate API and other tools to translate voice commands in different languages into a language that can be analyzed, enabling smooth operation.

[0316] This system allows users to effectively manage their schedules with minimal voice commands and offers flexible, optimal action suggestions in diverse environments, enabling efficient work and personal management. As a result, the user's entire life runs smoothly and efficiently.

[0317] An example of a prompt when using a generative AI model might be a command such as, "How can we design a home robot assistant that can manage the schedules of all family members simultaneously and suggest the best course of action based on those schedules?"

[0318] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0319] Step 1:

[0320] The terminal captures the user's voice commands via a voice input device. The input is the user's verbal instructions, and the output is digital voice data. This digital voice data is sent from the terminal to the server.

[0321] Step 2:

[0322] The server converts the received digital audio data into text information using speech recognition software (e.g., Google Speech-to-Text API). The input here is digital audio data, and the output is parseable text information. In this process, the server executes a speech recognition algorithm, converting phonemes into text.

[0323] Step 3:

[0324] The server analyzes textual information and extracts specific schedule-related data. The input is textual information, and the output is a set of extracted schedule information. The server uses natural language processing techniques to identify schedule information such as dates and times from this text data.

[0325] Step 4:

[0326] The server extracts schedule information, updates the storage medium using a cloud-based schedule management system (e.g., Google Calendar API), and synchronizes it across multiple devices. The input is the extracted schedule information, and the output is a list of other devices to which the updated schedule has been synchronized. The server calls the API to update the schedule and immediately synchronizes it between devices.

[0327] Step 5:

[0328] The server uses data analysis libraries (e.g., Pandas, NumPy) to analyze externally obtained market trend and environmental data, and re-evaluates task priorities. The inputs are schedule information and external data, and the output is a task list with re-evaluated priorities. Based on the analysis results, the server reassesses the importance of each task.

[0329] Step 6:

[0330] The server provides users with a re-evaluated task list and suggestions. The input is a task list with re-evaluated priorities, and the output is optimal action suggestions presented to the user. The server uses a notification function to send recommendations to the user.
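The priority re-evaluation in Step 5 above can be sketched as a scoring function that combines importance and urgency, as described in [0310], with an optional adjustment derived from external data. The 1 to 5 importance scale, the urgency formula, and the `external_boost` mapping are hypothetical choices for this example and are not specified in the text.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Task:
    name: str
    importance: int  # hypothetical scale: 1 (low) to 5 (high)
    due: datetime


def urgency(task: Task, now: datetime) -> float:
    # 1.0 when the task is due in 24 hours, larger as the deadline approaches.
    hours_left = max((task.due - now).total_seconds() / 3600, 1.0)
    return 24.0 / hours_left


def reprioritize(tasks, now, external_boost=None):
    """Sort tasks by importance x urgency plus an optional external-data boost."""
    external_boost = external_boost or {}

    def score(task: Task) -> float:
        return task.importance * urgency(task, now) + external_boost.get(task.name, 0.0)

    return sorted(tasks, key=score, reverse=True)


now = datetime(2025, 1, 9, 9, 0)
tasks = [
    Task("Plan offsite", 5, datetime(2025, 1, 14, 9, 0)),
    Task("Write report", 3, datetime(2025, 1, 9, 21, 0)),
    Task("Order supplies", 2, datetime(2025, 1, 10, 9, 0)),
]
print([t.name for t in reprioritize(tasks, now)])
# An external signal (hypothetical) raises the priority of one task.
print([t.name for t in reprioritize(tasks, now, {"Plan offsite": 6.0})])
```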

[0331] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0332] This invention combines an emotion engine with a next-generation system that analyzes voice commands to streamline task management, enabling work support that takes user emotions into consideration. The system configuration and specific examples are described below.

[0333] This system consists primarily of a terminal equipped with a voice input device and a server that processes the data. When a user gives a voice command using the terminal, the terminal converts the voice into digital data and sends it to the server. The server uses speech recognition software to convert the voice into text information and analyzes the schedule requested by the user.

[0334] Notably, the server has an emotion engine that identifies the user's emotions from voice data. Specifically, it estimates the emotional state from the tone, speed, and word choice of the voice, and stores it on a recording medium. Based on this emotional information, the server dynamically adjusts task priorities to help users perform their work in a stress-free environment.

[0335] For example, if a user expresses an emotion indicating busyness during their speech, the server will identify that emotion and suggest postponing currently scheduled low-priority tasks. The emotion engine also monitors the user's stress level in real time, and if stress levels are high, it will display a message on the device suggesting a relaxation break.
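The behavior described in [0335] can be sketched as a function that postpones low-priority tasks and adds a break suggestion when the estimated stress level is high. The 0 to 1 stress scale, the threshold of 0.7, the priority cutoff, and the `adjust_for_stress` helper are assumptions for illustration only.

```python
def adjust_for_stress(tasks, stress_level, threshold=0.7, low_priority=2):
    """Split tasks into kept and postponed lists when stress is high.

    tasks: list of (name, priority) pairs, where a larger number means higher priority.
    Returns (kept, postponed, message); message is None when no action is needed.
    """
    if stress_level < threshold:
        return list(tasks), [], None
    kept = [task for task in tasks if task[1] > low_priority]
    postponed = [task for task in tasks if task[1] <= low_priority]
    message = "Stress appears high. How about a short relaxation break?"
    return kept, postponed, message


tasks = [("Prepare slides", 5), ("File expenses", 1), ("Tidy inbox", 2)]
print(adjust_for_stress(tasks, stress_level=0.9))
print(adjust_for_stress(tasks, stress_level=0.3))
```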

[0336] This system enables emotion-based work support, allowing users to perform tasks efficiently in a way that suits their psychological state. It therefore goes beyond simply streamlining administrative tasks; it reduces the user's mental burden, making it a valuable system not only for general office work but also for various business environments. Furthermore, it can appropriately translate and analyze emotional expressions in different languages, helping to convey nuances of expression even in international business environments.

[0337] The following describes the processing flow.

[0338] Step 1:

[0339] The user issues work instructions by voice via a terminal. The voice input device then captures the user's voice as digital audio data.

[0340] Step 2:

[0341] The terminal transmits the collected audio data to the server using a secure communication protocol. This data is then pre-processed for audio analysis.

[0342] Step 3:

[0343] The server uses a speech recognition engine to analyze the received audio data and convert it into text information. This text information is further analyzed and converted into structured data such as schedule management data.

[0344] Step 4:

[0345] The server activates an emotion engine to identify the user's emotions from the voice data. This process involves analyzing voice tone, speaking speed, and keywords used to estimate the emotional state.

[0346] Step 5:

[0347] The server uses the identified emotion data to re-evaluate the user's schedule and task priorities. For example, if an emotion indicating stress is detected, the server will replan by postponing less important tasks.

[0348] Step 6:

[0349] The server sends sentiment assessments and revised schedule information to the terminal, notifying the user of the latest recommended tasks and the reasons behind them. This also includes suggestions for relaxing breaks as needed.

[0350] Step 7:

[0351] Users review the information displayed on their device and choose whether to accept the new schedule or task priority. This feedback is collected as data and used for future sentiment evaluations.

[0352] Step 8:

[0353] The terminal updates the schedule based on the user's selection and continuously synchronizes the information across all related devices, ensuring that the user can access the latest information from any device.

[0354] (Example 2)

[0355] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0356] In recent years, while there has been a growing demand for increased operational efficiency, the importance of task management that also considers the psychological state of users has been increasing. However, conventional task management systems have been unable to take into account users' emotional information, resulting in insufficient support in reducing stress and anxiety. To solve this problem, there is a need for a system that can appropriately analyze users' emotions from voice data and dynamically adjust task priorities based on that analysis.

[0357] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0358] In this invention, the server includes means for converting acoustic data received through a voice input device into text information, means for identifying emotional information from the voice data and dynamically adjusting task priorities using an emotion engine, and means for translating instructions in different languages into a parseable language and appropriately processing emotional information. This enables work support based on the user's emotions, reducing stress and improving work efficiency.

[0359] A "voice input device" is a device that converts sound into electrical signals, making them processable as digital data.

[0360] "Audio data" refers to digital data that includes sound, and is a data format that can be converted into text information using speech recognition technology.

[0361] "Textual information" refers to data in text format obtained by digitizing audio data.

[0362] "Recording medium" refers to any device or medium that stores digital information and can transmit or read it as needed.

[0363] An "emotion engine" refers to a technology that identifies a user's emotional state by analyzing voice data and uses the results to perform dynamic system operations.

[0364] "Communication network" refers to the entire network infrastructure used to send and receive data between multiple devices.

[0365] "Dynamic adjustment" refers to a system that adapts processes and settings in real time in response to user data and circumstances, in order to maintain an optimal state.

[0366] "External information" refers to all types of data obtained from outside the system, including weather information, traffic conditions, and other relevant schedule data.

[0367] Embodiments for carrying out this invention are described below.

[0368] This system primarily consists of terminals equipped with voice input devices and servers, which are computers that process data. Users give voice commands using the terminals. The terminals convert the voice received through the voice input devices into digital data. For example, if a smart device is used as the terminal, it uses its built-in microphone to acquire voice and converts it into data using its voice-to-digital conversion function.

[0369] The server uses speech recognition software to convert this speech data into text. This process can utilize, for example, speech recognition APIs from major technology companies. The computer analyzes the user's instructions from the resulting text and extracts schedule and task information.

[0370] The server also includes an emotion engine that analyzes the user's emotions based on voice data. The engine estimates the emotional state from the tone, speed, and word choice of the voice, and this information is stored on a recording medium. The server uses this emotional information to dynamically re-evaluate task priorities. For example, if a user gives a voice command that expresses stress, the emotion engine can identify that emotion and suggest postponing low-priority tasks.
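As a rough illustration of estimating an emotional state from tone, speed, and word choice, the sketch below combines three simple features into a single stress score between 0 and 1. It is a toy rule-based stand-in, not the emotion engine itself: the keyword list, the 150 words-per-minute baseline, the `pitch_variation` input (assumed to be pre-normalized to the range 0 to 1), and the equal weighting are all hypothetical.

```python
# Hypothetical keyword list standing in for "word choice".
STRESS_WORDS = {"urgent", "deadline", "overwhelmed", "busy", "asap"}


def estimate_stress(transcript: str, duration_s: float, pitch_variation: float) -> float:
    """Return a stress score in [0, 1] from speaking rate, keywords, and tone."""
    words = transcript.lower().split()
    # Speed: words per minute above an assumed 150 wpm baseline, capped at 1.0.
    rate_wpm = len(words) / duration_s * 60
    rate_score = min(max((rate_wpm - 150) / 100, 0.0), 1.0)
    # Word choice: two or more stress keywords saturate the score.
    hits = sum(word.strip(".,!?") in STRESS_WORDS for word in words)
    word_score = min(hits / 2, 1.0)
    # Tone: assumed to arrive already normalized; clamp it defensively.
    tone_score = min(max(pitch_variation, 0.0), 1.0)
    return round((rate_score + word_score + tone_score) / 3, 2)


stressed = estimate_stress("I am so busy and the deadline is urgent", 3.0, 0.8)
calm = estimate_stress("Please add a meeting tomorrow", 3.0, 0.1)
print(stressed, calm)
```

A production system would more likely use a trained model, but the same three inputs named in the text would still feed it.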

[0371] Furthermore, it utilizes translation functions to convert instructions in different languages into an analyzable language, and also appropriately analyzes emotional states. This makes it possible to support the user's work efficiency even in a multilingual environment.

[0372] A concrete example of a prompt is, "Analyze the user's emotions based on their tone and content, and suggest appropriate changes to the task schedule." By inputting this prompt into the AI model, appropriate responses and suggestions can be obtained. In this way, the system provides emotion-based work support, reducing the user's psychological burden.

[0373] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0374] Step 1:

[0375] The user inputs voice commands using the terminal. The terminal acquires voice data via a voice input device and converts this analog audio into a digital format. Specifically, the terminal's microphone captures the audio, and the built-in DSP (Digital Signal Processing Unit) converts the audio waveform into digital data. The output obtained as a result of this processing is digital audio data.

[0376] Step 2:

[0377] The terminal transfers the converted digital audio data to the server. HTTPS, a secure communication protocol over the internet, is used for data transfer. The digital audio data generated earlier is used as input, and the data reaches the server as output.

[0378] Step 3:

[0379] The server converts received digital audio data into text information using speech recognition software. For example, the server uses AI to analyze speech patterns from the audio data and generate corresponding text. This process utilizes API services to achieve highly accurate speech recognition. The input is digital audio data, and the output is text information.

[0380] Step 4:

[0381] The server analyzes the converted text information and extracts task and schedule information requested by the user. The server classifies the text data by content and updates the schedule and task list. The input is the text information to be analyzed, and the output is the updated schedule information.

[0382] Step 5:

[0383] The server uses an emotion engine to identify the user's emotions based on the voice data. It analyzes the tone and speed of the voice to estimate the user's emotional state. This process employs sophisticated algorithms, and the emotion information is stored on a recording medium. The input is the data after speech recognition is complete, and the output is the identified emotion information.

[0384] Step 6:

[0385] The server dynamically adjusts task priorities based on emotional information. For example, the server checks emotional information and, if the user is in a high-stress state, generates a suggestion to postpone low-priority tasks. The input is emotional information and current task information, and the output is a task list with updated priorities.

[0386] Step 7:

[0387] The terminal receives feedback from the server and displays suggested schedule adjustments and task priorities to the user. For example, the terminal displays the message "We recommend taking a break" on the screen. The input is feedback information from the server, and the output is visual feedback to the user.
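The HTTPS transfer in Step 2 above can be sketched with the Python standard library. The example only builds the request and does not send it; the endpoint URL `https://server.example/api/voice`, the content type, and the `build_upload_request` helper are hypothetical and are not taken from the specification.

```python
import urllib.request


def build_upload_request(audio: bytes, url: str = "https://server.example/api/voice"):
    """Build a POST request carrying digital audio data, refusing non-HTTPS URLs."""
    if not url.lower().startswith("https://"):
        raise ValueError("audio must be sent over HTTPS")
    return urllib.request.Request(
        url,
        data=audio,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


request = build_upload_request(b"\x00\x01\x02\x03")  # placeholder audio bytes
print(request.get_method(), request.full_url, len(request.data))
# Actually sending the data would be: urllib.request.urlopen(request, timeout=10)
```

Rejecting plain HTTP at request-construction time is one simple way to ensure that the audio is always encrypted in transit.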

[0388] (Application Example 2)

[0389] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart glasses 214 as the "terminal".

[0390] Providing efficient task management and emotional support simultaneously remains a challenging task in modern homes and workplaces. Especially given the stress often experienced in busy daily lives, proper prioritization and emotional support are essential. Furthermore, consistent support across diverse language environments is crucial.

[0391] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0392] In this invention, the server includes means for converting voice data received through a voice input device into text information, means for analyzing the text information to extract the user's schedule information and update it on a recording medium, means for synchronizing the updated schedule information to multiple terminal devices via a communication network, means for re-evaluating the priority of the user's tasks using external information acquired by a data analysis device, and means for estimating the emotional state from the user's voice data and dynamically adjusting work support based on the re-evaluated priority. This reduces the psychological burden on the user and enables efficient work execution.

[0393] A "voice input device" is a device that converts voice data into a digital format and transmits it to a processing unit.

[0394] "Textual information" refers to information that has been digitally converted from audio data and expressed in text format.

[0395] A "recording medium" is a physical or digital storage device that stores data, making it available for later use or analysis.

[0396] A "communication network" is a network infrastructure for transmitting data between multiple terminal devices.

[0397] A "data analysis device" is a computer system that processes diverse information and provides analysis results.

[0398] "External information" refers to supplementary data obtained from outside the user's environment and is used to adjust tasks and schedules.

[0399] "Priority" is an indicator used to determine the order in which to perform multiple tasks or duties.

[0400] "Emotional state" refers to the user's psychological or emotional state estimated based on voice analysis and other data.

[0401] "Business support" refers to assistance and systems designed to help people perform tasks and work efficiently.

[0402] "Dynamic adjustment" means changing the order and content of tasks in real time according to the situation in order to optimize them.

[0403] This invention utilizes a terminal equipped with a voice input device to allow the user to input voice data, which is then digitized as text information. This digitized text information is transmitted to a server via a communication network. The server analyzes the received text information, extracts the user's schedule information, and stores it on a recording medium. During this process, a data analysis device is used to acquire external information and re-evaluate the user's task priorities. The re-evaluated priorities are dynamically adjusted based on the emotional state contained in the user's voice data.

[0404] The server provides work support to the user based on dynamically adjusted task priorities. As part of this work support, if the user's emotional state indicates high stress, the server may generate a message suggesting a break. For example, if the user voice-inputs, "I need to prepare for a meeting after lunch," the server, if it determines the user is stressed, will suggest, "How about taking a short break before preparing for the meeting?"

[0405] This system can also translate instructions in various languages and provide appropriate work support based on the user's emotional state. For speech recognition, it uses, for example, the Google Speech-to-Text API, and for sentiment analysis, it uses IBM Watson Tone Analyzer. The Asana API can be used to manage the user's schedule and tasks.

[0406] When a generative AI model is used, a prompt sentence such as the following is used:

[0407] "When the system analyzes that the user is experiencing stress, please generate a message suggesting they take a break."

[0408] Through this prompt, the AI can create optimal suggestions for the user and help alleviate stress.
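A minimal sketch of how the prompt in [0407] might be assembled before being passed to a generative AI model is shown below. The model call itself is omitted. The stress score, its threshold of 0.7, the extra context lines, and the `build_prompt` helper are assumptions for illustration; only the prompt sentence comes from the text.

```python
# Prompt sentence quoted from the text above.
BREAK_PROMPT = (
    "When the system analyzes that the user is experiencing stress, "
    "please generate a message suggesting they take a break."
)


def build_prompt(utterance: str, stress_score: float, threshold: float = 0.7):
    """Return the prompt text for the generative model, or None if no break is needed."""
    if stress_score < threshold:
        return None
    # Context lines are a hypothetical addition to give the model the user's situation.
    return (
        f"{BREAK_PROMPT}\n"
        f"User utterance: {utterance}\n"
        f"Estimated stress score (0-1): {stress_score:.2f}"
    )


print(build_prompt("I need to prepare for a meeting after lunch", 0.85))
print(build_prompt("I need to prepare for a meeting after lunch", 0.20))
```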

[0409] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0410] Step 1:

[0411] The device receives voice input from the user. This voice data is captured via the built-in microphone and converted into a digital format that can be sent to the server. The Google Speech-to-Text API is used to convert the voice input into text data. This process extracts text information from the voice.

[0412] Step 2:

[0413] The server analyzes the received text information and extracts the user's schedule and tasks. Natural Language Processing (NLP) technology is used for the analysis. The server receives the text converted from the audio data as input, organizes schedule information and important tasks based on that text, and creates output data to update the storage medium.

[0414] Step 3:

[0415] The server evaluates the tone of voice data and text to perform sentiment analysis and estimate the user's emotional state. IBM Watson Tone Analyzer is used for this process. The input is voice or text data, and the output is an indicator of the emotional state.

[0416] Step 4:

[0417] The server uses data analysis equipment to collect external information (weather information, news, traffic conditions, etc.) and re-evaluate the user's task priorities. It also takes the user's emotional state into consideration, dynamically adjusting task priorities. Inputs include external information, the user's emotional state, and task information, while output is the updated task priorities.

[0418] Step 5:

[0419] Based on the user's emotional state and task priorities, the server generates messages and suggestions to support work. During periods of high stress, it supports work efficiency through suggestions such as taking breaks. Using a generative AI model, it outputs the most suitable suggestions to the user based on appropriate prompt sentences. In this step, emotional state and task information are used as input, and suggestion messages are obtained as output.

[0420] Step 6:

[0421] The terminal notifies the user of suggested messages and schedule information received from the server. Information is conveyed visually or audibly using a display device. This helps reduce stress and facilitates efficient task management. Input is the instruction output from the server.

[0422] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0423] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions and inference data, such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0424] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0425] [Third Embodiment]

[0426] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0427] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0428] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0429] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0430] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0431] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0432] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0433] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0434] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0435] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0436] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0437] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0438] The system based on this invention provides users with efficient and personalized work support through the coordinated operation of various functions, primarily speech recognition, schedule management, data analysis, priority optimization, and multilingual support. Details of each function and their implementation examples are described below.

[0439] A terminal equipped with a voice input device captures voice instructions from the user and processes them as digital voice data. The terminal sends this voice data to a server. The server uses voice recognition software to convert the voice data into text information. Based on this text information, the server analyzes the user's instructions and recognizes them as schedule management tasks.

[0440] The server accesses the user's schedule database to check for and update new appointments at the specified time. This updated schedule information is instantly synchronized across multiple devices via the cloud. This ensures consistency in work processes, as users can access the same schedule information across multiple devices.
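The synchronization described above can be illustrated with a minimal in-memory sketch. It assumes a push model in which the server copies its schedule to every registered terminal after each update; the class names `ScheduleServer` and `Terminal`, the version counter, and the string-keyed schedule are hypothetical and do not come from this disclosure, and a real deployment would use network transport rather than direct method calls.

```python
from dataclasses import dataclass, field


@dataclass
class Terminal:
    """A user device holding a local copy of the schedule (hypothetical)."""
    name: str
    schedule: dict = field(default_factory=dict)
    version: int = 0


@dataclass
class ScheduleServer:
    """Authoritative schedule store that pushes every update to all terminals."""
    schedule: dict = field(default_factory=dict)
    version: int = 0
    terminals: list = field(default_factory=list)

    def register(self, terminal: Terminal) -> None:
        self.terminals.append(terminal)
        self._push(terminal)  # bring a newly registered device up to date

    def add_appointment(self, start: str, title: str) -> None:
        self.schedule[start] = title
        self.version += 1
        for terminal in self.terminals:
            self._push(terminal)

    def _push(self, terminal: Terminal) -> None:
        # Copy so that a terminal cannot mutate the server's state.
        terminal.schedule = dict(self.schedule)
        terminal.version = self.version


server = ScheduleServer()
phone, headset = Terminal("phone"), Terminal("headset")
server.register(phone)
server.register(headset)
server.add_appointment("2025-01-15T15:00", "Project meeting")
```

After `add_appointment` returns, both terminals hold the same schedule and the same version number, which is the consistency property the paragraph describes.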

[0441] Furthermore, the server uses data analysis devices to acquire market trends and historical business data from external sources, and re-evaluates the priority of the user's tasks. Based on these analysis results, the user is presented with the optimal task order.

[0442] To enable multilingual support, the server translates received instructions, even if they are given in different languages, into a language it can internally parse. This allows users to smoothly utilize the system even in an international business environment.

[0443] For example, if a user uses voice input to say, "Add a project meeting tomorrow at 3 PM," the device captures this instruction and sends it to the server. The server analyzes this and updates the schedule, which is then reflected in real time across all of the user's devices. Furthermore, the server reassesses the importance of meetings based on external data and current market trends, enabling users to manage tasks efficiently.

[0444] This system reduces the time users spend on cumbersome schedule management and task prioritization, allowing them to focus on important tasks. Furthermore, its multilingual support facilitates smooth communication across different cultures and language regions.

[0445] The following describes the processing flow.

[0446] Step 1:

[0447] The user gives instructions via a voice input device. The voice input device receives this voice as digital data and transmits it to the terminal.

[0448] Step 2:

[0449] The terminal sends the received audio data to the server. The data is sent using an appropriate communication protocol.

[0450] Step 3:

[0451] The server receives the audio data and uses speech recognition software to convert the audio into text format. In this process, the audio data is converted into analyzable text information.

[0452] Step 4:

[0453] The server analyzes the converted text information. If it recognizes that the instruction relates to schedule management, it accesses the relevant schedule database to check for the existence of the specified appointment.

[0454] Step 5:

[0455] The server checks for available time slots and adds the new appointment to the schedule. This update is instantly synchronized via the cloud to all other devices the user uses.

[0456] Step 6:

[0457] The server uses data analysis equipment to re-evaluate the priority of user tasks based on market trends and business data acquired from external sources.

[0458] Step 7:

[0459] The server sends the re-evaluation results to the device and presents the user with a schedule based on the latest priorities. It also notifies the user of updates as needed by sending notifications to the device.

[0460] Step 8:

[0461] Users can check updated schedule information and task priorities through their devices and adjust their work plans as needed.
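As an illustration of Steps 3 and 4 of the flow above, the following sketch turns already-recognized text such as "Add a project meeting tomorrow at 3 PM" into a title and a start time. It assumes a single fixed English phrasing matched by a regular expression; the function name `parse_schedule_command` and the pattern are hypothetical, and the disclosure does not specify how the analysis is actually performed.

```python
import re
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Pattern for commands of the form "Add <title> today|tomorrow at <hour> AM/PM".
_COMMAND = re.compile(
    r"add (?:a |an )?(?P<title>.+?) (?P<day>today|tomorrow) at "
    r"(?P<hour>\d{1,2})(?::(?P<minute>\d{2}))?\s*(?P<ampm>am|pm)",
    re.IGNORECASE,
)


def parse_schedule_command(text: str, now: datetime) -> Optional[Tuple[str, datetime]]:
    """Return (title, start) for a recognised command, else None."""
    match = _COMMAND.search(text)
    if match is None:
        return None
    hour = int(match["hour"]) % 12          # 12 AM -> 0, 12 PM -> 0 (+12 below)
    if match["ampm"].lower() == "pm":
        hour += 12
    minute = int(match["minute"] or 0)
    day_offset = 1 if match["day"].lower() == "tomorrow" else 0
    start = (now + timedelta(days=day_offset)).replace(
        hour=hour, minute=minute, second=0, microsecond=0
    )
    return match["title"], start


now = datetime(2025, 1, 14, 9, 30)
result = parse_schedule_command("Add a project meeting tomorrow at 3 PM", now)
```

Passing `now` explicitly keeps the function deterministic, so "tomorrow" can be resolved against whatever reference time the server chooses.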

[0462] (Example 1)

[0463] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0464] In today's business environment, users are required to efficiently manage a wide variety of tasks and appropriately adjust their schedules. However, dealing with different language environments and large amounts of information can be burdensome for users. While voice-based scheduling and multilingual automation are advancing, there is a need to pursue even greater accuracy and efficiency.

[0465] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0466] In this invention, the server includes means for converting voice data received through a voice input device into text information, means for analyzing the text information and extracting the user's schedule information to update it on a recording medium, and means for synchronizing the updated schedule information to multiple terminal devices via a communication network. This enables the user to manage their schedule using voice commands.

[0467] A "voice input device" is a device that receives voice commands from a user and has the function of converting them into digital voice data.

[0468] "Means of converting into text information" refers to technology that analyzes audio data and digitizes it as corresponding text information.

[0469] "A means of analyzing, extracting user schedule information, and updating it on a recording medium" refers to the process of identifying the user's instructions based on textual information and recording this information in a database to update the schedule information.

[0470] "Means of synchronizing to multiple terminal devices via a communication network" refers to technology that synchronizes updated information to multiple devices in real time via a network.

[0471] A "data analysis device" is a device that analyzes data acquired from an external source and extracts useful information.

[0472] "Methods for re-evaluating task priorities" refers to a process that re-evaluates the importance of a user's tasks based on acquired external information and determines their order.

[0473] "Means of implementing noise cancellation" refers to technologies used to reduce ambient noise during audio recording and obtain clear audio data.

[0474] "A means of performing conflict checks and updates" refers to a technique that verifies that newly added appointments do not overlap with existing appointments and updates the database as necessary.

[0475] The "function to translate instructions in different languages and convert them into a parseable language" is a technology aimed at accurately analyzing information provided in different languages by converting it into a language that can be processed internally, even in a multilingual environment.

[0476] The system based on this invention utilizes advanced speech recognition technology, schedule management functions, priority optimization, and multilingual support to provide users with personalized work assistance.

[0477] First, the terminal captures voice commands from the user using a voice input device. During this process, noise cancellation technology is implemented to effectively eliminate ambient noise. This voice data is processed digitally by the terminal and transmitted to the server.

[0478] The server converts voice data into text using speech recognition software. Specifically, it uses a general cloud-based speech recognition service, for example, applying "automation software provided by a voice input device manufacturer." This text information is then analyzed to extract the user's instructions, and based on this, schedule management tasks are generated.

[0479] Subsequently, the server adds the updated schedule to the database, which serves as the recording medium, and this information is instantly synchronized to multiple user terminals via the communication network. The schedule information update also includes data conflict checks to prevent inconsistencies with existing schedules.
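The conflict check mentioned above can be sketched as an interval-overlap test. This is a minimal example under the assumption that each appointment has a start and an end time and that back-to-back appointments are allowed; the `Appointment` type and the helper names are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Appointment:
    title: str
    start: datetime
    end: datetime


def conflicts(new: Appointment, existing: List[Appointment]) -> List[Appointment]:
    """Return the existing appointments whose time range overlaps `new`."""
    # Two half-open intervals [start, end) overlap when each starts before the other ends.
    return [a for a in existing if new.start < a.end and a.start < new.end]


def add_if_free(new: Appointment, existing: List[Appointment]) -> bool:
    """Append `new` only when it conflicts with nothing; report success."""
    if conflicts(new, existing):
        return False
    existing.append(new)
    return True


calendar = [Appointment("Review", datetime(2025, 1, 15, 14), datetime(2025, 1, 15, 15))]
meeting = Appointment("Project meeting", datetime(2025, 1, 15, 15), datetime(2025, 1, 15, 16))
overlap = Appointment("Call", datetime(2025, 1, 15, 15, 30), datetime(2025, 1, 15, 16, 30))
```

With this data, `add_if_free(meeting, calendar)` succeeds because the meeting starts exactly when the review ends, while a subsequent `add_if_free(overlap, calendar)` is rejected.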

[0480] Furthermore, data analysis equipment is used to analyze information acquired from external sources and users' work history, dynamically optimizing task priorities. To this end, for example, a "data analysis platform" is utilized to generate insights that maximize operational efficiency.

[0481] Furthermore, the system features multilingual support, automatically translating instructions received in different languages into a language that can be analyzed. This feature allows users to seamlessly utilize the system even in international business environments.

[0482] For example, if a user voice-inputs "Add a project meeting tomorrow at 3pm," the device captures this instruction and sends it to the server. The server analyzes this and updates the schedule, which is then reflected in real time across all of the user's devices. This system allows users to reduce the time they spend on complex schedule management and task prioritization, enabling them to focus on important work.

[0483] An example of a prompt in a generative AI model is: "Please describe a scenario for a system where the user gives voice instructions, the system updates the schedule based on those instructions, and performs optimal task management."

[0484] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0485] Step 1:

[0486] The device captures voice commands from the user via a voice input device. The input is the user's voice commands, and the output is digital audio data. After acquiring this data, the device performs noise cancellation processing to improve the quality of the audio data.

[0487] Step 2:

[0488] The terminal transmits processed digital audio data to the server. The input is digital audio data with noise removed, and the output is data sent to the server via synchronized network communication. This transmission is carried out using encryption technology to ensure data security.

[0489] Step 3:

[0490] The server analyzes the received audio data using speech recognition software and converts it into text information. The input is digital audio data, and the output is the corresponding text information. Here, the generated text information is temporarily recorded in a database as intermediate data.

[0491] Step 4:

[0492] The server analyzes textual information and extracts user instructions. The input is textual information, and the output is a dataset for schedule updates. During the analysis process, keyword matching and natural language processing techniques are used to extract the instructions.

[0493] Step 5:

[0494] The server updates the user's schedule database based on the instructions. The input is a dataset for schedule updates, and the output is the updated schedule information. During this process, the system compares the updated data with existing data to check for duplicates or conflicts before recording the information.

[0495] Step 6:

[0496] The server synchronizes updated schedule information to each terminal via the communication network. The input is the updated schedule information, and the output is the sharing of the same schedule information to the group of terminals that receive it. After sharing, users can access that information from any terminal.

[0497] Step 7:

[0498] The server uses a data analysis device to re-evaluate the priority of user tasks based on external information. Inputs are information obtained from external sources and user schedule data, and output is optimized task sequence information. This process compares the data with the latest market data to perform analysis that maximizes operational efficiency.

[0499] Step 8:

[0500] The server uses its multilingual capabilities to translate instructions in different languages into a parseable language as needed. The input is the text information that needs translation, and the output is the translated text information that can be processed internally. This process is designed to facilitate system use in international business environments.
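The translation step in Step 8 can be sketched as a thin normalization layer in front of the parser. The example assumes that the language of the instruction is already known and that English is the internal language; `normalize_instruction`, the `Translator` signature, and the phrase-table stub are hypothetical stand-ins for whatever translation service an implementation would actually call.

```python
from typing import Callable, Dict, Tuple

INTERNAL_LANGUAGE = "en"  # assumed language of the server's parser

Translator = Callable[[str, str, str], str]  # (text, source_lang, target_lang) -> text


def normalize_instruction(text: str, source_lang: str, translate: Translator) -> str:
    """Return `text` in the internal language, translating only when needed."""
    if source_lang == INTERNAL_LANGUAGE:
        return text
    return translate(text, source_lang, INTERNAL_LANGUAGE)


# Stand-in translator backed by a tiny phrase table, for illustration only.
_PHRASES: Dict[Tuple[str, str], str] = {
    ("ja", "明日の午後3時にプロジェクト会議を追加"): "Add a project meeting tomorrow at 3 PM",
}


def phrase_table_translate(text: str, source_lang: str, target_lang: str) -> str:
    if target_lang != INTERNAL_LANGUAGE:
        raise ValueError("this stub only translates into the internal language")
    try:
        return _PHRASES[(source_lang, text)]
    except KeyError:
        raise LookupError(f"no translation available for {text!r}") from None
```

Injecting the translator as a callable keeps the rest of the pipeline unchanged when the translation backend is replaced.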

[0501] (Application Example 1)

[0502] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0503] In modern society, individuals need to manage numerous schedules and tasks, and there is a demand for efficient organization of these. Furthermore, there is a need for systems that can flexibly adapt to multilingual environments and fluctuating market trends, and present optimal action plans tailored to individual needs. Additionally, there is a need for methods that simplify schedule management within the home, enabling all family members to act efficiently.

[0504] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0505] In this invention, the server includes means for converting voice data received through a voice input device into text information, means for analyzing the text information, extracting the user's schedule information, and updating it on a recording medium, and means for analyzing market trends and environmental data based on past data and proposing the optimal future actions. This makes it possible to effectively manage schedules with simple voice operations and to flexibly propose the optimal actions in response to multilingual environments and market fluctuations.

[0506] A "voice input device" is a device that converts external voice information into electrical signals, making it possible to analyze it as digital data.

[0507] "Means of converting into text information" refers to the process of processing audio data received from an audio input device and converting it into digital data in text format.

[0508] "Means for extracting schedule information and updating it on a recording medium" refers to a function that identifies schedule-related data from analyzed text information and saves it to a recording device as new or modified data.

[0509] "Means for synchronizing with multiple terminal devices" refers to the process of making the information held on multiple other electronic devices identical to the information updated on the recording medium, via a network and without delay.

[0510] A "data analysis device" is a device that possesses the technology to analyze various types of information obtained from external sources and to reveal regularities and trends.

[0511] "A means of re-evaluating task priorities" refers to a function that allows users to review tasks relevant to them based on their importance and urgency, and rearrange them in an appropriate order.

[0512] "A means of analyzing market trends and environmental data to propose optimal actions for the future" refers to a process of analyzing past data and changes in the external environment to derive the optimal actions that users should take.

[0513] This system is activated when the user speaks into a voice input device. The voice input device collects the user's voice instructions and sends them to the server over the internet in real time as digital voice data. There, the server converts the voice into text information using speech recognition software such as the Google Speech-to-Text API. After obtaining the text information, the server analyzes this data and extracts new tasks based on the user's schedule.

[0514] New schedule information is updated to the user's storage medium via a cloud-based scheduling management system such as the Google Calendar API, instantly synchronizing with all other devices the user owns. This synchronization feature allows users to consistently access the latest schedule information from any device.

[0515] Furthermore, the server utilizes data analysis libraries such as Pandas and NumPy to analyze market trends and historical environmental data obtained from external sources. This analysis allows the server to re-evaluate the optimal priorities for the user's tasks and provide suggestions for future actions. For example, it can make specific suggestions such as recommending a picnic date based on weather trends analyzed from historical data.
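The picnic example can be made concrete with a small sketch that ranks candidate days by how often it rained on them in past years. The paragraph names Pandas and NumPy; to stay dependency-free this sketch uses only the Python standard library, and the same aggregation could be expressed with those libraries. The records in `HISTORY` are invented for illustration, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (year, "MM-DD", rained?) records; the values below are made up for illustration.
HISTORY: List[Tuple[int, str, bool]] = [
    (2021, "05-03", True), (2022, "05-03", True), (2023, "05-03", False),
    (2021, "05-04", False), (2022, "05-04", False), (2023, "05-04", True),
    (2021, "05-05", False), (2022, "05-05", False), (2023, "05-05", False),
]


def rain_frequency(history: Iterable[Tuple[int, str, bool]]) -> Dict[str, float]:
    """Fraction of recorded years in which it rained on each calendar day."""
    rainy: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for _year, day, rained in history:
        total[day] += 1
        rainy[day] += int(rained)
    return {day: rainy[day] / total[day] for day in total}


def suggest_picnic_day(candidates: List[str], history) -> str:
    """Pick the candidate day with the lowest historical rain frequency."""
    freq = rain_frequency(history)
    known = [day for day in candidates if day in freq]
    if not known:
        raise ValueError("no historical data for any candidate day")
    return min(known, key=lambda day: freq[day])


best = suggest_picnic_day(["05-03", "05-04", "05-05"], HISTORY)
```

With the sample records, May 5 has no recorded rain and is therefore selected.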

[0516] To support multilingual environments, the server uses the Google Translate API and other tools to translate voice commands in different languages into a language that can be analyzed, enabling smooth operation.

[0517] This system allows users to effectively manage their schedules with minimal voice commands and offers flexible, optimal action suggestions in diverse environments, enabling efficient work and personal management. As a result, the user's entire life runs smoothly and efficiently.

[0518] An example of a prompt when using a generative AI model might be a command such as, "How can we design a home robot assistant that can manage the schedules of all family members simultaneously and suggest the best course of action based on those schedules?"

[0519] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0520] Step 1:

[0521] The terminal captures the user's voice commands via a voice input device. The input is the user's verbal instructions, and the output is digital voice data. This digital voice data is sent from the terminal to the server.

[0522] Step 2:

[0523] The server converts the received digital audio data into text information using speech recognition software (e.g., Google Speech-to-Text API). The input here is digital audio data, and the output is parseable text information. In this process, the server executes a speech recognition algorithm, converting phonemes into text.

[0524] Step 3:

[0525] The server analyzes textual information and extracts specific schedule-related data. The input is textual information, and the output is a set of extracted schedule information. The server uses natural language processing techniques to identify schedule information such as dates and times from this text data.

[0526] Step 4:

[0527] The server extracts schedule information, updates the storage medium using a cloud-based schedule management system (e.g., Google Calendar API), and synchronizes it across multiple devices. The input is the extracted schedule information, and the output is a list of other devices to which the updated schedule has been synchronized. The server calls the API to update the schedule and immediately synchronizes it between devices.

[0528] Step 5:

[0529] The server uses data analysis libraries (e.g., Pandas, NumPy) to analyze externally obtained market trend and environmental data, and re-evaluates task priorities. Inputs are schedule information and external data, and output is a task list with re-evaluated priorities. Based on the analysis results, the server reassesses the importance of each task.

[0530] Step 6:

[0531] The server provides users with a re-evaluated task list and suggestions. The input is a task list with re-evaluated priorities, and the output is optimal action suggestions presented to the user. The server uses a notification function to send recommendations to the user.
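Steps 5 and 6 can be sketched as a scoring function that combines each task's importance and urgency with an external signal and then sorts the task list. The disclosure does not give a formula, so the weights, the urgency curve, and the per-topic `topic_signal` dictionary below are assumptions chosen only to make the example concrete.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    name: str
    importance: float        # 0..1, set by the user or a prior analysis
    hours_to_deadline: float
    topic: str


def score(task: Task, topic_signal: Dict[str, float]) -> float:
    """Combine importance, urgency and an external topic signal into one score."""
    urgency = 1.0 / (1.0 + max(task.hours_to_deadline, 0.0) / 24.0)  # 1.0 when due now
    external = topic_signal.get(task.topic, 0.0)  # e.g. derived from market data
    # The weights are arbitrary illustration values, not taken from the source.
    return 0.5 * task.importance + 0.3 * urgency + 0.2 * external


def reprioritize(tasks: List[Task], topic_signal: Dict[str, float]) -> List[Task]:
    """Return tasks ordered from highest to lowest score."""
    return sorted(tasks, key=lambda t: score(t, topic_signal), reverse=True)


tasks = [
    Task("Update pricing deck", importance=0.5, hours_to_deadline=72, topic="pricing"),
    Task("File expense report", importance=0.4, hours_to_deadline=24, topic="admin"),
    Task("Draft newsletter", importance=0.3, hours_to_deadline=120, topic="marketing"),
]
ordered = reprioritize(tasks, {"pricing": 1.0})
```

Without any external signal the expense report ranks first because its deadline is nearest; a strong signal for the "pricing" topic moves the pricing deck to the top, which is the kind of re-evaluation the steps describe.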

[0532] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0533] This invention combines an emotion engine with a next-generation system that analyzes voice commands to streamline task management, enabling work support that takes user emotions into consideration. The system configuration and specific examples are described below.

[0534] This system consists primarily of a terminal equipped with a voice input device and a server that processes the data. When a user gives a voice command using the terminal, the terminal converts the voice into digital data and sends it to the server. The server uses speech recognition software to convert the voice into text information and analyzes the schedule requested by the user.

[0535] Notably, the server has an emotion engine that can identify the user's emotions from voice data. Specifically, it estimates the emotional state from the tone, speed, and word choice of the voice, and stores it on a recording medium. Based on this emotional information, the server dynamically adjusts task priorities to help users perform their work in a stress-free environment.

[0536] For example, if a user expresses an emotion indicating busyness during their speech, the server will identify that emotion and suggest postponing currently scheduled low-priority tasks. The emotion engine also monitors the user's stress level in real time, and if stress levels are high, it will display a message on the device suggesting a relaxation break.

[0537] This system enables emotion-based work support, allowing users to perform tasks efficiently in a way that suits their psychological state. It therefore goes beyond simply streamlining administrative tasks: it reduces the user's mental burden, making it a valuable system not only for general office work but also for various business environments. Furthermore, it can appropriately translate and analyze emotional expressions in different languages, helping to convey nuance even in international business environments.

[0538] The following describes the processing flow.

[0539] Step 1:

[0540] The user issues work instructions by voice via a terminal. The voice input device then captures the user's voice as digital audio data.

[0541] Step 2:

[0542] The terminal transmits the collected audio data to the server using a secure communication protocol. This data is then pre-processed for audio analysis.

[0543] Step 3:

[0544] The server uses a speech recognition engine to analyze the received audio data and convert it into text information. This text information is further analyzed and converted into structured data such as schedule management data.

[0545] Step 4:

[0546] The server activates an emotion engine to identify the user's emotions from the voice data. This process involves analyzing voice tone, speaking speed, and keywords used to estimate the emotional state.

[0547] Step 5:

[0548] The server uses the identified emotion data to re-evaluate the user's schedule and task priorities. For example, if an emotion indicating stress is detected, the server will replan by postponing less important tasks.

[0549] Step 6:

[0550] The server sends sentiment assessments and revised schedule information to the terminal, notifying the user of the latest recommended tasks and the reasons behind them. This also includes suggestions for relaxing breaks as needed.

[0551] Step 7:

[0552] Users review the information displayed on their device and choose whether to accept the new schedule or task priority. This feedback is collected as data and used for future sentiment evaluations.

[0553] Step 8:

[0554] The device updates its schedule based on user selection and continuously synchronizes information across all related devices, ensuring that users can access the latest information from any device.
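Steps 4 and 5 of the flow above can be sketched with a simple heuristic in place of the emotion engine. The sketch assumes that a transcript and a speaking rate are already available, scores stress from keywords and pace, and postpones low-priority tasks when the score crosses a threshold. The keyword list, the 150 and 210 words-per-minute anchors, the weights, and all names are assumptions; voice tone is not modeled here, and the disclosure's emotion identification model 59 is not specified at this level of detail.

```python
from dataclasses import dataclass
from typing import List, Tuple

STRESS_KEYWORDS = {"busy", "overwhelmed", "urgent", "no time"}  # assumed word list


def estimate_stress(transcript: str, words_per_minute: float) -> float:
    """Return a stress score in [0, 1] from word choice and speaking speed."""
    text = transcript.lower()
    keyword_hits = sum(1 for kw in STRESS_KEYWORDS if kw in text)
    keyword_part = min(keyword_hits / 2.0, 1.0)
    # Treat 150 wpm as a neutral pace and 210 wpm or more as fully rushed (assumption).
    speed_part = min(max((words_per_minute - 150.0) / 60.0, 0.0), 1.0)
    return 0.6 * keyword_part + 0.4 * speed_part


@dataclass
class Task:
    name: str
    priority: int  # 1 = highest


def replan(tasks: List[Task], stress: float, threshold: float = 0.5,
           keep_priority: int = 2) -> Tuple[List[Task], List[Task], bool]:
    """Split tasks into (keep, postpone) and flag whether to suggest a break."""
    if stress < threshold:
        return list(tasks), [], False
    keep = [t for t in tasks if t.priority <= keep_priority]
    postpone = [t for t in tasks if t.priority > keep_priority]
    return keep, postpone, True


stress = estimate_stress("I'm so busy today, there is no time for this", 200)
keep, postpone, suggest_break = replan(
    [Task("Client call", 1), Task("Tidy inbox", 3)], stress
)
```

In this run the utterance contains two stress keywords at a fast pace, so the low-priority task is postponed and a break suggestion is flagged for the terminal to display.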

[0555] (Example 2)

[0556] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0557] In recent years, while there has been a growing demand for increased operational efficiency, the importance of task management that also considers the psychological state of users has been increasing. However, conventional task management systems have been unable to take into account users' emotional information, resulting in insufficient support in reducing stress and anxiety. To solve this problem, there is a need for a system that can appropriately analyze users' emotions from voice data and dynamically adjust task priorities based on that analysis.

[0558] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0559] In this invention, the server includes means for converting acoustic data received through a voice input device into text information, means for identifying emotional information from the voice data and dynamically adjusting task priorities using an emotion engine, and means for translating instructions in different languages into a parseable language and appropriately processing emotional information. This enables work support based on the user's emotions, reducing stress and improving work efficiency.

[0560] A "voice input device" is a device that converts sound into electrical signals, making them processable as digital data.

[0561] "Audio data" refers to digital data that includes sound, and is a data format that can be converted into text information using speech recognition technology.

[0562] "Textual information" refers to data in text format obtained by digitizing audio data.

[0563] "Recording medium" refers to any device or medium that stores digital information and can transmit or read it as needed.

[0564] An "emotion engine" refers to a technology that identifies a user's emotional state by analyzing voice data and uses the results to perform dynamic system operations.

[0565] "Communication network" refers to the entire network infrastructure used to send and receive data between multiple devices.

[0566] "Dynamic adjustment" refers to a system that adapts processes and settings in real time in response to user data and circumstances, in order to maintain an optimal state.

[0567] "External information" refers to all types of data obtained from outside the system, including weather information, traffic conditions, and other relevant schedule data.

[0568] Embodiments for carrying out this invention are described below.

[0569] This system primarily consists of terminals equipped with voice input devices and servers, which are computers that process data. Users give voice commands using the terminals. The terminals convert the voice received through the voice input devices into digital data. For example, if a smart device is used as the terminal, it uses its built-in microphone to acquire voice and converts it into data using its voice-to-digital conversion function.

[0570] The server uses speech recognition software to convert this speech data into text. This process can utilize, for example, speech recognition APIs from major technology companies. The computer analyzes the user's instructions from the resulting text and extracts schedule and task information.

[0571] The server also includes an emotion engine that analyzes the user's emotions based on voice data. The engine estimates the emotional state from the tone, speed, and word choice of the voice, and this information is stored on a recording medium. The server uses this emotional information to dynamically re-evaluate task priorities. For example, if a user gives a voice command that expresses stress, the emotion engine can identify that emotion and suggest postponing low-priority tasks.

[0572] Furthermore, it utilizes translation functions to convert instructions in different languages into an analyzable language, and also appropriately analyzes emotional states. This makes it possible to support the user's work efficiency even in a multilingual environment.

[0573] A concrete example of a prompt is, "Analyze the user's emotions based on their tone and content, and suggest appropriate changes to the task schedule." By inputting this prompt into the AI model, appropriate responses and suggestions can be obtained. In this way, the system provides emotion-based work support, reducing the user's psychological burden.

[0574] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0575] Step 1:

[0576] The user inputs voice commands using the terminal. The terminal acquires voice data via a voice input device and converts this analog audio into a digital format. Specifically, the terminal's microphone captures the audio, and the built-in DSP (Digital Signal Processing Unit) converts the audio waveform into digital data. The output obtained as a result of this processing is digital audio data.

[0577] Step 2:

[0578] The terminal transfers the converted digital audio data to the server. HTTPS, a secure communication protocol over the internet, is used for data transfer. The digital audio data generated earlier is used as input, and the data reaches the server as output.

[0579] Step 3:

[0580] The server converts received digital audio data into text information using speech recognition software. For example, the server uses AI to analyze speech patterns from the audio data and generate corresponding text. This process utilizes API services to achieve highly accurate speech recognition. The input is digital audio data, and the output is text information.

[0581] Step 4:

[0582] The server analyzes the converted text information and extracts task and schedule information requested by the user. The server classifies the text data by content and updates the schedule and task list. The input is the text information to be analyzed, and the output is the updated schedule information.

[0583] Step 5:

[0584] The server uses an emotion engine to identify the user's emotions based on the voice data. The engine applies an emotion-estimation algorithm to the tone and speed of the voice to estimate the user's emotional state, and the resulting emotion information is stored on a recording medium. The input is the data available once speech recognition is complete, and the output is the identified emotion information.

[0585] Step 6:

[0586] The server dynamically adjusts task priorities based on emotional information. For example, the server checks emotional information and, if the user is in a high-stress state, generates a suggestion to postpone low-priority tasks. The input is emotional information and current task information, and the output is a task list with updated priorities.

[0587] Step 7:

[0588] The terminal receives feedback from the server and displays suggested schedule adjustments and task priorities to the user. For example, the terminal displays the message "We recommend taking a break" on the screen. The input is feedback information from the server, and the output is visual feedback to the user.
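Steps 6 and 7 above can be illustrated with a minimal sketch. It assumes that the emotion engine reports stress as a score between 0 and 1 and that tasks carry an integer priority from 1 (low) to 5 (high); the `Task` type, the thresholds, and the function name are hypothetical and are not specified in this disclosure.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Task:
    title: str
    priority: int            # assumed scale: 1 (low) to 5 (high)
    postponed: bool = False


def adjust_for_stress(tasks, stress, stress_threshold=0.7, low_priority_max=2):
    """Return (updated_tasks, message).

    When the assumed stress score reaches the threshold, low-priority tasks
    are marked as postponed and a break suggestion is returned.
    """
    if stress < stress_threshold:
        return list(tasks), None
    updated = [
        replace(t, postponed=True) if t.priority <= low_priority_max else t
        for t in tasks
    ]
    return updated, "We recommend taking a break"


tasks = [Task("Prepare board slides", 5), Task("Tidy shared drive", 1)]
updated, message = adjust_for_stress(tasks, stress=0.85)
```

In this sketch only the low-priority task is postponed; the high-priority task keeps its place, and the message corresponds to the feedback the terminal displays in Step 7.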

[0589] (Application Example 2)

[0590] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0591] Providing efficient task management and emotional support simultaneously remains a challenging task in modern homes and workplaces. Especially given the stress often experienced in busy daily lives, proper prioritization and emotional support are essential. Furthermore, consistent support across diverse language environments is crucial.

[0592] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0593] In this invention, the server includes means for converting voice data received through a voice input device into text information, means for analyzing the text information to extract the user's schedule information and update it on a recording medium, means for synchronizing the updated schedule information to multiple terminal devices via a communication network, means for re-evaluating the priority of the user's tasks using external information acquired by a data analysis device, and means for estimating the emotional state from the user's voice data and dynamically adjusting work support based on the re-evaluated priority. This reduces the psychological burden on the user and enables efficient work execution.

[0594] A "voice input device" is a device that converts voice data into a digital format and transmits it to a processing unit.

[0595] "Textual information" refers to information that has been digitally converted from audio data and expressed in text format.

[0596] A "recording medium" is a physical or digital storage device that stores data, making it available for later use or analysis.

[0597] A "communication network" is a network infrastructure for transmitting data between multiple terminal devices.

[0598] A "data analysis device" is a computer system that processes diverse information and provides analysis results.

[0599] "External information" refers to supplementary data obtained from outside the user's environment and is used to adjust tasks and schedules.

[0600] "Priority" is an indicator used to determine the order in which to perform multiple tasks or duties.

[0601] "Emotional state" refers to the user's psychological or emotional state estimated based on voice analysis and other data.

[0602] "Work support" refers to assistance and systems designed to help people perform tasks and work efficiently.

[0603] "Dynamic adjustment" means changing the order and content of tasks in real time according to the situation in order to optimize them.

[0604] This invention utilizes a terminal equipped with a voice input device to allow the user to input voice data, which is then digitized as text information. This digitized text information is transmitted to a server via a communication network. The server analyzes the received text information, extracts the user's schedule information, and stores it on a recording medium. During this process, a data analysis device is used to acquire external information and re-evaluate the user's task priorities. The re-evaluated priorities are dynamically adjusted based on the emotional state contained in the user's voice data.

[0605] The server provides work support to the user based on dynamically adjusted task priorities. As part of this work support, if the user's emotional state indicates high stress, the server may generate a message suggesting a break. For example, if the user voice-inputs, "I need to prepare for a meeting after lunch," the server, if it determines the user is stressed, will suggest, "How about taking a short break before preparing for the meeting?"

[0606] This system can also translate instructions in various languages and provide appropriate work support based on the user's emotional state. For speech recognition, it uses, for example, the Google Speech-to-Text API, and for sentiment analysis, it uses IBM Watson Tone Analyzer. The Asana API can be used to manage the user's schedule and tasks.

[0607] When a generative AI model is used to generate the suggestion, a prompt such as the following is used:

[0608] "When the system analyzes that the user is experiencing stress, please generate a message suggesting they take a break."

[0609] Through this prompt, the AI can create optimal suggestions for the user and help alleviate stress.
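As a rough illustration of how such a prompt might be combined with the analysis results before being passed to a generative AI model, consider the sketch below. The function names, the layout of the prompt, and the `fake_model` stand-in are assumptions made for this example; no particular model API is implied.

```python
def build_break_prompt(emotional_state, utterance):
    """Combine the fixed instruction with the analysis results (assumed layout)."""
    instruction = (
        "When the system analyzes that the user is experiencing stress, "
        "please generate a message suggesting they take a break."
    )
    return (
        f"{instruction}\n"
        f"Estimated emotional state: {emotional_state}\n"
        f"User utterance: {utterance}"
    )


def suggest(model, emotional_state, utterance):
    """`model` is any callable that maps a prompt string to a reply string."""
    return model(build_break_prompt(emotional_state, utterance))


def fake_model(prompt):
    # Stand-in for a generative AI model, used only to make the sketch runnable.
    if "Estimated emotional state: stressed" in prompt:
        return "How about taking a short break before preparing for the meeting?"
    return ""


reply = suggest(fake_model, "stressed", "I need to prepare for a meeting after lunch")
```

Passing the model in as a callable keeps the prompt construction independent of whichever generative AI service is actually used.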

[0610] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0611] Step 1:

[0612] The terminal receives voice input from the user. This voice data is captured via the built-in microphone and converted into a digital format that can be sent to the server. The Google Speech-to-Text API is used to convert the voice input into text data. This process extracts text information from the voice.

[0613] Step 2:

[0614] The server analyzes the received text information and extracts the user's schedule and tasks. Natural Language Processing (NLP) technology is used for the analysis. The server receives the text information converted from the audio data as input, organizes schedule information and important tasks based on that data, and creates output data to update the recording medium.

[0615] Step 3:

[0616] The server evaluates the tone of voice data and text to perform sentiment analysis and estimate the user's emotional state. IBM Watson Tone Analyzer is used for this process. The input is voice or text data, and the output is an indicator of the emotional state.

[0617] Step 4:

[0618] The server uses the data analysis device to collect external information (weather information, news, traffic conditions, etc.) and re-evaluate the user's task priorities. It also takes the user's emotional state into consideration, dynamically adjusting task priorities. The inputs are the external information, the user's emotional state, and task information, and the output is the updated task priorities.

[0619] Step 5:

[0620] Based on the user's emotional state and task priorities, the server generates messages and suggestions to support work. During periods of high stress, it supports work efficiency through suggestions such as taking breaks. Using a generative AI model, it outputs the most suitable suggestions to the user based on appropriate prompt sentences. In this step, emotional state and task information are used as input, and suggestion messages are obtained as output.

[0621] Step 6:

[0622] The terminal notifies the user of suggested messages and schedule information received from the server. Information is conveyed visually through the display or audibly through the speaker. This helps reduce stress and facilitates efficient task management. The input is the information output by the server.
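The data flow of Steps 1 to 6 above can be sketched as a pipeline whose stages are supplied from outside. All names below are hypothetical; in a real system each callable would wrap a service such as those named in this example, whereas the demonstration uses trivial stand-ins so that the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Result:
    text: str
    tasks: List[str]
    emotion: str
    message: str


def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    extract_tasks: Callable[[str], List[str]],
    estimate_emotion: Callable[[bytes, str], str],
    reprioritize: Callable[[List[str], str], List[str]],
    generate_message: Callable[[List[str], str], str],
) -> Result:
    text = transcribe(audio)                       # Step 1: voice to text
    tasks = extract_tasks(text)                    # Step 2: schedule/task extraction
    emotion = estimate_emotion(audio, text)        # Step 3: emotional state
    ordered = reprioritize(tasks, emotion)         # Step 4: priority re-evaluation
    message = generate_message(ordered, emotion)   # Step 5: suggestion message
    return Result(text, ordered, emotion, message) # Step 6: returned to the terminal


# Stand-in stages for demonstration only.
result = run_pipeline(
    b"\x00\x01",
    transcribe=lambda audio: "prepare for meeting; file expenses",
    extract_tasks=lambda text: [t.strip() for t in text.split(";")],
    estimate_emotion=lambda audio, text: "stressed",
    reprioritize=lambda tasks, emotion: sorted(tasks),
    generate_message=lambda tasks, emotion: (
        "How about taking a short break?" if emotion == "stressed" else ""
    ),
)
```

External information for Step 4 would be captured inside the `reprioritize` callable in this arrangement, which keeps the pipeline itself free of any dependency on a specific data source.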

[0623] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0624] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0625] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0626] [Fourth Embodiment]

[0627] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0628] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0629] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0630] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0631] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0632] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0633] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0634] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0635] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0636] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0637] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0638] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0639] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0640] The system based on this invention provides users with efficient and personalized work support through the coordinated operation of various functions, primarily speech recognition, schedule management, data analysis, priority optimization, and multilingual support. Details of each function and their implementation examples are described below.

[0641] A terminal equipped with a voice input device captures voice instructions from the user and processes them as digital voice data. The terminal sends this voice data to a server. The server uses voice recognition software to convert the voice data into text information. Based on this text information, the server analyzes the user's instructions and recognizes them as schedule management tasks.

[0642] The server accesses the user's schedule database to check for and update new appointments at the specified time. This updated schedule information is instantly synchronized across multiple devices via the cloud. This ensures consistency in work processes, as users can access the same schedule information across multiple devices.

[0643] Furthermore, the server uses data analysis devices to acquire market trends and historical business data from external sources, and re-evaluates the priority of the user's tasks. Based on these analysis results, the user is presented with the optimal task order.

[0644] To enable multilingual support, the server translates received instructions, even if they are given in different languages, into a language it can internally parse. This allows users to smoothly utilize the system even in an international business environment.

[0645] For example, if a user uses voice input to say, "Add a project meeting tomorrow at 3 PM," the terminal captures this instruction and sends it to the server. The server analyzes this and updates the schedule, which is then reflected in real time across all of the user's devices. Furthermore, the server reassesses the importance of the meeting based on external data and current market trends, enabling users to manage tasks efficiently.

[0646] This system reduces the time users spend on cumbersome schedule management and task prioritization, allowing them to focus on important tasks. Furthermore, its multilingual support facilitates smooth communication across different cultures and language regions.

[0647] The following describes the processing flow.

[0648] Step 1:

[0649] The user gives instructions via a voice input device. The voice input device receives this voice as digital data and transmits it to the terminal.

[0650] Step 2:

[0651] The terminal sends the received audio data to the server. The data is sent using an appropriate communication protocol.

[0652] Step 3:

[0653] The server receives the audio data and uses speech recognition software to convert the audio into text format. In this process, the audio data is converted into analyzable text information.

[0654] Step 4:

[0655] The server analyzes the converted text information. If it recognizes that the instructions are related to schedule management, it accesses the relevant schedule database to check for the existence of the specified appointment.

[0656] Step 5:

[0657] The server checks for available time slots and adds the new appointment to the schedule. This update is instantly synchronized via the cloud to all other devices the user uses.

[0658] Step 6:

[0659] The server uses the data analysis device to re-evaluate the priority of user tasks based on market trends and business data acquired from external sources.

[0660] Step 7:

[0661] The server sends the re-evaluation results to the terminal and presents the user with a schedule based on the latest priorities. It also notifies the user of updates as needed by sending notifications to the terminal.

[0662] Step 8:

[0663] Users can check updated schedule information and task priorities through their devices and adjust their work plans as needed.
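As an illustration of Step 4 above, the sketch below turns the example instruction "Add a project meeting tomorrow at 3 PM" into a structured appointment. It is a toy parser based on a regular expression, not the analysis method of this disclosure; the `Appointment` type, the pattern, and the limited phrasing it accepts ("today" or "tomorrow" with a 12-hour time) are assumptions.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Appointment:
    title: str
    start: datetime


# Accepts phrases like "Add a <title> tomorrow at 3 PM" or "... today at 10:30 am".
_PATTERN = re.compile(
    r"add (?:a |an )?(?P<title>.+?) (?P<day>today|tomorrow) at "
    r"(?P<hour>\d{1,2})(?::(?P<minute>\d{2}))?\s*(?P<ampm>am|pm)",
    re.IGNORECASE,
)


def parse_instruction(text: str, now: datetime) -> Optional[Appointment]:
    """Return the appointment described by `text`, or None if it is not recognized."""
    m = _PATTERN.search(text)
    if m is None:
        return None
    hour = int(m.group("hour")) % 12          # 12 AM -> 0, 12 PM -> 0 (+12 below)
    if m.group("ampm").lower() == "pm":
        hour += 12
    minute = int(m.group("minute") or 0)
    offset = 1 if m.group("day").lower() == "tomorrow" else 0
    day = now + timedelta(days=offset)
    start = day.replace(hour=hour, minute=minute, second=0, microsecond=0)
    return Appointment(m.group("title"), start)


appt = parse_instruction(
    "Add a project meeting tomorrow at 3 PM", now=datetime(2024, 12, 10, 9, 30)
)
```

The current time is passed in explicitly so that relative words such as "tomorrow" resolve deterministically, which also makes the function easy to test.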

[0664] (Example 1)

[0665] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0666] In today's business environment, users are required to efficiently manage a wide variety of tasks and appropriately adjust their schedules. However, dealing with different language environments and large amounts of information can be burdensome for users. While voice-based scheduling and multilingual automation are advancing, there is a need to pursue even greater accuracy and efficiency.

[0667] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0668] In this invention, the server includes means for converting voice data received through a voice input device into text information, means for analyzing the text information and extracting the user's schedule information to update it on a recording medium, and means for synchronizing the updated schedule information to multiple terminal devices via a communication network. This enables the user to manage their schedule using voice commands.

[0669] A "voice input device" is a device that receives voice commands from a user and has the function of converting them into digital voice data.

[0670] "Means of converting into text information" refers to technology that analyzes audio data and digitizes it as corresponding text information.

[0671] "A means of analyzing, extracting user schedule information, and updating it on a recording medium" refers to the process of identifying the user's instructions based on textual information and recording this information in a database to update the schedule information.

[0672] "Means of synchronizing to multiple terminal devices via a communication network" refers to technology that synchronizes updated information to multiple devices in real time via a network.

[0673] A "data analysis device" is a device that analyzes data acquired from an external source and extracts useful information.

[0674] "Means of re-evaluating task priorities" refers to a process that re-evaluates the importance of a user's tasks based on acquired external information and determines their order.

[0675] "Means of implementing noise cancellation" refers to technologies used to reduce ambient noise during audio recording and obtain clear audio data.

[0676] "A means of performing conflict checks and updates" refers to a technique that verifies that newly added appointments do not overlap with existing appointments and updates the database as necessary.

[0677] The "function to translate instructions in different languages and convert them into a parseable language" is a technology aimed at accurately analyzing information provided in different languages by converting it into a language that can be processed internally, even in a multilingual environment.

[0678] The system based on this invention utilizes advanced speech recognition technology, schedule management functions, priority optimization, and multilingual support to provide users with personalized work assistance.

[0679] First, the terminal captures voice commands from the user using a voice input device. During this process, noise cancellation technology is implemented to effectively eliminate ambient noise. This voice data is processed digitally by the terminal and transmitted to the server.

[0680] The server converts voice data into text using speech recognition software. Specifically, it uses a general cloud-based speech recognition service; for example, "automation software provided by a voice input device manufacturer" may be applied. This text information is then analyzed to extract the user's instructions, and based on this, schedule management tasks are generated.

[0681] Subsequently, the server adds the updated schedule to the database, which serves as the recording medium, and this information is instantly synchronized to multiple user terminals via the communication network. The schedule information update also includes data conflict checks to prevent inconsistencies with existing schedules.
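The conflict check mentioned above can be sketched as an interval-overlap test. The `Appointment` type, the function names, and the choice of half-open intervals (so that back-to-back appointments do not conflict) are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class Appointment:
    title: str
    start: datetime
    end: datetime


def conflicts(new: Appointment, existing: List[Appointment]) -> List[Appointment]:
    # Two half-open intervals [start, end) overlap when each one
    # starts before the other one ends.
    return [a for a in existing if new.start < a.end and a.start < new.end]


def add_if_free(
    new: Appointment, schedule: List[Appointment]
) -> Tuple[bool, List[Appointment]]:
    """Append `new` only if it overlaps nothing; otherwise report the clashes."""
    clashes = conflicts(new, schedule)
    if clashes:
        return False, clashes
    schedule.append(new)
    return True, []


schedule = [
    Appointment("Design review", datetime(2024, 12, 11, 14, 0), datetime(2024, 12, 11, 15, 0))
]
meeting = Appointment(
    "Project meeting", datetime(2024, 12, 11, 15, 0), datetime(2024, 12, 11, 16, 0)
)
added, clashes = add_if_free(meeting, schedule)        # back-to-back: accepted
call = Appointment(
    "Client call", datetime(2024, 12, 11, 15, 30), datetime(2024, 12, 11, 16, 30)
)
added2, clashes2 = add_if_free(call, schedule)         # overlaps the project meeting
```

Returning the clashing appointments rather than a bare failure lets the server tell the user which existing entry blocks the new one.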

[0682] Furthermore, data analysis equipment is used to analyze information acquired from external sources and users' work history, dynamically optimizing task priorities. To this end, for example, a "data analysis platform" is utilized to generate insights that maximize operational efficiency.

[0683] Furthermore, the system features multilingual support, automatically translating instructions received in different languages into a language that can be analyzed. This feature allows users to seamlessly utilize the system even in international business environments.

[0684] For example, if a user voice-inputs "Add a project meeting tomorrow at 3pm," the device captures this instruction and sends it to the server. The server analyzes this and updates the schedule, which is then reflected in real time across all of the user's devices. This system allows users to reduce the time they spend on complex schedule management and task prioritization, enabling them to focus on important work.

[0685] An example of a prompt in a generative AI model is: "Please describe a scenario for a system where the user gives voice instructions, the system updates the schedule based on those instructions, and performs optimal task management."

[0686] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0687] Step 1:

[0688] The device captures voice commands from the user via a voice input device. The input is the user's voice commands, and the output is digital audio data. After acquiring this data, the device performs noise cancellation processing to improve the quality of the audio data.

[0689] Step 2:

[0690] The terminal transmits processed digital audio data to the server. The input is digital audio data with noise removed, and the output is data sent to the server via synchronized network communication. This transmission is carried out using encryption technology to ensure data security.

[0691] Step 3:

[0692] The server analyzes the received audio data using speech recognition software and converts it into text information. The input is digital audio data, and the output is the corresponding text information. Here, the generated text information is temporarily recorded in a database as intermediate data.

[0693] Step 4:

[0694] The server analyzes textual information and extracts user instructions. The input is textual information, and the output is a dataset for schedule updates. During the analysis process, keyword matching and natural language processing techniques are used to extract the instructions.

[0695] Step 5:

[0696] The server updates the user's schedule database based on the instructions. The input is a dataset for schedule updates, and the output is the updated schedule information. During this process, the system compares the updated data with existing data to check for duplicates or conflicts before recording the information.

[0697] Step 6:

[0698] The server synchronizes updated schedule information to each terminal via the communication network. The input is the updated schedule information, and the output is the sharing of the same schedule information to the group of terminals that receive it. After sharing, users can access that information from any terminal.

[0699] Step 7:

[0700] The server uses a data analysis device to re-evaluate the priority of user tasks based on external information. Inputs are information obtained from external sources and user schedule data, and output is optimized task sequence information. This process compares the data with the latest market data to perform analysis that maximizes operational efficiency.

[0701] Step 8:

[0702] The server uses its multilingual capabilities to translate instructions in different languages into a parseable language as needed. The input is the text information that needs translation, and the output is the translated text information that can be processed internally. This process is designed to facilitate system use in international business environments.
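The synchronization in Step 6 above can be illustrated with an in-memory sketch in which the server pushes each update, tagged with a version number, to every registered terminal. The class names and the versioning scheme are assumptions; an actual system would carry these messages over the communication network.

```python
from typing import List


class Terminal:
    def __init__(self, name: str) -> None:
        self.name = name
        self.version = 0
        self.schedule: List[str] = []

    def receive(self, version: int, schedule: List[str]) -> None:
        # Ignore stale or duplicate pushes so the newest state always wins.
        if version > self.version:
            self.version = version
            self.schedule = list(schedule)


class ScheduleServer:
    def __init__(self) -> None:
        self.version = 0
        self.schedule: List[str] = []
        self.terminals: List[Terminal] = []

    def register(self, terminal: Terminal) -> None:
        self.terminals.append(terminal)
        terminal.receive(self.version, self.schedule)   # catch up a late joiner

    def add_entry(self, entry: str) -> None:
        self.schedule.append(entry)
        self.version += 1
        for terminal in self.terminals:
            terminal.receive(self.version, self.schedule)


server = ScheduleServer()
phone, laptop = Terminal("phone"), Terminal("laptop")
server.register(phone)
server.register(laptop)
server.add_entry("Project meeting, tomorrow 15:00")
tablet = Terminal("tablet")
server.register(tablet)   # registered after the update, still receives it
```

The version check is what keeps terminals consistent if pushes arrive late or more than once.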

[0703] (Application Example 1)

[0704] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0705] In modern society, individuals need to manage numerous schedules and tasks, and there is a demand for efficient organization of these. Furthermore, there is a need for systems that can flexibly adapt to multilingual environments and fluctuating market trends, and present optimal action plans tailored to individual needs. Additionally, there is a need for methods that simplify schedule management within the home, enabling all family members to act efficiently.

[0706] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0707] In this invention, the server includes means for converting voice data received through a voice input device into text information, means for analyzing the text information, extracting the user's schedule information, and updating it on a recording medium, and means for analyzing market trends and environmental data based on past data and proposing the optimal future actions. This makes it possible to effectively manage schedules with simple voice operations and to flexibly propose the optimal actions in response to multilingual environments and market fluctuations.

[0708] A "voice input device" is a device that converts external voice information into electrical signals, making it possible to analyze it as digital data.

[0709] "Means of converting into text information" refers to the process of processing audio data received from an audio input device and converting it into digital data in text format.

[0710] "Means for extracting schedule information and updating it on a recording medium" refers to a function that identifies schedule-related data from analyzed text information and saves it to a recording device as new or modified data.

[0711] "Means for synchronizing with multiple terminal devices" refers to the process of making information updated on a recording medium identical to that of multiple other electronic devices via a network without delay.

[0712] A "data analysis device" is a device that possesses the technology to analyze various types of information obtained from external sources and to reveal regularities and trends.

[0713] "A means of re-evaluating task priorities" refers to a function that allows users to review tasks relevant to them based on their importance and urgency, and rearrange them in an appropriate order.

[0714] "A means of analyzing market trends and environmental data to propose optimal actions for the future" refers to a process of analyzing past data and changes in the external environment to derive the optimal actions that users should take.

[0715] This system is activated when the user speaks into a voice input device. The voice input device collects the user's voice instructions and sends them to a server over the internet in real time as digital voice data. The server converts the voice into text information using speech recognition software such as the Google Speech-to-Text API. After obtaining the text information, the server analyzes this data and extracts new tasks based on the user's schedule.

[0716] New schedule information is updated to the user's storage medium via a cloud-based scheduling management system such as the Google Calendar API, instantly synchronizing with all other devices the user owns. This synchronization feature allows users to consistently access the latest schedule information from any device.

[0717] Furthermore, the server utilizes data analysis libraries such as Pandas and NumPy to analyze market trends and historical environmental data obtained from external sources. This analysis allows the server to re-evaluate the optimal priorities for the user's tasks and provide suggestions for future actions. For example, it can make specific suggestions such as recommending a picnic date based on weather trends analyzed from historical data.
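As a rough illustration of the picnic example, the sketch below picks the candidate date with the lowest historical rain frequency. It uses only the Python standard library in place of Pandas and NumPy; the records in `HISTORY`, the 1.0 mm threshold, and the function names are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical historical records: (date, rainfall in mm).
HISTORY = [
    (date(2021, 5, 3), 0.0), (date(2022, 5, 3), 12.0), (date(2023, 5, 3), 4.0),
    (date(2021, 5, 4), 0.0), (date(2022, 5, 4), 0.0), (date(2023, 5, 4), 1.5),
    (date(2021, 5, 5), 0.0), (date(2022, 5, 5), 0.0), (date(2023, 5, 5), 0.0),
]


def rain_frequency(history, threshold_mm=1.0):
    """Fraction of past years with rain at or above the threshold, per (month, day)."""
    counts = defaultdict(lambda: [0, 0])  # (month, day) -> [rainy years, total years]
    for day, rain_mm in history:
        entry = counts[(day.month, day.day)]
        entry[0] += rain_mm >= threshold_mm
        entry[1] += 1
    return {key: rainy / total for key, (rainy, total) in counts.items()}


def recommend_picnic_date(history, candidates):
    """Return the candidate date that was rainy least often in the past."""
    freq = rain_frequency(history)
    # Dates without history get 1.0 so they are never preferred.
    return min(candidates, key=lambda d: freq.get((d.month, d.day), 1.0))


candidates = [date(2025, 5, 3), date(2025, 5, 4), date(2025, 5, 5)]
best = recommend_picnic_date(HISTORY, candidates)
```

With these sample records, May 5 was never rainy, so it is the date returned.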

[0718] To support multilingual environments, the server uses the Google Translate API and other tools to translate voice commands in different languages into a language that can be analyzed, enabling smooth operation.

[0719] This system allows users to manage their schedules effectively with minimal voice commands and offers flexible action suggestions in diverse environments, enabling efficient management of both work and personal matters. As a result, the user's daily life can run more smoothly and efficiently.

[0720] An example of a prompt when using a generative AI model might be a command such as, "How can we design a home robot assistant that can manage the schedules of all family members simultaneously and suggest the best course of action based on those schedules?"

[0721] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0722] Step 1:

[0723] The terminal captures the user's voice commands via a voice input device. The input is the user's verbal instructions, and the output is digital voice data. This digital voice data is sent from the terminal to the server.

[0724] Step 2:

[0725] The server converts the received digital audio data into text information using speech recognition software (e.g., Google Speech-to-Text API). The input here is digital audio data, and the output is parseable text information. In this process, the server executes a speech recognition algorithm, converting phonemes into text.

[0726] Step 3:

[0727] The server analyzes textual information and extracts specific schedule-related data. The input is textual information, and the output is a set of extracted schedule information. The server uses natural language processing techniques to identify schedule information such as dates and times from this text data.

[0728] Step 4:

[0729] The server extracts schedule information, updates the storage medium using a cloud-based schedule management system (e.g., Google Calendar API), and synchronizes it across multiple devices. The input is the extracted schedule information, and the output is a list of other devices to which the updated schedule has been synchronized. The server calls the API to update the schedule and immediately synchronizes it between devices.

[0730] Step 5:

[0731] The server uses data analysis libraries (e.g., Pandas, NumPy) to analyze externally obtained market trend and environmental data, and re-evaluates task priorities. The inputs are schedule information and external data, and the output is a task list with re-evaluated priorities. Based on the analysis results, the server reassesses the importance of each task.

[0732] Step 6:

[0733] The server provides users with a re-evaluated task list and suggestions. The input is a task list with re-evaluated priorities, and the output is optimal action suggestions presented to the user. The server uses a notification function to send recommendations to the user.
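Step 3 of the flow above, in which dates and times are identified in the recognized text, might be sketched as follows. The patent does not specify the natural language processing technique; the regular expression, the assumed phrasing "<title> on YYYY-MM-DD at HH:MM", and the function name `extract_schedule` are illustrative assumptions only.

```python
import re
from datetime import datetime

# Assumed input pattern: "<title> on YYYY-MM-DD at HH:MM".
PATTERN = re.compile(
    r"(?P<title>.+?)\s+on\s+(?P<date>\d{4}-\d{2}-\d{2})\s+at\s+(?P<time>\d{1,2}:\d{2})",
    re.IGNORECASE,
)


def extract_schedule(text: str) -> list:
    """Return a list of {'title', 'start'} dicts found in the text."""
    events = []
    # Treat each sentence separately so unrelated sentences are skipped.
    for sentence in re.split(r"[.;]\s*", text):
        match = PATTERN.search(sentence.strip())
        if match is None:
            continue
        start = datetime.strptime(
            f"{match['date']} {match['time']}", "%Y-%m-%d %H:%M"
        )
        events.append({"title": match["title"].strip(), "start": start})
    return events


events = extract_schedule("Dentist appointment on 2025-03-04 at 9:30. Buy milk.")
```

Sentences without a recognizable date and time, such as "Buy milk", produce no schedule entry. A production system would need a far more tolerant parser than this fixed pattern.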

[0734] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0735] This invention combines an emotion engine with a next-generation system that analyzes voice commands to streamline task management, enabling work support that takes user emotions into consideration. The system configuration and specific examples are described below.

[0736] This system consists primarily of a terminal equipped with a voice input device and a server that processes the data. When a user gives a voice command using the terminal, the terminal converts the voice into digital data and sends it to the server. The server uses speech recognition software to convert the voice into text information and analyzes the schedule requested by the user.

[0737] Notably, the server includes an emotion engine that identifies the user's emotions from voice data. Specifically, it estimates the emotional state from the tone, speed, and word choice of the voice, and stores the result on a recording medium. Based on this emotional information, the server dynamically adjusts task priorities to help users work with less stress.

[0738] For example, if a user expresses an emotion indicating busyness during their speech, the server will identify that emotion and suggest postponing currently scheduled low-priority tasks. The emotion engine also monitors the user's stress level in real time, and if stress levels are high, it will display a message on the device suggesting a relaxation break.
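The postponement behavior in this example can be sketched as below. The stress score range, the threshold `STRESS_THRESHOLD`, the cut-off `LOW_PRIORITY_FROM`, and the message text are hypothetical values chosen for illustration; the patent does not define them.

```python
from dataclasses import dataclass


@dataclass
class Task:
    title: str
    priority: int  # 1 = highest; larger numbers are less important


STRESS_THRESHOLD = 0.7  # hypothetical cut-off on a 0..1 stress score
LOW_PRIORITY_FROM = 3   # hypothetical: priority >= 3 counts as low


def replan(tasks, stress):
    """Return (keep, postpone, message) for the given stress score."""
    if stress < STRESS_THRESHOLD:
        return sorted(tasks, key=lambda t: t.priority), [], None
    keep = sorted(
        (t for t in tasks if t.priority < LOW_PRIORITY_FROM),
        key=lambda t: t.priority,
    )
    postpone = [t for t in tasks if t.priority >= LOW_PRIORITY_FROM]
    return keep, postpone, "We recommend taking a short break."


tasks = [Task("File expenses", 4), Task("Prepare meeting", 1), Task("Reply to client", 2)]
keep, postpone, message = replan(tasks, stress=0.85)
```

When the score is below the threshold, nothing is postponed and no break message is produced.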

[0739] This system enables emotion-based work support, allowing users to perform tasks efficiently in a way that suits their psychological state. It therefore goes beyond simply streamlining administrative tasks: it reduces the user's mental burden, making it valuable not only for general office work but also for various business environments. Furthermore, it can appropriately translate and analyze emotional expressions in different languages, so that nuances are conveyed accurately even in international business environments.

[0740] The following describes the processing flow.

[0741] Step 1:

[0742] The user issues work instructions by voice via a terminal. The voice input device then captures the user's voice as digital audio data.

[0743] Step 2:

[0744] The terminal transmits the collected audio data to the server using a secure communication protocol. This data is then pre-processed for audio analysis.

[0745] Step 3:

[0746] The server uses a speech recognition engine to analyze the received audio data and convert it into text information. This text information is further analyzed and converted into structured data such as schedule management data.

[0747] Step 4:

[0748] The server activates an emotion engine to identify the user's emotions from the voice data. This process involves analyzing voice tone, speaking speed, and keywords used to estimate the emotional state.

[0749] Step 5:

[0750] The server uses the identified emotion data to re-evaluate the user's schedule and task priorities. For example, if an emotion indicating stress is detected, the server will replan by postponing less important tasks.

[0751] Step 6:

[0752] The server sends sentiment assessments and revised schedule information to the terminal, notifying the user of the latest recommended tasks and the reasons behind them. This also includes suggestions for relaxing breaks as needed.

[0753] Step 7:

[0754] Users review the information displayed on their device and choose whether to accept the new schedule or task priority. This feedback is collected as data and used for future sentiment evaluations.

[0755] Step 8:

[0756] The device updates its schedule based on user selection and continuously synchronizes information across all related devices, ensuring that users can access the latest information from any device.
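Step 4 of the flow above estimates the emotional state from voice tone, speaking speed, and keywords. The following is a minimal heuristic sketch of such a combination, not the emotion engine itself: the keyword lexicon, baseline pitch and speaking rate, and the 0.3/0.3/0.4 weights are all assumptions, and the pitch is taken as an already measured input.

```python
STRESS_KEYWORDS = {"busy", "urgent", "deadline", "overwhelmed"}  # hypothetical lexicon


def clamp01(value: float) -> float:
    return max(0.0, min(1.0, value))


def estimate_stress(transcript: str, duration_s: float, mean_pitch_hz: float,
                    baseline_pitch_hz: float = 180.0,
                    baseline_wpm: float = 150.0) -> float:
    """Combine speaking speed, pitch, and keywords into a 0..1 stress score."""
    words = transcript.lower().split()
    if not words or duration_s <= 0:
        return 0.0
    wpm = len(words) / duration_s * 60.0
    # Only speech faster or higher than the baseline raises the score.
    speed = clamp01((wpm - baseline_wpm) / baseline_wpm)
    pitch = clamp01((mean_pitch_hz - baseline_pitch_hz) / baseline_pitch_hz)
    hits = sum(w.strip(".,!?") in STRESS_KEYWORDS for w in words)
    keywords = min(1.0, hits / 2)
    return round(0.3 * speed + 0.3 * pitch + 0.4 * keywords, 3)


stressed = estimate_stress("I am so busy and the deadline is today", 2.4, 225.0)
calm = estimate_stress("Please add lunch tomorrow", 2.0, 170.0)
```

The user feedback collected in Step 7 could, in principle, be used to tune such weights over time, although the patent does not describe how.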

[0757] (Example 2)

[0758] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0759] In recent years, while there has been a growing demand for increased operational efficiency, the importance of task management that also considers the psychological state of users has been increasing. However, conventional task management systems have been unable to take into account users' emotional information, resulting in insufficient support in reducing stress and anxiety. To solve this problem, there is a need for a system that can appropriately analyze users' emotions from voice data and dynamically adjust task priorities based on that analysis.

[0760] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0761] In this invention, the server includes means for converting acoustic data received through a voice input device into text information, means for identifying emotional information from the voice data and dynamically adjusting task priorities using an emotion engine, and means for translating instructions in different languages into a parseable language and appropriately processing emotional information. This enables work support based on the user's emotions, reducing stress and improving work efficiency.

[0762] A "voice input device" is a device that converts sound into electrical signals, making them processable as digital data.

[0763] "Audio data" refers to digital data that includes sound, and is a data format that can be converted into text information using speech recognition technology.

[0764] "Textual information" refers to data in text format obtained by digitizing audio data.

[0765] "Recording medium" refers to any device or medium that stores digital information and can transmit or read it as needed.

[0766] An "emotion engine" refers to a technology that identifies a user's emotional state by analyzing voice data and uses the results to perform dynamic system operations.

[0767] "Communication network" refers to the entire network infrastructure used to send and receive data between multiple devices.

[0768] "Dynamic adjustment" refers to a system that adapts processes and settings in real time in response to user data and circumstances, in order to maintain an optimal state.

[0769] "External information" refers to all types of data obtained from outside the system, including weather information, traffic conditions, and other relevant schedule data.

[0770] Embodiments for carrying out this invention are described below.

[0771] This system primarily consists of terminals equipped with voice input devices and servers, which are computers that process data. Users give voice commands using the terminals. The terminals convert the voice received through the voice input devices into digital data. For example, if a smart device is used as the terminal, it uses its built-in microphone to acquire voice and converts it into data using its voice-to-digital conversion function.

[0772] The server uses speech recognition software to convert this speech data into text. This process can utilize, for example, speech recognition APIs from major technology companies. The computer analyzes the user's instructions from the resulting text and extracts schedule and task information.

[0773] The server further includes an emotion engine that analyzes the user's emotions based on voice data. The engine estimates the emotional state from the tone, speed, and word choice of the voice, and this information is stored on a recording medium. The server uses this emotional information to dynamically re-evaluate task priorities. For example, if a user gives a voice command that expresses stress, the emotion engine can identify that emotion and suggest postponing low-priority tasks.

[0774] Furthermore, the server uses translation functions to convert instructions in different languages into an analyzable language, and also appropriately analyzes emotional states. This makes it possible to support the user's work efficiency even in a multilingual environment.

[0775] A concrete example of a prompt is, "Analyze the user's emotions based on their tone and content, and suggest appropriate changes to the task schedule." By inputting this prompt into the AI model, appropriate responses and suggestions can be obtained. In this way, the system provides emotion-based work support, reducing the user's psychological burden.

[0776] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0777] Step 1:

[0778] The user inputs voice commands using the terminal. The terminal acquires voice data via a voice input device and converts this analog audio into a digital format. Specifically, the terminal's microphone captures the audio, and the built-in DSP (Digital Signal Processing Unit) converts the audio waveform into digital data. The output obtained as a result of this processing is digital audio data.

[0779] Step 2:

[0780] The terminal transfers the converted digital audio data to the server. HTTPS, a secure communication protocol over the internet, is used for data transfer. The digital audio data generated earlier is used as input, and the data reaches the server as output.

[0781] Step 3:

[0782] The server converts received digital audio data into text information using speech recognition software. For example, the server uses AI to analyze speech patterns from the audio data and generate corresponding text. This process utilizes API services to achieve highly accurate speech recognition. The input is digital audio data, and the output is text information.

[0783] Step 4:

[0784] The server analyzes the converted text information and extracts task and schedule information requested by the user. The server classifies the text data by content and updates the schedule and task list. The input is the text information to be analyzed, and the output is the updated schedule information.

[0785] Step 5:

[0786] The server uses an emotion engine to identify the user's emotions based on the voice data. It analyzes the tone and speed of the voice to estimate the user's emotional state. This process employs sophisticated algorithms, and the emotion information is stored on a recording medium. The input is the data after speech recognition is complete, and the output is the identified emotion information.

[0787] Step 6:

[0788] The server dynamically adjusts task priorities based on emotional information. For example, the server checks emotional information and, if the user is in a high-stress state, generates a suggestion to postpone low-priority tasks. The input is emotional information and current task information, and the output is a task list with updated priorities.

[0789] Step 7:

[0790] The device receives feedback from the server and displays suggested schedule adjustments and task priorities to the user. For example, the device displays the message "We recommend taking a break" on the screen. The input is feedback information from the server, and the output is visual feedback to the user.
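Step 1 of the flow above converts the analog waveform into digital audio data. A minimal sketch of that quantization step is shown below, assuming 16-bit little-endian PCM at a 16 kHz sample rate; the patent does not name the encoding, and the sine tone merely stands in for a microphone signal.

```python
import math
import struct


def quantize_pcm16(samples):
    """Clip samples to [-1.0, 1.0] and pack them as little-endian int16."""
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


SAMPLE_RATE = 16000  # Hz, an assumed rate often used for speech

# 10 ms of a 440 Hz tone standing in for the microphone signal.
analog = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(160)]
pcm = quantize_pcm16(analog)
```

Each sample occupies two bytes, so the 160 samples produce a 320-byte payload that the terminal could then transfer to the server as described in Step 2.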

[0791] (Application Example 2)

[0792] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0793] Providing efficient task management and emotional support simultaneously remains a challenging task in modern homes and workplaces. Especially given the stress often experienced in busy daily lives, proper prioritization and emotional support are essential. Furthermore, consistent support across diverse language environments is crucial.

[0794] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0795] In this invention, the server includes means for converting voice data received through a voice input device into text information, means for analyzing the text information to extract the user's schedule information and update it on a recording medium, means for synchronizing the updated schedule information to multiple terminal devices via a communication network, means for re-evaluating the priority of the user's tasks using external information acquired by a data analysis device, and means for estimating the emotional state from the user's voice data and dynamically adjusting work support based on the re-evaluated priority. This reduces the psychological burden on the user and enables efficient work execution.

[0796] A "voice input device" is a device that converts voice data into a digital format and transmits it to a processing unit.

[0797] "Textual information" refers to information that has been digitally converted from audio data and expressed in text format.

[0798] A "recording medium" is a physical or digital storage device that stores data, making it available for later use or analysis.

[0799] A "communication network" is a network infrastructure for transmitting data between multiple terminal devices.

[0800] A "data analysis device" is a computer system that processes diverse information and provides analysis results.

[0801] "External information" refers to supplementary data obtained from outside the user's environment and is used to adjust tasks and schedules.

[0802] "Priority" is an indicator used to determine the order in which to perform multiple tasks or duties.

[0803] "Emotional state" refers to the user's psychological or emotional state estimated based on voice analysis and other data.

[0804] "Business support" refers to assistance and systems designed to help people perform tasks and work efficiently.

[0805] "Dynamic adjustment" means changing the order and content of tasks in real time according to the situation in order to optimize them.

[0806] This invention utilizes a terminal equipped with a voice input device to allow the user to input voice data, which is then digitized as text information. This digitized text information is transmitted to a server via a communication network. The server analyzes the received text information, extracts the user's schedule information, and stores it on a recording medium. During this process, a data analysis device is used to acquire external information and re-evaluate the user's task priorities. The re-evaluated priorities are dynamically adjusted based on the emotional state contained in the user's voice data.

[0807] The server provides work support to the user based on dynamically adjusted task priorities. As part of this work support, if the user's emotional state indicates high stress, the server may generate a message suggesting a break. For example, if the user voice-inputs, "I need to prepare for a meeting after lunch," the server, if it determines the user is stressed, will suggest, "How about taking a short break before preparing for the meeting?"

[0808] This system can also translate instructions in various languages and provide appropriate work support based on the user's emotional state. For speech recognition, it uses, for example, the Google Speech-to-Text API, and for sentiment analysis, it uses IBM Watson Tone Analyzer. The Asana API can be used to manage the user's schedule and tasks.

[0809] When a generative AI model is used, a prompt such as the following is given:

[0810] "When the system analyzes that the user is experiencing stress, please generate a message suggesting they take a break."

[0811] Through this prompt, the AI can create optimal suggestions for the user and help alleviate stress.
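One way to assemble such a prompt from the estimated emotional state and the current tasks is sketched below. The function name `build_break_prompt`, the stress score format, and the layout of the context lines are assumptions; the sketch only builds the prompt text and does not call any generative AI service.

```python
def build_break_prompt(emotional_state: str, stress: float, tasks: list) -> str:
    """Combine the instruction quoted above with the user's current context."""
    instruction = (
        "When the system analyzes that the user is experiencing stress, "
        "please generate a message suggesting they take a break."
    )
    task_lines = "\n".join(f"- {t}" for t in tasks)
    return (
        f"{instruction}\n\n"
        f"Estimated emotional state: {emotional_state} (stress score {stress:.2f})\n"
        f"Upcoming tasks:\n{task_lines}"
    )


prompt = build_break_prompt("stressed", 0.8, ["Prepare for meeting"])
```

Keeping the instruction fixed and appending the context separately makes it easier to review what the model is actually asked to do.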

[0812] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0813] Step 1:

[0814] The device receives voice input from the user. This voice data is captured via the built-in microphone and converted into a digital format that can be sent to the server. The Google Speech-to-Text API is used to convert the voice input into text data. This process extracts text information from the voice.

[0815] Step 2:

[0816] The server analyzes the received text information and extracts the user's schedule and tasks. Natural Language Processing (NLP) technology is used for the analysis. The server receives the text information converted from the audio data as input, organizes schedule information and important tasks based on that data, and creates output data to update the storage medium.

[0817] Step 3:

[0818] The server evaluates the tone of voice data and text to perform sentiment analysis and estimate the user's emotional state. IBM Watson Tone Analyzer is used for this process. The input is voice or text data, and the output is an indicator of the emotional state.

[0819] Step 4:

[0820] The server uses data analysis equipment to collect external information (weather information, news, traffic conditions, etc.) and re-evaluate the user's task priorities. It also takes the user's emotional state into consideration, dynamically adjusting task priorities. Inputs include external information, the user's emotional state, and task information, while output is the updated task priorities.

[0821] Step 5:

[0822] Based on the user's emotional state and task priorities, the server generates messages and suggestions to support work. During periods of high stress, it supports work efficiency through suggestions such as taking breaks. Using a generative AI model, it outputs the most suitable suggestions to the user based on appropriate prompt sentences. In this step, emotional state and task information are used as input, and suggestion messages are obtained as output.

[0823] Step 6:

[0824] The terminal notifies the user of suggested messages and schedule information received from the server. Information is conveyed visually or audibly using a display device. This helps reduce stress and facilitates efficient task management. Input is the instruction output from the server.
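Step 4 of the flow above combines external information and the user's emotional state when re-evaluating priorities. A possible scoring scheme is sketched below; the fields of `Task`, the 0.6/0.4 weighting, and the way rain probability and stress reduce a score are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    title: str
    importance: float      # 0..1
    urgency: float         # 0..1
    outdoor: bool = False  # whether weather affects the task
    effort: float = 0.5    # 0..1, how demanding the task is


def score(task: Task, rain_probability: float, stress: float) -> float:
    base = 0.6 * task.importance + 0.4 * task.urgency
    if task.outdoor:
        base *= 1.0 - rain_probability        # bad weather lowers outdoor tasks
    base *= 1.0 - 0.5 * stress * task.effort  # stress lowers demanding tasks
    return base


def reprioritize(tasks, rain_probability: float, stress: float) -> list:
    """Return the tasks sorted from highest to lowest adjusted score."""
    return sorted(tasks, key=lambda t: score(t, rain_probability, stress), reverse=True)


tasks = [
    Task("Site visit", 0.8, 0.8, outdoor=True),
    Task("Write report", 0.7, 0.6, effort=0.9),
    Task("Order supplies", 0.3, 0.5, effort=0.1),
]
rainy_and_stressed = [t.title for t in reprioritize(tasks, 0.9, 0.8)]
clear_and_calm = [t.title for t in reprioritize(tasks, 0.0, 0.0)]
```

With these sample values, the outdoor site visit ranks first on a clear, calm day and last when rain is likely and the user is stressed.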

[0825] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing unit 12. In the data processing unit 12, the specific processing unit 290 acquires the audio data.

[0826] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0827] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0828] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0829] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. In the upper and lower directions of the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. Also, the upper side of the concentric circles is where "pleasant" emotions are located, and the lower side is where "unpleasant" emotions are located. In this way, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
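The layout described above can be made concrete with a small sketch that classifies a point on the map. It assumes a unit circle centered at the origin, with positive y as the upper side and positive x as the right side; the `inner_radius` boundary between primitive emotions and outer states or actions is a hypothetical value, since the patent gives no coordinates.

```python
import math


def describe_position(x: float, y: float, inner_radius: float = 0.5) -> dict:
    """Classify a point on an emotion map centered at (0, 0)."""
    radius = math.hypot(x, y)
    return {
        # Upper side is pleasant, lower side is unpleasant.
        "valence": "pleasant" if y > 0 else "unpleasant" if y < 0 else "neutral",
        # Left side: reactions in the brain; right side: situational judgment.
        "origin": "reaction" if x < 0 else "situation" if x > 0 else "both",
        # Near the center: primitive emotions; further out: states and actions.
        "layer": "primitive" if radius <= inner_radius else "state/action",
    }


outer_right = describe_position(0.8, 0.1)
inner_left = describe_position(-0.2, -0.2)
```

Points on the vertical axis are labeled "both", matching the statement that emotions in the upper and lower directions arise from both brain reactions and situational judgment.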

[0830] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0831] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the Emotion Map 400 an emotion is located, the more visible it becomes (that is, the more it is expressed in actions).

[0832] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.

[0833] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0834] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
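The final step, determining the user's emotion from the emotion values, could look like the sketch below. The neural network itself is not reproduced; `values` is a hypothetical output in which neighboring emotions have similar values, and selecting the highest value as the dominant emotion is an assumption about how the determination is made.

```python
def determine_emotion(emotion_values: dict, top_k: int = 3):
    """Return the dominant emotion and the top_k (emotion, value) pairs."""
    ranked = sorted(emotion_values.items(), key=lambda kv: kv[1], reverse=True)
    dominant = ranked[0][0]
    return dominant, ranked[:top_k]


# Hypothetical network output: nearby emotions on the map score similarly.
values = {"reassured": 0.82, "calm": 0.79, "confident": 0.75, "anxious": 0.10}
dominant, top = determine_emotion(values)
```

Returning the runners-up alongside the dominant emotion preserves the information that several neighboring emotions are active at once.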

[0835] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0836] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0837] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0838] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0839] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0840] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0841] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0842] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0843] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0844] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of those aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements added, or elements replaced in the descriptions and illustrations presented above, as long as this does not deviate from the essence of the technical aspects of this disclosure. Furthermore, to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge that need no special explanation to enable implementation have been omitted from the descriptions and illustrations presented above.

[0845] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0846] The following is further disclosed regarding the embodiments described above.

[0847] (Claim 1)

[0848] A means for converting audio data received through an audio input device into text information,

[0849] A means for analyzing the aforementioned text information, extracting the user's schedule information, and updating it on a recording medium,

[0850] Means for synchronizing the updated schedule information to multiple terminal devices via a communication network,

[0851] A means for re-evaluating the priority of a user's tasks using external information acquired by a data analysis device,

[0852] A system comprising all of the above.

[0853] (Claim 2)

[0854] The system according to claim 1, characterized by having a function that learns the user's work history and provides customized work support.

[0855] (Claim 3)

[0856] The system according to claim 1, characterized by having a function to translate instructions in different languages and convert them into an analyzable language.
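As an illustration of the element above that analyzes text information and extracts schedule information, the following is a minimal sketch only. The disclosure does not specify an extraction technique; the regular expression, the assumed input shape ("<title> on YYYY-MM-DD at HH:MM"), and the names `PATTERN` and `extract_schedule` are all hypothetical.

```python
import re
from datetime import datetime

# Hypothetical input shape, assumed for illustration only:
#   "<title> on YYYY-MM-DD at HH:MM"
PATTERN = re.compile(
    r"(?P<title>.+?)\s+on\s+(?P<date>\d{4}-\d{2}-\d{2})\s+at\s+(?P<time>\d{2}:\d{2})"
)


def extract_schedule(text):
    """Return {'title', 'start'} parsed from transcribed text, or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    try:
        start = datetime.strptime(
            f"{match['date']} {match['time']}", "%Y-%m-%d %H:%M"
        )
    except ValueError:
        # Digits matched the pattern but do not form a valid date or time.
        return None
    return {"title": match["title"].strip(), "start": start}


entry = extract_schedule("Budget review on 2025-03-14 at 14:30")
```

A practical system would presumably need far more flexible language understanding than a fixed pattern; the sketch only shows the shape of the text-to-schedule step.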

[0857] "Example 1"

[0858] (Claim 1)

[0859] A means for converting audio data received through an audio input device into text information,

[0860] A means for analyzing the aforementioned text information, extracting the user's schedule information, and updating it on a recording medium,

[0861] Means for synchronizing the updated schedule information to multiple terminal devices via a communication network,

[0862] A means for re-evaluating the priority of a user's tasks using external information acquired by a data analysis device,

[0863] A means for performing noise cancellation when recording voice instructions,

[0864] A means for updating schedule information by checking for conflicts with existing database information,

[0865] A system comprising all of the above.

[0866] (Claim 2)

[0867] The system according to claim 1, characterized by having a function that learns the user's work history and provides customized work support.

[0868] (Claim 3)

[0869] The system according to claim 1, characterized by having a function to translate instructions in different languages and convert them into an analyzable language.
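The conflict-checking element of Example 1 can be pictured as an interval-overlap test against stored events. The sketch below is one possible reading, not the disclosed implementation: the `Event` type, the half-open interval convention, and the choice to reject a conflicting update are assumptions, and an in-memory list stands in for the database.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    title: str
    start: datetime
    end: datetime


def find_conflicts(new_event, existing):
    """Return the stored events whose time range overlaps new_event.

    Intervals are treated as half-open [start, end), so back-to-back
    events do not count as conflicting (an assumed convention).
    """
    return [
        e for e in existing
        if new_event.start < e.end and e.start < new_event.end
    ]


def update_schedule(new_event, existing):
    """Add new_event only when nothing overlaps; return (schedule, conflicts)."""
    conflicts = find_conflicts(new_event, existing)
    if conflicts:
        return list(existing), conflicts
    return list(existing) + [new_event], []


stored = [Event("standup", datetime(2025, 1, 6, 9, 0), datetime(2025, 1, 6, 9, 30))]
review = Event("review", datetime(2025, 1, 6, 9, 30), datetime(2025, 1, 6, 10, 0))
call = Event("call", datetime(2025, 1, 6, 9, 15), datetime(2025, 1, 6, 9, 45))
```

Returning the conflicting events rather than raising an error leaves the caller free to decide whether to reschedule, ask the user, or overwrite.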

[0870] "Application Example 1"

[0871] (Claim 1)

[0872] A means for converting audio data received through an audio input device into text information,

[0873] A means for analyzing the aforementioned text information, extracting the user's schedule information, and updating it on a recording medium,

[0874] Means for synchronizing the updated schedule information to multiple terminal devices via a communication network,

[0875] A means for re-evaluating the priority of a user's tasks using external information acquired by a data analysis device,

[0876] A means of analyzing market trends and environmental data based on past data to propose optimal actions for the future,

[0877] A system comprising all of the above.

[0878] (Claim 2)

[0879] The system according to claim 1, characterized by having a function that learns the user's work history and provides customized work support.

[0880] (Claim 3)

[0881] The system according to claim 1, characterized by having a function to translate instructions in different languages and convert them into an analyzable language.
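The re-evaluation of task priorities using external information, recited in claim 1 of each set, could take many forms. The following is a hedged sketch in which the external information is reduced to per-tag integer adjustments; the `Task` type, the tag scheme, the `external_boosts` mapping, and the "higher number means more urgent" convention are assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Task:
    name: str
    priority: int  # assumed convention: higher value = more urgent
    tags: frozenset = frozenset()


def reevaluate(tasks, external_boosts):
    """Return tasks re-ranked after applying adjustments from external data.

    external_boosts maps a tag to an integer adjustment. It stands in for
    whatever the data analysis device would actually supply.
    """
    updated = []
    for task in tasks:
        delta = sum(external_boosts.get(tag, 0) for tag in task.tags)
        updated.append(replace(task, priority=task.priority + delta))
    # sorted() is stable, so tasks with equal priority keep their input order.
    return sorted(updated, key=lambda t: t.priority, reverse=True)


tasks = [
    Task("prepare report", 2, frozenset({"finance"})),
    Task("book travel", 3, frozenset({"travel"})),
]
ranked = reevaluate(tasks, {"finance": 5})
```

Here an external signal tagged "finance" lifts the report above the travel booking even though it started with the lower priority.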

[0882] "Example 2 combined with an emotion engine"

[0883] (Claim 1)

[0884] A means for converting audio data received through an audio input device into text information,

[0885] A means for analyzing the aforementioned text information, extracting the user's schedule information, and updating it on a recording medium,

[0886] Means for synchronizing the updated schedule information to multiple terminal devices via a communication network,

[0887] A means for identifying emotional information from audio data and dynamically adjusting task priorities using an emotion engine,

[0888] A means for re-evaluating the priority of a user's tasks using external information acquired by a data analysis device,

[0889] A system comprising all of the above.

[0890] (Claim 2)

[0891] The system according to claim 1, characterized by learning the user's work history and providing customized work support based on their emotional state.

[0892] (Claim 3)

[0893] The system according to claim 1, characterized by having a function to translate instructions in different languages, convert them into an analyzable language, and appropriately process emotional information.
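The dynamic adjustment of task priorities by an emotion engine is not detailed in this set. As a sketch under stated assumptions, suppose the engine has already reduced the audio data to a single stress score in [0, 1], and each task carries an effort estimate in [0, 1]. The function name `adjust_for_emotion`, the stress score, the effort estimates, and the scaling rule are all hypothetical.

```python
def adjust_for_emotion(priorities, effort, stress):
    """Scale down the priority of demanding tasks as estimated stress rises.

    priorities: task name -> numeric priority (higher = more urgent, assumed)
    effort:     task name -> effort estimate in [0, 1] (hypothetical input)
    stress:     stress score in [0, 1], assumed to come from the emotion engine
    """
    if not 0.0 <= stress <= 1.0:
        raise ValueError("stress must be between 0 and 1")
    return {
        task: value * (1.0 - stress * effort.get(task, 0.0))
        for task, value in priorities.items()
    }


adjusted = adjust_for_emotion(
    {"write proposal": 10, "reply to email": 10},
    {"write proposal": 1.0, "reply to email": 0.0},
    stress=0.5,
)
```

With equal starting priorities, the demanding task drops below the light one when stress is elevated, which is one simple way to read "dynamically adjusting task priorities".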

[0894] "Application Example 2 combined with an emotion engine"

[0895] (Claim 1)

[0896] A means for converting audio data received through an audio input device into text information,

[0897] A means for analyzing the aforementioned text information, extracting the user's schedule information, and updating it on a recording medium,

[0898] Means for synchronizing the updated schedule information to multiple terminal devices via a communication network,

[0899] A means for re-evaluating the priority of a user's tasks using external information acquired by a data analysis device,

[0900] A means of estimating the emotional state from the user's voice data and dynamically adjusting work support based on the re-evaluated priorities,

[0901] A system comprising all of the above.

[0902] (Claim 2)

[0903] The system according to claim 1, which has a function to learn the user's work history and provide customized work support, and which suggests breaks and support activities according to the user's emotional state.

[0904] (Claim 3)

[0905] The system according to claim 1, which has the function of translating instructions in different languages and converting them into an analyzable language, and provides work support in consideration of the user's emotional state in response to the translated instructions.

[Explanation of Symbols]

[0906]
10, 210, 310, 410 Data Processing Systems
12 Data Processing Devices
14 Smart Devices
214 Smart Glasses
314 Headset-type terminal
414 Robots

Claims

1. A system comprising: a means for converting audio data received through an audio input device into text information; a means for analyzing the text information, extracting the user's schedule information, and updating it on a recording medium; a means for synchronizing the updated schedule information to multiple terminal devices via a communication network; a means for re-evaluating the priority of the user's tasks using external information acquired by a data analysis device; and a means for analyzing market trends and environmental data based on past data to propose optimal actions for the future.

2. The system according to claim 1, characterized by having a function that learns the user's work history and provides customized work support.

3. The system according to claim 1, characterized by having a function to translate instructions in different languages and convert them into an analyzable language.